[openstack-dev] [nova] How to properly detect and fence a compromised host (and why I dislike TrustedFilter)
Matt Riedemann
mriedem at linux.vnet.ibm.com
Wed Sep 23 16:11:04 UTC 2015
On 9/23/2015 10:00 AM, Sylvain Bauza wrote:
>
>
> On 23/09/2015 15:31, Matt Riedemann wrote:
>>
>>
>> On 6/25/2015 3:59 AM, Sylvain Bauza wrote:
>>>
>>>
>>> On 24/06/2015 19:56, Joe Gordon wrote:
>>>>
>>>>
>>>> On Tue, Jun 23, 2015 at 3:41 AM, Sylvain Bauza <sbauza at redhat.com
>>>> <mailto:sbauza at redhat.com>> wrote:
>>>>
>>>> Hi team,
>>>>
>>>> Some discussion occurred over IRC about a publicly open bug
>>>> related to TrustedFilter [1].
>>>> I want to take the opportunity to raise my concerns about that
>>>> specific filter: why I dislike it and how I think we could improve
>>>> the situation (and to clarify everyone's thoughts).
>>>>
>>>> The current situation is this: Nova checks whether a host is
>>>> compromised only when the scheduler is called, i.e. only when
>>>> booting/migrating/evacuating/unshelving an instance (well, not
>>>> exactly all the evacuate/live-migrate cases, but let's not discuss
>>>> that now). When the request reaches the scheduler, all the
>>>> hosts are checked against all the enabled filters, and the
>>>> TrustedFilter makes an external HTTP(S) call to the
>>>> Attestation API service (not handled by Nova) for *each host* to
>>>> see whether the host is valid (not compromised) or not.
>>>>
>>>> To be clear, that's the only in-tree scheduler filter which
>>>> explicitly makes an external call to a separate service that Nova
>>>> is not managing. I can see at least 3 reasons why that's bad:
>>>>
>>>> #1 : it's a terrible performance bottleneck, because we're
>>>> IO-blocking N times given N hosts (we're not even multiplexing the
>>>> HTTP requests)
>>>> #2 : all the filters check an internal Nova state for the
>>>> host (called HostState) except the TrustedFilter, which means
>>>> that conceptually we defer the decision to a 3rd-party engine
>>>> #3 : the Attestation API service becomes a de facto dependency
>>>> for Nova (since it's an in-tree filter) while it's not listed as a
>>>> dependency and thus not gated.
>>>>
>>>>
>>>> All of these reasons could be acceptable if the filter covered the
>>>> use case given in [1] (i.e. I want to make sure that if my
>>>> host gets compromised, my instances will not be running on that
>>>> host), but it just doesn't, due to the situation I mentioned
>>>> above.
>>>>
>>>> So, given that, here are my thoughts:
>>>> a/ if a host gets compromised, we can just disable its service to
>>>> prevent it from being selected as a valid destination host. There
>>>> is no need for a specialised filter.
>>>> b/ if a host is compromised, we can assume that the instances have
>>>> to be resurrected elsewhere, i.e. we can call nova evacuate
>>>> c/ checking whether a host is compromised is not a Nova
>>>> responsibility, since it's already done perfectly well by [2]
>>>>
>>>> In other words, I consider that "security" use case
>>>> analogous to the HA use case [3], where we need a 3rd-party
>>>> tool responsible for periodically checking the state of the hosts
>>>> and, if one is compromised, calling the Nova API to fence the host
>>>> and evacuate the compromised instances.
>>>>
>>>> Given that, I'm proposing to deprecate TrustedFilter and explicitly
>>>> mention that it will be dropped from the tree in a later cycle:
>>>> https://review.openstack.org/194592
>>>>
>>>>
>>>> Given that people are using this and it is a negligible maintenance
>>>> burden, I think deprecating it with the intention of removing it is
>>>> not worth it.
>>>>
>>>> That said, it would be very useful to further document the risks of
>>>> this filter (live migration, possible performance issues, etc.).
>>>
>>> Well, I can understand that customers may not agree to removing
>>> the filter because there is no clear alternative for them. That said, I
>>> think that marking the filter as deprecated without saying when it will
>>> be removed would encourage some contributors to think about that and
>>> work on a better solution, exactly like we did for the EC2 API.
>>>
>>> To be clear, I want to freeze the filter by deprecating it and
>>> explaining why it's wrong (by amending the devref section and emitting a
>>> LOG warning saying it's deprecated), and then leave the filter
>>> in-tree until we are sure that there is a good solution outside Nova.
>>>
>>> -Sylvain
>>>
>>>
>>>>
>>>>
>>>> Thoughts ?
>>>> -Sylvain
>>>>
>>>>
>>>>
>>>> [1] https://bugs.launchpad.net/nova/+bug/1456228
>>>> [2] https://github.com/OpenAttestation/OpenAttestation
>>>> [3]
>>>> http://blog.russellbryant.net/2014/10/15/openstack-instance-ha-proposal/
>>>>
>>>>
>>>>
>>>> __________________________________________________________________________
>>>>
>>>> OpenStack Development Mailing List (not for usage questions)
>>>> Unsubscribe:
>>>> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>>>> <http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe>
>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>
>>
>> I just reviewed the change https://review.openstack.org/#/c/194592/
>> and agree with Joe.
>>
>> We can't justify deprecation and removal due to lack of CI testing -
>> there are many scheduler filters which aren't tested in the gate. Or
>> if we can justify it that way, then we're setting a precedent. So if
>> testing is the sore spot, then maybe we want Intel to look at setting
>> up 3rd party CI? Maybe they could work it into their existing PCI CI?
>>
>
> Well, there is a difference between that filter and the others: we
> could provide some functional testing for the other filters
> just by adding Tempest tests, while the TrustedFilter would require far
> more than that (i.e. either pulling in OAT as a dependency for Nova,
> or considering a 3rd-party CI).
Tempest tests the API, so adding tests to Tempest would be tricky,
unless the test added is a scenario that would only be expected to
behave a certain way if a given filter is available in the configuration.
The scheduler_default_filters list is really all we have in the gate runs today.
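
To illustrate the kind of conditional scenario I mean, here is a minimal
sketch in plain unittest, not actual Tempest code. ENABLED_FILTERS,
is_filter_enabled and requires_filter are made-up names for illustration;
a real test would read the enabled filters from the deployment's
configuration instead of a hard-coded list.

```python
import functools
import unittest

# Hypothetical stand-in for the deployment's configured filter list.
ENABLED_FILTERS = ["RamFilter", "ComputeFilter"]


def is_filter_enabled(name, enabled=None):
    """Return True if the filter is enabled; "all" enables every filter."""
    enabled = ENABLED_FILTERS if enabled is None else enabled
    return "all" in enabled or name in enabled


def requires_filter(name):
    """Skip the decorated test unless the given filter is enabled."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, *args, **kwargs):
            if not is_filter_enabled(name):
                raise unittest.SkipTest("%s is not enabled" % name)
            return func(self, *args, **kwargs)
        return wrapper
    return decorator


class TrustedHostScenario(unittest.TestCase):
    @requires_filter("TrustedFilter")
    def test_boot_lands_on_trusted_host(self):
        # A real scenario would boot a server and check its host here.
        self.fail("only runs when TrustedFilter is enabled")
```

With the list above the scenario is reported as skipped rather than
failed, which is about the best we could do for a filter that is not in
the default gate configuration.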
>
> For sure, I'd love to see some effort put into providing an integration
> with OAT if that filter stays in-tree.
>
>> I also don't think we can justify the external dependency as grounds
>> for removal. There are many possible configurations that require
>> external dependencies. 90% of cinder/neutron configurations probably
>> fall into this camp.
>>
> Fair enough, I just want to stress that some work has to be
> done before we can consider that this filter has the same level of
> confidence as the others.
>
>> From other parts of this thread it also sounds like there are
>> potentially alternatives to this filter but they aren't implemented,
>> or even written up in a spec. Given there are users of this, I'd
>> think we'd want to see an agreed-upon alternative proposal to replace
>> this filter.
>>
>
> I totally support that. As I said in my original email, this is not
> only a dependency problem, but rather a design problem. If we want to
> cover the given use cases, it requires more than just a filter, and IMHO
> all of this needs to be done outside Nova.
>
>
>> I'm all for logging a warning that this filter is experimental
>> (meaning it's not tested in our CI system). I don't think there is a
>> good reason to deprecate it right now though with an open-ended
>> removal date.
>>
>
> That's a very valid point, I'm fine with that. Thanks for the idea.
>
> -Sylvain
>
>
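
For reference, the external-tool approach from Sylvain's original email
(points a/ to c/) could be sketched roughly as below. This is only an
illustration under assumptions: is_trusted, disable_service and
evacuate_host are hypothetical callables that a real tool would back with
the attestation service and the Nova API, and nothing here is an existing
Nova interface.

```python
import time


def fence_untrusted_hosts(hosts, is_trusted, disable_service, evacuate_host):
    """Fence every host that fails attestation; return the fenced hosts."""
    fenced = []
    for host in hosts:
        if is_trusted(host):
            continue
        # a/ stop the scheduler from picking this host again
        disable_service(host)
        # b/ move its instances elsewhere
        evacuate_host(host)
        fenced.append(host)
    return fenced


def watch(list_hosts, is_trusted, disable_service, evacuate_host,
          interval=60, rounds=None, sleep=time.sleep):
    """Run the check periodically; rounds=None means loop forever."""
    done = 0
    while rounds is None or done < rounds:
        fence_untrusted_hosts(list_hosts(), is_trusted,
                              disable_service, evacuate_host)
        done += 1
        if rounds is None or done < rounds:
            sleep(interval)
```

The point of the sketch is that the trust check runs on its own schedule,
outside the scheduler, so it also covers hosts that already run instances.
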
--
Thanks,
Matt Riedemann