[openstack-dev] [nova] How to properly detect and fence a compromised host (and why I dislike TrustedFilter)

Sylvain Bauza sbauza at redhat.com
Wed Sep 23 15:00:06 UTC 2015



On 23/09/2015 15:31, Matt Riedemann wrote:
>
>
> On 6/25/2015 3:59 AM, Sylvain Bauza wrote:
>>
>>
>> On 24/06/2015 19:56, Joe Gordon wrote:
>>>
>>>
>>> On Tue, Jun 23, 2015 at 3:41 AM, Sylvain Bauza <sbauza at redhat.com
>>> <mailto:sbauza at redhat.com>> wrote:
>>>
>>>     Hi team,
>>>
>>>     Some discussion occurred over IRC about a publicly opened bug
>>>     related to the TrustedFilter [1].
>>>     I want to take the opportunity to raise my concerns about that
>>>     specific filter, explain why I dislike it and how I think we could
>>>     improve the situation, and clarify everyone's thoughts.
>>>
>>>     The current situation is this: Nova checks whether a host is
>>>     compromised only when the scheduler is called, i.e. only when
>>>     booting/migrating/evacuating/unshelving an instance (well, not
>>>     exactly all the evacuate/live-migrate cases, but let's not discuss
>>>     that now). When the request reaches the scheduler, all the
>>>     hosts are checked against all the enabled filters, and the
>>>     TrustedFilter makes an external HTTP(S) call to the
>>>     Attestation API service (not handled by Nova) for *each host* to
>>>     see whether the host is valid (not compromised) or not.
>>>
>>>     To be clear, that's the only in-tree scheduler filter which
>>>     explicitly makes an external call to a separate service that Nova
>>>     is not managing. I can see at least 3 reasons why that's bad:
>>>
>>>     #1 : that's a terrible bottleneck for performance, because we're
>>>     IO-blocking N times given N hosts (we're not even multiplexing the
>>>     HTTP requests)
>>>     #2 : all the other filters check an internal Nova state for the
>>>     host (called HostState) but the TrustedFilter does not, which means
>>>     that conceptually we defer the decision to a 3rd-party engine
>>>     #3 : that Attestation API service becomes a de facto dependency
>>>     for Nova (since it's an in-tree filter) while it's not listed as a
>>>     dependency and thus not gated.
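
To illustrate point #1, here is a minimal sketch, not the actual TrustedFilter code. `attest_host` is a hypothetical stand-in for the HTTP(S) call to the attestation service, and the latency is simulated with a sleep (the 50 ms figure is an assumption for illustration only). The sequential loop pays that latency once per host, while the thread pool variant shows roughly what multiplexing the requests would buy.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SIMULATED_LATENCY = 0.05  # seconds; assumed round-trip time, illustration only


def attest_host(hostname):
    """Hypothetical stand-in for one blocking call to the attestation service."""
    time.sleep(SIMULATED_LATENCY)
    # Pretend every host except "compute-3" is trusted.
    return hostname != "compute-3"


def filter_sequential(hosts):
    """One blocking call per host, one after the other."""
    return [h for h in hosts if attest_host(h)]


def filter_concurrent(hosts, max_workers=8):
    """Same checks, but issued concurrently from a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        verdicts = list(pool.map(attest_host, hosts))
    return [h for h, trusted in zip(hosts, verdicts) if trusted]


if __name__ == "__main__":
    hosts = ["compute-%d" % i for i in range(8)]
    for fn in (filter_sequential, filter_concurrent):
        start = time.monotonic()
        passed = fn(hosts)
        elapsed = time.monotonic() - start
        print("%s: %d hosts passed in %.2fs" % (fn.__name__, len(passed), elapsed))
```

Both functions return the same hosts; only the wall-clock time differs. Even the concurrent variant still depends on an external service at scheduling time, so it would only soften point #1 and would not address #2 or #3.
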
>>>
>>>
>>>     All of these reasons could be acceptable if the filter covered the
>>>     usecase given in [1] (i.e. I want to make sure that if my
>>>     host gets compromised, my instances will not be running on that
>>>     host) but that just doesn't work, because the check only happens
>>>     at scheduling time, as I mentioned above.
>>>
>>>     So, given that, here are my thoughts :
>>>     a/ if a host gets compromised, we can just disable its service to
>>>     prevent its election as a valid destination host. There is no need
>>>     for a specialised filter.
>>>     b/ if a host is compromised, we can assume that the instances have
>>>     to be resurrected elsewhere, i.e. we can call a nova evacuate
>>>     c/ checking whether a host is compromised or not is not a Nova
>>>     responsibility since it's already perfectly done by [2]
>>>
>>>     In other words, I consider that "security" usecase
>>>     analogous to the HA usecase [3], where we need a 3rd-party
>>>     tool responsible for periodically checking the state of the hosts
>>>     and, if one is compromised, calling the Nova API to fence the host
>>>     and evacuate its instances.
>>>
>>>     Given that, I'm proposing to deprecate the TrustedFilter and to
>>>     explicitly mention that it will be dropped from in-tree in a later cycle:
>>>     https://review.openstack.org/194592
>>>
>>>
>>> Given people are using this, it is a negligible maintenance burden.  I
>>> think deprecating with the intention of removing is not worth it.
>>>
>>> Although it would be very useful to further document the risks with
>>> this filter (live migration, possible performance issues etc.)
>>
>> Well, I can understand that customers might not agree to removing
>> the filter because there is no clear alternative for them. That said, I
>> think marking the filter as deprecated without saying when it would
>> be removed would encourage some contributors to think about that and work
>> on a better solution, exactly like we did for the EC2 API.
>>
>> To be clear, I want to freeze the filter by deprecating it and
>> explaining why it's wrong (by amending the devref section and emitting a
>> LOG warning saying it's deprecated) and then leave the filter
>> in-tree until we are sure that there is a good solution outside of Nova.
>>
>> -Sylvain
>>
>>
>>>
>>>
>>>     Thoughts ?
>>>     -Sylvain
>>>
>>>
>>>
>>>     [1] https://bugs.launchpad.net/nova/+bug/1456228
>>>     [2] https://github.com/OpenAttestation/OpenAttestation
>>>     [3] http://blog.russellbryant.net/2014/10/15/openstack-instance-ha-proposal/
>>>
>>>
>>>
>>> __________________________________________________________________________
>>>     OpenStack Development Mailing List (not for usage questions)
>>>     Unsubscribe:
>>> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>>> <http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe>
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
> I just reviewed the change https://review.openstack.org/#/c/194592/ 
> and agree with Joe.
>
> We can't justify deprecation and removal due to lack of CI testing - 
> there are many scheduler filters which aren't tested in the gate.  Or 
> if we can justify it that way, then we're setting a precedent.  So if 
> testing is the sore spot, then maybe we want Intel to look at setting 
> up 3rd party CI?  Maybe they could work it into their existing PCI CI?
>

Well, there is a difference between that filter and the others: we
could provide functional testing for the other filters
just by adding Tempest tests, while the TrustedFilter would require far
more than that (i.e. either pulling OAT in as a dependency for Nova,
or considering a 3rd-party CI).

For sure, I'd love to see some effort on providing an integration with
OAT if that filter stays in-tree.

> I also don't think we can justify the external dependency as grounds 
> for removal.  There are many possible configurations that require 
> external dependencies.  90% of cinder/neutron configurations probably 
> fall into this camp.
>
Fair enough, I just want to stress that some work has to be
done before we can consider that this filter has the same level of
confidence as the others.

> From other parts of this thread it also sounds like there are 
> potentially alternatives to this filter but they aren't implemented, 
> or even written up in a spec.  Given there are users of this, I'd 
> think we'd want to see an agreed to alternative proposal to replace 
> this filter.
>

I totally support that. Like I said in my original email, this is not
only a dependency problem, but also a design problem. If we want to
cover the given usecases, it requires more than just a filter, and IMHO
all of this needs to be done outside Nova.
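
To make the "outside Nova" idea a bit more concrete, here is a minimal sketch of one pass of such an external monitor. Everything in it is an assumption for illustration: `FakeAttestation` and `FakeCompute` are hypothetical in-memory stand-ins for an attestation client (OAT, for instance) and for whatever wrapper the tool would put around the Nova API, and how the host is really fenced is left to that tool.

```python
class FakeAttestation:
    """Hypothetical attestation client; a real one would query e.g. OAT."""

    def __init__(self, untrusted):
        self.untrusted = set(untrusted)

    def is_trusted(self, host):
        return host not in self.untrusted


class FakeCompute:
    """Hypothetical wrapper around the Nova API calls the monitor needs."""

    def __init__(self, instances_by_host):
        self.instances_by_host = instances_by_host
        self.disabled = {}
        self.evacuated = []

    def disable_service(self, host, reason):
        # Stands in for disabling the compute service so the scheduler
        # no longer elects the host as a destination.
        self.disabled[host] = reason

    def list_instances(self, host):
        return list(self.instances_by_host.get(host, []))

    def evacuate(self, instance_id):
        # Stands in for a "nova evacuate" of one instance.
        self.evacuated.append(instance_id)


def check_and_fence(hosts, attestation, compute):
    """One monitor pass: fence untrusted hosts, then evacuate their instances."""
    fenced = []
    for host in hosts:
        # Skip hosts already fenced by a previous pass, and trusted hosts.
        if host in compute.disabled or attestation.is_trusted(host):
            continue
        compute.disable_service(host, reason="failed attestation")
        for instance_id in compute.list_instances(host):
            compute.evacuate(instance_id)
        fenced.append(host)
    return fenced
```

Run periodically, a loop like this would check hosts whether or not anything is being scheduled, which is exactly what a scheduler filter cannot do.
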


> I'm all for logging a warning that this filter is experimental 
> (meaning it's not tested in our CI system).  I don't think there is a 
> good reason to deprecate it right now though with an open-ended 
> removal date.
>

That's a very valid point, I'm fine with that. Thanks for the idea.
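
Just to fix ideas, such a warning could be as small as the sketch below. The class is a hypothetical skeleton and not the real TrustedFilter, the message wording is only a suggestion, and the trust check itself is replaced by a placeholder.

```python
import logging

LOG = logging.getLogger(__name__)


class ExperimentalTrustedFilter:
    """Hypothetical filter skeleton; not the real nova TrustedFilter class."""

    def __init__(self):
        # Emitted once, when the filter object is created.
        LOG.warning(
            "The TrustedFilter is considered experimental: it is not "
            "tested in the gate, so use it at your own risk."
        )

    def host_passes(self, host_state, filter_properties):
        # Placeholder: the actual trust check is out of scope for this sketch.
        return True
```
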

-Sylvain


