[openstack-dev] [nova] How to properly detect and fence a compromised host (and why I dislike TrustedFilter)

Sylvain Bauza sbauza at redhat.com
Tue Jun 23 10:41:37 UTC 2015

Hi team,

Some discussion occurred over IRC about a publicly open bug related to 
TrustedFilter [1].
I want to take the opportunity to raise my concerns about that specific 
filter, explain why I dislike it and how I think we could improve the 
situation, and clarify everyone's thoughts.

The current situation is this: Nova checks whether a host is 
compromised only when the scheduler is called, ie. only when 
booting/migrating/evacuating/unshelving an instance (well, not exactly 
all the evacuate/live-migrate cases, but let's not discuss that 
now). When the request reaches the scheduler, all the hosts are checked 
against all the enabled filters, and the TrustedFilter makes an 
external HTTP(S) call to the Attestation API service (not handled by 
Nova) for *each host* to see whether the host is valid (not compromised).
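
To illustrate the call pattern described above, here is a rough sketch 
(not the actual filter code). The function names, the host names and the 
simulated latency are all made up for the example; the real filter talks 
to a separate Attestation service over HTTP(S).

```python
import time


def query_attestation(host, latency=0.01):
    """Hypothetical stand-in for the external attestation call."""
    time.sleep(latency)          # simulates blocking network I/O
    return host != "compute-3"   # pretend compute-3 is reported untrusted


def trusted_hosts(hosts, check=query_attestation):
    """Return (hosts that pass, number of external calls made).

    The check runs once per candidate host, one after the other, so a
    single scheduling request costs N blocking calls for N hosts.
    """
    passed = []
    calls = 0
    for host in hosts:
        calls += 1
        if check(host):
            passed.append(host)
    return passed, calls
```

With three candidate hosts the loop makes three sequential external 
calls, and nothing is checked again until the next scheduling request.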

To be clear, that's the only in-tree scheduler filter which explicitly 
makes an external call to a separate service that Nova is not managing. I 
can see at least 3 reasons why that's bad :

#1 : that's a terrible bottleneck for performance, because we're 
IO-blocking N times given N hosts (we're not even multiplexing the HTTP 
calls)
#2 : all the other filters check an internal Nova state for the host 
(called HostState), but the TrustedFilter does not, which means that 
conceptually we defer the decision to a 3rd-party engine
#3 : the Attestation API service becomes a de facto dependency for 
Nova (since it's an in-tree filter) while it's not listed as a 
dependency and thus not gated.

All of these drawbacks could be acceptable if the filter covered the 
use case given in [1] (ie. I want to make sure that if my host gets 
compromised, my instances will not be running on that host), but it 
just doesn't, because the check only happens when the scheduler is 
called, as I mentioned above.

So, given that, here are my thoughts :
a/ if a host gets compromised, we can just disable its service to 
prevent its election as a valid destination host. There is no need for a 
specialised filter.
b/ if a host is compromised, we can assume that the instances have to 
be resurrected elsewhere, ie. we can call nova evacuate
c/ checking whether a host is compromised is not a Nova 
responsibility since that's already done perfectly well by [2]

In other words, I consider that "security" use case analogous to the 
HA use case [3], where we need a 3rd-party tool responsible for 
periodically checking the state of the hosts and, if one is 
compromised, calling the Nova API to fence the host and evacuate its 
instances.
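
As a minimal sketch of what one pass of such an external tool could look 
like (an illustration only, under assumptions): all three callables are 
hypothetical placeholders. `is_trusted` would wrap the attestation 
service, while `disable_service` and `evacuate_host` would wrap whatever 
Nova API calls the operator uses to fence a host and evacuate it.

```python
def fence_compromised_hosts(hosts, is_trusted, disable_service, evacuate_host):
    """Run one monitoring pass and return the hosts that were fenced.

    All three callables are hypothetical and injected by the caller;
    nothing here is an existing Nova or OpenAttestation interface.
    """
    fenced = []
    for host in hosts:
        if is_trusted(host):
            continue
        # Fence first so the scheduler stops electing this host ...
        disable_service(host)
        # ... then move its instances somewhere else.
        evacuate_host(host)
        fenced.append(host)
    return fenced
```

A periodic job outside Nova would run this pass at whatever interval the 
operator finds acceptable, so detection no longer depends on a 
scheduling request happening.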

Given that, I'm proposing to deprecate TrustedFilter and explicitly 
mention that it will be dropped from the tree in a later cycle.

Thoughts ?

[1] https://bugs.launchpad.net/nova/+bug/1456228
[2] https://github.com/OpenAttestation/OpenAttestation
[3] http://blog.russellbryant.net/2014/10/15/openstack-instance-ha-proposal/
