[openstack-dev] [nova] How to properly detect and fence a compromised host (and why I dislike TrustedFilter)

Juvonen, Tomi (Nokia - FI/Espoo) tomi.juvonen at nokia.com
Thu Jun 25 17:10:16 UTC 2015


>-----Original Message-----
>From: ext John Garbutt [mailto:john at johngarbutt.com] 
>Sent: Thursday, June 25, 2015 4:39 PM
>To: OpenStack Development Mailing List (not for usage questions)
>Subject: Re: [openstack-dev] [nova] How to properly detect and fence a compromised host (and why I dislike TrustedFilter)
>
>On 25 June 2015 at 14:09, Dulko, Michal <michal.dulko at intel.com> wrote:
>>> -----Original Message-----
>>> From: John Garbutt [mailto:john at johngarbutt.com]
>>> Sent: Thursday, June 25, 2015 2:22 PM
>>> To: OpenStack Development Mailing List (not for usage questions)
>>> Subject: Re: [openstack-dev] [nova] How to properly detect and fence a
>>> compromised host (and why I dislike TrustedFilter)
>>>
>>> On 24 June 2015 at 09:35, Dulko, Michal <michal.dulko at intel.com> wrote:
>>> >> -----Original Message-----
>>> >> From: Sylvain Bauza [mailto:sbauza at redhat.com]
>>> >> Sent: Wednesday, June 24, 2015 9:39 AM
>>> >> To: OpenStack Development Mailing List (not for usage questions)
>>> >> Subject: Re: [openstack-dev] [nova] How to properly detect and fence
>>> >> a compromised host (and why I dislike TrustedFilter)
>>
>> (snip)
>>
>>> >> > So I would suggest using the 3rd-party tools as a way to
>>> >> > supplement our TCP/TrustedFilter feature. The 3rd-party tools can
>>> >> > also call the attestation API for host attestation.
>>> >>
>>> >> I don't see much benefit in keeping such a filter, for the reasons I
>>> >> mentioned below. Again, if you want to fence one host, you can just
>>> >> disable its service; that's enough.
>>> >
>>> > This won't address the case in which you have a heterogeneous environment
>>> > and want only some important VMs to run on trusted hosts (and for the
>>> > rest of the VMs you don't care).
>>>
>>> This is an interesting one to dig into.
>>>
>>> I had assumed that in this case you put all the VMs that want the attestation
>>> check on a subset of nodes that are set up to use that check.
>>> You can do that using host aggregates and our existing filters.
>>>
>>> An external system could then just disable hosts within that subset of hosts
>>> that have the attestation check working.
>>>
>>> Does that work for your use case?
>>
>> It should be fine for this case. But then, why not go further and remove the SG API? Let's leave monitoring of services to Pacemaker and Nagios, and let them disable services they consider down.
>
>Honestly, I find that idea very attractive.
>
>The "mark down API" is basically going down that route.
>http://specs.openstack.org/openstack/nova-specs/specs/liberty/approved/mark-host-down.html

>> My point is that, following this logic, we could use external services to replace any filter that has such simple logic. Is this the right direction?

>If it's an external system, and you can integrate more efficiently by
>disabling hosts, then yes, that's awesome.

>That's not always going to be the correct direction, but we need to
>look at whether something can be done externally first. Nova is too big
>already; we are actively trying not to expand its scope.

So I worked on this "mark down API" spec and am now working on the server states ("VM states"), as they stay in an incorrect state if a host suddenly goes down. I would appreciate comments on https://review.openstack.org/#/c/192246 to make sure it is on the right track. Maybe the VM states should be changed directly when the "mark down API" is called, rather than as currently proposed. And yes, there are use cases where one does not evacuate the VMs, so it is valuable to have those states correct.
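For illustration only, here is a minimal Python sketch of what a "mark down" call from an external tool might look like. It assumes the legacy Nova os-services endpoint `PUT /os-services/force-down` (compute API microversion 2.11). The helper name `build_force_down_request` and the controller URL are hypothetical. The sketch only builds the request; sending it (with an auth token) is left to the caller.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ApiRequest:
    """A prepared HTTP request; sending it (and adding auth) is up to the caller."""
    method: str
    url: str
    headers: dict = field(default_factory=dict)
    body: str = ""


def build_force_down_request(compute_url: str, host: str,
                             forced_down: bool = True) -> ApiRequest:
    """Build a hypothetical 'mark host down' request for the legacy os-services API.

    Assumes microversion 2.11 semantics (PUT /os-services/force-down);
    later microversions (2.53+) address services by ID instead.
    """
    payload = {"host": host, "binary": "nova-compute", "forced_down": forced_down}
    return ApiRequest(
        method="PUT",
        url=compute_url.rstrip("/") + "/os-services/force-down",
        headers={
            "Content-Type": "application/json",
            # Opt in to the microversion that exposes the forced_down flag.
            "X-OpenStack-Nova-API-Version": "2.11",
        },
        body=json.dumps(payload),
    )


req = build_force_down_request("http://controller:8774/v2.1", "compute-07")
print(req.method, req.url)  # → PUT http://controller:8774/v2.1/os-services/force-down
```

Passing `forced_down=False` builds the matching "mark up" request once the host has been repaired.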

Relatedly, I am working in OPNFV on the Doctor project. Doctor is an external system that could use different existing open-source projects, such as Pacemaker or Nagios, under the hood to detect any kind of host fault quickly, and then use this "mark down API" to tell Nova about it. Doctor will be open source and available for anybody to use. It also now has an approved Ceilometer blueprint (BP) to enhance direct alarming for users without polling. So let's see what happens when this work is completed. It could even become a component inside OpenStack someday, once it reaches that kind of maturity (detecting faults, fencing, and, if wanted, doing automatic correlation based on VM-specific and fault-specific configuration).
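To make the division of labour concrete, here is a rough Python sketch of the detect-then-report loop described above. It is not Doctor code. The heartbeat table, the 30-second timeout, and the names `find_failed_hosts` and `fence_actions` are all hypothetical. A real monitor would get its liveness data from something like Pacemaker or Nagios. The endpoint paths assume the legacy Nova os-services API (microversion 2.11), and the sketch only plans the calls rather than sending them.

```python
from typing import Dict, List, Tuple

# Hypothetical timeout: a host whose last heartbeat is older than this
# many seconds is treated as failed.
HEARTBEAT_TIMEOUT_S = 30.0


def find_failed_hosts(last_heartbeat: Dict[str, float], now: float,
                      timeout: float = HEARTBEAT_TIMEOUT_S) -> List[str]:
    """Return, sorted, the hosts whose most recent heartbeat is older than `timeout`."""
    return sorted(h for h, ts in last_heartbeat.items() if now - ts > timeout)


def fence_actions(failed_hosts: List[str]) -> List[Tuple[str, str, dict]]:
    """Plan the Nova calls for each failed host, in order.

    1. Mark the host down, so Nova treats its service as down immediately
       (e.g. allowing evacuation) instead of waiting for its own timeout.
    2. Disable the service with a reason, so no new VMs are scheduled there.
    """
    actions = []
    for host in failed_hosts:
        actions.append(("PUT", "/os-services/force-down",
                        {"host": host, "binary": "nova-compute",
                         "forced_down": True}))
        actions.append(("PUT", "/os-services/disable-log-reason",
                        {"host": host, "binary": "nova-compute",
                         "disabled_reason": "fenced by external monitor"}))
    return actions


heartbeats = {"compute-01": 100.0, "compute-02": 40.0, "compute-03": 95.0}
failed = find_failed_hosts(heartbeats, now=110.0)
for method, path, body in fence_actions(failed):
    print(method, path, body["host"])
# → PUT /os-services/force-down compute-02
# → PUT /os-services/disable-log-reason compute-02
```

Keeping detection and action planning as pure functions makes the policy easy to test in isolation from Nova. This matches the idea that fault detection can live outside Nova, with Nova only receiving the resulting "mark down" and disable calls.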

Br,
Tomi

>Thanks,
>John
