[Openstack-operators] What to do when a compute node dies?

Chris Friesen chris.friesen at windriver.com
Tue Mar 31 06:47:23 UTC 2015


On 03/30/2015 09:53 PM, Jay Pipes wrote:
> On 03/30/2015 07:30 PM, Chris Friesen wrote:
>> On 03/30/2015 04:57 PM, Jay Pipes wrote:
>>> On 03/30/2015 06:42 PM, Chris Friesen wrote:
>>>> On 03/30/2015 02:47 PM, Jay Pipes wrote:
>>>>> On 03/30/2015 10:42 AM, Chris Friesen wrote:
>>>>>> On 03/29/2015 09:26 PM, Mike Dorman wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I’m curious about how people deal with failures of compute
>>>>>>>  nodes, as in total failure when the box is gone for good.
>>>>>>>  (Mainly care about KVM HV, but also interested in more
>>>>>>> general cases as well.)
>>>>>>>
>>>>>>> The particular situation we’re looking at: how end users
>>>>>>> could identify or be notified of VMs that no longer exist,
>>>>>>> because their hypervisor is dead.  As I understand it, Nova
>>>>>>> will still believe VMs are running, and really has no way
>>>>>>> to know anything has changed (other than the nova-compute
>>>>>>> instance has dropped off.)
>>>>>>>
>>>>>>> I understand failure detection is a tricky thing.  But it
>>>>>>> seems like there must be something a little better than
>>>>>>> this.
>>>>>>
>>>>>> This is a timely question...I was wondering if it might make
>>>>>>  sense to upstream one of the changes we've made locally.
>>>>>>
>>>>>> We have an external entity monitoring the health of compute
>>>>>> nodes. When one of them goes down we automatically take
>>>>>> action regarding the instances that had been running on it.
>>>>>>
>>>>>> Normally nova won't let you evacuate an instance until the
>>>>>> compute node is detected as "down", but that takes 60 sec
>>>>>> typically and our software knows the compute node is gone
>>>>>> within a few seconds.
>>>>>
>>>>> Any external monitoring solution that detects the compute node
>>>>> is "down" could issue a call to `nova evacuate $HOST`.
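
[Something along these lines with python-novaclient, for example; the
credentials, host name and on_shared_storage flag below are just
placeholders, and argument defaults may differ between novaclient versions:]

    # Rough sketch: evacuate everything off a host the monitor has
    # declared dead.  Requires admin credentials.
    from novaclient import client

    nova = client.Client('2', 'admin', 'secret', 'admin',
                         'http://keystone:5000/v2.0')

    failed_host = 'compute-3'   # hypothetical host name
    for server in nova.servers.list(search_opts={'host': failed_host,
                                                 'all_tenants': 1}):
        # Let the scheduler pick a target; assumes shared instance storage.
        nova.servers.evacuate(server, host=None, on_shared_storage=True)
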
>>>>>
>>>>> The question I have for you is what does your software
>>>>> consider as a "downed" node? Is it some heartbeat-type stuff in
>>>>> network connectivity? A watchdog in KVM? Some proactive
>>>>> monitoring of disk or memory faults? Some combination?
>>>>> Something entirely different? :)
>>>>
>>>> Combination of the above.  A local entity monitors "critical
>>>> stuff" on the compute node, and heartbeats with a control node
>>>> via one or more network links.
>>>
>>> OK.
>>>
>>>>>> The change we made was to patch nova to allow the health
>>>>>> monitor to explicitly tell nova that the node is to be
>>>>>> considered "down" (so that instances can be evacuated
>>>>>> without delay).
>>>>>
>>>>> Why was it necessary to modify Nova for this? The external
>>>>> monitoring script could easily do: `nova service-disable $HOST
>>>>>  nova-compute` and that immediately takes the compute node out
>>>>>  of service and enables evacuation.
>>>>
>>>> Disabling the service is not sufficient.
>>>> compute.api.API.evacuate() throws an exception if
>>>> servicegroup.api.API.service_is_up(service) is true.
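
[Roughly this, paraphrasing the guard in nova/compute/api.py from memory,
not literal code:]

    # Inside nova.compute.api.API.evacuate(): refuse the evacuation if the
    # servicegroup layer still thinks the source host is up.
    service = objects.Service.get_by_compute_host(context, instance.host)
    if self.servicegroup_api.service_is_up(service):
        raise exception.ComputeServiceInUse(host=instance.host)
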
>>>
>>> servicegroup.api.service_is_up() returns whether the service has
>>> been disabled in the database (when using the DB servicegroup
>>> driver). Which is what `nova service-disable $HOST nova-compute`
>>> does.
>>
>> I must be missing something.
>>
>> It seems to me that servicegroup.drivers.db.DbDriver.is_up() returns
>> whether the database row for the service has been updated for any
>> reason within the last 60 seconds. (Assuming the default
>> CONF.service_down_time.)
>>
>> Incidentally, I've proposed https://review.openstack.org/163060 to
>> change that logic so that it returns whether the service has sent in
>> a status report in the last 60 seconds.  (As it stands currently if
>> you disable/enable a "down" service it'll report that the service is
>> "up" for the next 60 seconds.)
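
[For anyone following along, the current DB-driver check amounts to
something like this simplified sketch; it's a paraphrase, not the literal
nova code:]

    import datetime

    SERVICE_DOWN_TIME = 60  # i.e. CONF.service_down_time

    def is_up(service_row):
        # "up" == the service's DB row was touched within the last
        # service_down_time seconds, for *any* reason.  The change proposed
        # in https://review.openstack.org/163060 would key this off the
        # last periodic status report instead.
        last_heartbeat = service_row['updated_at'] or service_row['created_at']
        elapsed = datetime.datetime.utcnow() - last_heartbeat
        return elapsed.total_seconds() <= SERVICE_DOWN_TIME
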
>>
>>> What servicegroup driver are you using?
>>
>> The DB driver.
>
> You've hit upon a bug. In no way should a disabled service be considered
> "up". Apologies. I checked the code and indeed, there is no test for
> whether the service record from the DB is disabled or not.

I don't think it's a bug.  It makes sense to have the administrative state 
(enabled/disabled) tracked separately from the operational state (up/down).

If we administratively disable a compute node, that just means that the 
scheduler won't put new instances on it.  It doesn't do anything to the 
instances already there.  It's up to something outside of nova (the admin user, 
or some orchestration software) to move them elsewhere if appropriate.

It actually makes sense to only allow evacuating from an operationally down 
compute node, because if the compute node is operationally up (even if 
administratively disabled) then you could do a migration (live or cold), which 
would be cleaner than an evacuation.  The evacuate code assumes the instance 
isn't currently running, and that assumption only holds if the compute node is 
operationally down.
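
So an orchestrator sitting outside nova might do something like the sketch 
below: live-migrate while the service still reports "up", evacuate only once 
it's "down".  (The novaclient argument names are from memory and may vary by 
version.)

    # Sketch: choose the cleaner recovery action per instance based on the
    # compute service's operational state as reported by os-services.
    def recover_instances(nova, host):
        svc = nova.services.list(host=host, binary='nova-compute')[0]
        for server in nova.servers.list(search_opts={'host': host,
                                                     'all_tenants': 1}):
            if svc.state == 'up':
                # Host still operational (maybe just disabled): migration
                # is cleaner.  Args: (server, host, block_migration,
                # disk_over_commit); host=None lets the scheduler pick.
                nova.servers.live_migrate(server, None, False, False)
            else:
                # Host is really gone: rebuild elsewhere from shared storage.
                nova.servers.evacuate(server, host=None,
                                      on_shared_storage=True)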

The only issue I see with the current code is that it's possible for something 
outside nova (the external monitor) to know sooner than nova does that a 
compute node should be considered "down".
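
With stock nova, that means the monitor can disable the service right away but 
still ends up waiting out service_down_time before evacuation is allowed.  
Roughly (credentials and host name are placeholders):

    import time
    from novaclient import client

    nova = client.Client('2', 'admin', 'secret', 'admin',
                         'http://keystone:5000/v2.0')
    dead_host = 'compute-3'

    # Immediate: keep the scheduler away from the dead host.
    nova.services.disable(dead_host, 'nova-compute')

    # Then wait for nova's servicegroup layer to agree the host is down
    # (up to ~service_down_time seconds) before evacuation will be accepted.
    while nova.services.list(host=dead_host,
                             binary='nova-compute')[0].state != 'down':
        time.sleep(5)

    for server in nova.servers.list(search_opts={'host': dead_host,
                                                 'all_tenants': 1}):
        nova.servers.evacuate(server, host=None, on_shared_storage=True)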

Chris


