[Openstack-operators] What to do when a compute node dies?
Jay Pipes
jaypipes at gmail.com
Tue Mar 31 14:49:47 UTC 2015
Chris, responded on the bug :)
Thanks!
-jay
On 03/31/2015 02:47 AM, Chris Friesen wrote:
> On 03/30/2015 09:53 PM, Jay Pipes wrote:
>> On 03/30/2015 07:30 PM, Chris Friesen wrote:
>>> On 03/30/2015 04:57 PM, Jay Pipes wrote:
>>>> On 03/30/2015 06:42 PM, Chris Friesen wrote:
>>>>> On 03/30/2015 02:47 PM, Jay Pipes wrote:
>>>>>> On 03/30/2015 10:42 AM, Chris Friesen wrote:
>>>>>>> On 03/29/2015 09:26 PM, Mike Dorman wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I’m curious about how people deal with failures of compute
>>>>>>>> nodes, as in total failure when the box is gone for good.
>>>>>>>> (I mainly care about the KVM HV case, but I'm also interested
>>>>>>>> in more general cases.)
>>>>>>>>
>>>>>>>> The particular situation we’re looking at: how end users
>>>>>>>> could identify or be notified of VMs that no longer exist,
>>>>>>>> because their hypervisor is dead. As I understand it, Nova
>>>>>>>> will still believe the VMs are running, and really has no way
>>>>>>>> to know anything has changed (other than that the nova-compute
>>>>>>>> service has dropped off).
>>>>>>>>
>>>>>>>> I understand failure detection is a tricky thing. But it
>>>>>>>> seems like there must be something a little better than
>>>>>>>> this.
>>>>>>>
>>>>>>> This is a timely question...I was wondering if it might make
>>>>>>> sense to upstream one of the changes we've made locally.
>>>>>>>
>>>>>>> We have an external entity monitoring the health of compute
>>>>>>> nodes. When one of them goes down we automatically take
>>>>>>> action regarding the instances that had been running on it.
>>>>>>>
>>>>>>> Normally nova won't let you evacuate an instance until the
>>>>>>> compute node is detected as "down", but that typically takes
>>>>>>> 60 seconds, whereas our software knows the compute node is gone
>>>>>>> within a few seconds.
>>>>>>
>>>>>> Any external monitoring solution that detects the compute node
>>>>>> is "down" could issue a call to `nova evacuate $HOST`.
>>>>>>
>>>>>> The question I have for you is: what does your software
>>>>>> consider a "downed" node? Is it some heartbeat-type check of
>>>>>> network connectivity? A watchdog in KVM? Some proactive
>>>>>> monitoring of disk or memory faults? Some combination?
>>>>>> Something entirely different? :)
>>>>>
>>>>> Combination of the above. A local entity monitors "critical
>>>>> stuff" on the compute node, and heartbeats with a control node
>>>>> via one or more network links.
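>>>>>
>>>>> Conceptually the control-node side boils down to something like
>>>>> this (a greatly simplified sketch, not our actual code; the
>>>>> timeout value is illustrative):
>>>>>
>>>>>     import time
>>>>>
>>>>>     HEARTBEAT_TIMEOUT = 3  # seconds, far below nova's 60s default
>>>>>
>>>>>     def node_is_alive(last_heartbeat):
>>>>>         """last_heartbeat: time.time() of the most recent
>>>>>         heartbeat received from the compute node."""
>>>>>         return (time.time() - last_heartbeat) <= HEARTBEAT_TIMEOUT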
>>>>
>>>> OK.
>>>>
>>>>>>> The change we made was to patch nova to allow the health
>>>>>>> monitor to explicitly tell nova that the node is to be
>>>>>>> considered "down" (so that instances can be evacuated
>>>>>>> without delay).
>>>>>>
>>>>>> Why was it necessary to modify Nova for this? The external
>>>>>> monitoring script could easily do: `nova service-disable $HOST
>>>>>> nova-compute` and that immediately takes the compute node out
>>>>>> of service and enables evacuation.
>>>>>
>>>>> Disabling the service is not sufficient.
>>>>> compute.api.API.evacuate() throws an exception if
>>>>> servicegroup.api.API.service_is_up(service) is true.
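>>>>>
>>>>> Roughly, the guard looks like this (paraphrased from memory, not
>>>>> the literal source):
>>>>>
>>>>>     def evacuate(self, context, instance, host, on_shared_storage,
>>>>>                  admin_password=None):
>>>>>         service = objects.Service.get_by_compute_host(
>>>>>             context, instance.host)
>>>>>         if self.servicegroup_api.service_is_up(service):
>>>>>             # source compute service still looks "up" -> refuse
>>>>>             raise exception.ComputeServiceInUse(host=instance.host)
>>>>>         ...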
>>>>
>>>> servicegroup.api.service_is_up() returns whether the service has
>>>> been disabled in the database (when using the DB servicegroup
>>>> driver). Which is what `nova service-disable $HOST nova-compute`
>>>> does.
>>>
>>> I must be missing something.
>>>
>>> It seems to me that servicegroup.drivers.db.DbDriver.is_up() returns
>>> whether the database row for the service has been updated for any
>>> reason within the last 60 seconds. (Assuming the default
>>> CONF.service_down_time.)
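>>>
>>> In other words the check is essentially this (simplified; timeutils
>>> and CONF come from the surrounding module):
>>>
>>>     def is_up(self, service_ref):
>>>         last = service_ref['updated_at'] or service_ref['created_at']
>>>         elapsed = timeutils.delta_seconds(last, timeutils.utcnow())
>>>         return abs(elapsed) <= CONF.service_down_time
>>>
>>> Note that updated_at gets bumped by any write to the row, not just
>>> by the periodic status report.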
>>>
>>> Incidentally, I've proposed https://review.openstack.org/163060 to
>>> change that logic so that it returns whether the service has sent in
>>> a status report in the last 60 seconds. (As it stands currently if
>>> you disable/enable a "down" service it'll report that the service is
>>> "up" for the next 60 seconds.)
>>>
>>>> What servicegroup driver are you using?
>>>
>>> The DB driver.
>>
>> You've hit upon a bug. In no way should a disabled service be considered
>> "up". Apologies. I checked the code and indeed, there is no test for
>> whether the service record from the DB is disabled or not.
>
> I don't think it's a bug. It makes sense to have the administrative
> state (enabled/disabled) tracked separately from the operational state
> (up/down).
>
> If we administratively disable a compute node, that just means that the
> scheduler won't put new instances on it. It doesn't do anything to the
> instances already there. It's up to something outside of nova (the
> admin user, or some orchestration software) to move them elsewhere if
> appropriate.
>
> It actually makes sense to only allow evacuating from an operationally
> down compute node, because if the compute node is operationally up (even
> if administratively disabled) then you could do a migration (live or
> cold), which would be cleaner than an evacuate. The evacuate code
> assumes the instance isn't currently running, and that assumption is
> only true if the compute node is operationally down.
>
> The only issue I see with the current code is that it's possible to have
> a situation where something else (the external monitor) knows sooner
> than nova does that a compute node should be considered "down".
>
> Chris
>
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators