[openstack-dev] [nova][vmware][ironic] Configuring active/passive HA Nova compute

Matthew Booth mbooth at redhat.com
Mon Feb 23 13:18:19 UTC 2015


On 23/02/15 12:13, Gary Kotton wrote:
> 
> 
> On 2/23/15, 2:05 PM, "Matthew Booth" <mbooth at redhat.com> wrote:
> 
>> On 20/02/15 11:48, Matthew Booth wrote:
>>> Gary Kotton came across a doozy of a bug recently:
>>>
>>> https://bugs.launchpad.net/nova/+bug/1419785
>>>
>>> In short, when you start a Nova compute, it will query the driver for
>>> instances and compare that against the expected host of the instance
>>> according to the DB. If the driver is reporting an instance which the
>>> DB thinks is on a different host, it assumes the instance was evacuated
>>> while Nova compute was down, and deletes it on the hypervisor. However,
>>> Gary found that you can also trigger this by starting up a backup HA
>>> node which has a different `host` config setting: i.e. you fail over,
>>> and the first thing the backup node does is delete all your instances.
>>>
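
For reference, the shape of that startup check is roughly the following
(a simplified sketch with made-up names, not the actual
nova/compute/manager.py code):

    # Simplified sketch of the startup check described above; names and
    # signatures are illustrative, not the real nova code.
    def destroy_evacuated_instances(driver, db_instances, my_host):
        """Destroy instances the driver reports but the DB places elsewhere.

        db_instances: dict mapping instance uuid -> instance record.
        my_host: the CONF.host value of this nova-compute service.
        """
        for uuid in driver.list_instance_uuids():
            instance = db_instances.get(uuid)
            if instance is not None and instance.host != my_host:
                # The DB thinks another compute host owns this instance,
                # so assume it was evacuated while we were down and
                # remove the local copy from the hypervisor.
                driver.destroy(instance)
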
>>> Gary and I both agree on a couple of things:
>>>
>>> 1. Deleting all your instances is bad
>>> 2. HA nova compute is highly desirable for some drivers
>>>
>>> We disagree on the approach to fixing it, though. Gary posted this:
>>>
>>> https://review.openstack.org/#/c/154029/
>>>
>>> I've already outlined my objections to this approach elsewhere, but to
>>> summarise: I think it fixes one symptom of a design problem and leaves
>>> the rest untouched. If the value of nova compute's `host` changes, then
>>> the assumption that instances associated with that compute can be
>>> identified by the value of instance.host becomes invalid. This
>>> assumption is pervasive, so breaking it breaks a lot of stuff. The
>>> worst case is _destroy_evacuated_instances(), which Gary found, but if
>>> you scan nova/compute/manager for the string 'self.host' you'll find
>>> plenty of other examples. All the periodic tasks are broken, including
>>> image cache management, and the state of the ResourceTracker will be
>>> inconsistent. Worse, every new instance will be created with the
>>> instance.host of whichever nova compute handled the request, so the
>>> instances running on a single hypervisor will become partitioned
>>> according to which nova compute was used to create them.
>>>
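
The partitioning falls straight out of how those code paths select
"their" instances; roughly (illustrative only, not the real code):

    # Illustrative only: periodic tasks and similar code paths pick the
    # instances belonging to this service by matching instance.host
    # against the service's own CONF.host, so two nova computes with
    # different 'host' values each see only part of one hypervisor.
    def instances_for_this_service(db_instances, my_host):
        return [inst for inst in db_instances if inst.host == my_host]
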
>>> In short, the system may appear to function superficially, but it's
>>> unsupportable.
>>>
>>> I had an alternative idea. The current assumption is that the `host`
>>> managing a single hypervisor never changes. If we break that assumption,
>>> we break Nova, so we could assert it at startup and refuse to start if
>>> it's violated. I posted this VMware-specific POC:
>>>
>>> https://review.openstack.org/#/c/154907/
>>>
>>> However, I think I've had a better idea. At startup, Nova creates
>>> ComputeNode objects for its current configuration which, amongst other
>>> things, record a mapping of host to hypervisor_hostname. We could
>>> assert when
>>> creating a ComputeNode that hypervisor_hostname is not already
>>> associated with a different host, and refuse to start if it is. We would
>>> give an appropriate error message explaining that this is a
>>> misconfiguration. This would prevent the user from hitting any of the
>>> associated problems, including the deletion of all their instances.
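
Roughly, the check would look something like this (a sketch only, with
illustrative helper and exception names; the real change is the review
linked below):

    # Sketch of the proposed startup assertion; helper and exception
    # names are illustrative.
    def assert_hypervisor_not_claimed(db, context, my_host, node_name):
        """Refuse to start if another 'host' already owns this hypervisor."""
        node = db.compute_node_get_by_hypervisor_hostname(context, node_name)
        if node is not None and node['host'] != my_host:
            raise RuntimeError(
                "Hypervisor %s is already managed by compute host %s; "
                "refusing to start with host=%s. All nova computes for a "
                "single hypervisor must use the same 'host' value."
                % (node_name, node['host'], my_host))
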
>>
>> I have posted a patch implementing the above for review here:
>>
>> https://review.openstack.org/#/c/158269/
> 
> I have to look at what you have posted. I think that this topic is
> something that we should speak about at the summit, and that it should
> fall under a blueprint and a well-defined spec. I really would not like
> to see existing installations being broken if and when this patch lands.
> It may also affect Ironic, as it works on the same model.

This patch will only affect installations configured with multiple
compute hosts for a single hypervisor. These are already broken, so this
patch will at least let them know if they haven't already noticed.

It won't affect Ironic, because they configure all compute hosts to have
the same 'host' value. An Ironic user would only notice this patch if
they accidentally misconfigured it, which is the intended behaviour.
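
Concretely, the only multi-compute configuration this check permits is
the one where every nova compute pointing at the same hypervisor shares
the same value in nova.conf (the hostname below is just an example):

    [DEFAULT]
    # identical on the active and the passive nova compute node
    host = vmware-compute-1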

Incidentally, I also support more focus on the design here. Until we
come up with a better design, though, we need to do our best to prevent
non-trivial corruption from a trivial misconfiguration. I think we need
to merge this, or something like it, now and still have a summit discussion.

Matt
-- 
Matthew Booth
Red Hat Engineering, Virtualisation Team

Phone: +442070094448 (UK)
GPG ID:  D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490


