<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Feb 20, 2015 at 3:48 AM, Matthew Booth <span dir="ltr"><<a href="mailto:mbooth@redhat.com" target="_blank">mbooth@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Gary Kotton came across a doozy of a bug recently:<br>

<br>

<a href="https://bugs.launchpad.net/nova/+bug/1419785" target="_blank">https://bugs.launchpad.net/nova/+bug/1419785</a><br>

<br>

In short, when you start a Nova compute, it will query the driver for<br>

instances and compare them against the expected host of each instance<br>

according to the DB. If the driver is reporting an instance the DB<br>

thinks is on a different host, it assumes the instance was evacuated<br>

while Nova compute was down, and deletes it on the hypervisor. However,<br>

Gary found that you trigger this when starting up a backup HA node which<br>

has a different `host` config setting, i.e. you fail over, and the first<br>

thing the backup node does is delete all your instances.<br>
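The following is a minimal Python sketch of the rule described above. It shows why a failover to a node with a different `host` selects every instance for deletion. The `evacuated_instances()` helper and the host and instance names are hypothetical. This is not Nova's actual `_destroy_evacuated_instances()` code.<br>

```python
# Hypothetical model of the startup check described above; not Nova's code.
from dataclasses import dataclass


@dataclass
class Instance:
    uuid: str
    host: str  # the value of instance.host stored in the DB


def evacuated_instances(my_host, driver_uuids, db_instances):
    """Return UUIDs the driver reports but the DB places on another host.

    Mirrors the rule in the bug report: an instance on this hypervisor
    whose DB host differs from my_host is assumed to have been evacuated,
    and so becomes a candidate for deletion.
    """
    by_uuid = {inst.uuid: inst for inst in db_instances}
    return [
        uuid
        for uuid in driver_uuids
        if uuid in by_uuid and by_uuid[uuid].host != my_host
    ]


# The hypervisor runs two instances, both created via "compute-a".
db = [Instance("vm1", "compute-a"), Instance("vm2", "compute-a")]
on_hypervisor = ["vm1", "vm2"]

# The primary node, with host = "compute-a", finds nothing to delete.
print(evacuated_instances("compute-a", on_hypervisor, db))  # []

# A backup HA node with host = "compute-b" takes over the same hypervisor
# and concludes that every instance was evacuated.
print(evacuated_instances("compute-b", on_hypervisor, db))  # ['vm1', 'vm2']
```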

<br>

Gary and I both agree on a couple of things:<br>

<br>

1. Deleting all your instances is bad<br>

2. HA nova compute is highly desirable for some drivers<br></blockquote><div><br></div><div>There is a deeper issue here that we are trying to work around. Nova was never designed to have entire systems running behind a nova-compute. It was designed to have one nova-compute per 'physical box that runs instances'.</div><div><br></div><div>There have been many discussions in the past on how to fix this issue (by adding a new point in Nova where clustered systems can plug in), but, if I remember correctly, the gotcha was that no one was willing to step up to do it.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

We disagree on the approach to fixing it, though. Gary posted this:<br>

<br>

<a href="https://review.openstack.org/#/c/154029/" target="_blank">https://review.openstack.org/#/c/154029/</a><br>

<br>

I've already outlined my objections to this approach elsewhere, but to<br>

summarise, I think this fixes one symptom of a design problem and leaves<br>

the rest untouched. If the value of nova compute's `host` changes, then<br>

the assumption that instances associated with that compute can be<br>

identified by the value of instance.host becomes invalid. This<br>

assumption is pervasive, so it breaks a lot of stuff. The worst one is<br>

_destroy_evacuated_instances(), which Gary found, but if you scan<br>

nova/compute/manager for the string 'self.host' you'll find lots of<br>

them. For example, all the periodic tasks are broken, including image<br>

cache management, and ResourceTracker's state will be inconsistent.<br>

Worse, whenever a new instance is created it will have a different value<br>

of instance.host, so instances running on a single hypervisor will<br>

become partitioned based on which nova compute was used to create them.<br>
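The following toy model illustrates the partitioning. It covers one hypervisor, managed first by host "compute-a" and then, after failover, by "compute-b". The `instances_managed_by()` helper is a hypothetical stand-in for any code that filters on instance.host matching self.host. It is not a Nova function.<br>

```python
# Hypothetical illustration of instance partitioning; names are invented.


def instances_managed_by(host, instances):
    """Select instances the way code filtering on self.host would."""
    return sorted(uuid for uuid, inst_host in instances.items() if inst_host == host)


# uuid -> instance.host, for instances that all run on the SAME hypervisor.
instances = {}

# Node A (host = "compute-a") is active and boots two instances.
for uuid in ("vm1", "vm2"):
    instances[uuid] = "compute-a"

# After failover, node B (host = "compute-b") boots one more.
instances["vm3"] = "compute-b"

# Each node now "sees" only its own slice of the hypervisor.
print(instances_managed_by("compute-a", instances))  # ['vm1', 'vm2']
print(instances_managed_by("compute-b", instances))  # ['vm3']
```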

<br>

In short, the system may appear to function superficially, but it's<br>

unsupportable.<br>

<br>

I had an alternative idea. The current assumption is that the `host`<br>

managing a single hypervisor never changes. If we break that assumption,<br>

we break Nova, so we could assert it at startup and refuse to start if<br>

it's violated. I posted this VMware-specific POC:<br>

<br>

<a href="https://review.openstack.org/#/c/154907/" target="_blank">https://review.openstack.org/#/c/154907/</a><br>

<br>

However, I think I've had a better idea. Nova creates ComputeNode<br>

objects for its current configuration at startup which, amongst other<br>

things, are a map of host:hypervisor_hostname. We could assert when<br>

creating a ComputeNode that hypervisor_hostname is not already<br>

associated with a different host, and refuse to start if it is. We would<br>

give an appropriate error message explaining that this is a<br>

misconfiguration. This would prevent the user from hitting any of the<br>

associated problems, including the deletion of all their instances.<br>
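The following is a minimal sketch of the proposed check. It assumes a plain dict can stand in for the ComputeNode records in the DB. The `register_compute_node()` and `HostMismatchError` names are hypothetical, not existing Nova APIs.<br>

```python
# Hypothetical sketch of the proposed startup assertion; not Nova's API.


class HostMismatchError(Exception):
    """Raised when a hypervisor is already claimed by a different host."""


def register_compute_node(registry, host, hypervisor_hostname):
    """Record host:hypervisor_hostname, refusing to rebind a hypervisor.

    registry maps hypervisor_hostname -> host and stands in for the
    ComputeNode records stored in the DB.
    """
    existing = registry.get(hypervisor_hostname)
    if existing is not None and existing != host:
        raise HostMismatchError(
            f"hypervisor {hypervisor_hostname!r} is already managed by host "
            f"{existing!r}, but this service is configured with host {host!r}. "
            "This is a misconfiguration: every node that manages this "
            "hypervisor must use the same 'host' setting. Refusing to start."
        )
    registry[hypervisor_hostname] = host


registry = {}
register_compute_node(registry, "compute-a", "vcenter-cluster-1")
# Restarting the same node, or a passive node with an identical config, is fine.
register_compute_node(registry, "compute-a", "vcenter-cluster-1")

try:
    register_compute_node(registry, "compute-b", "vcenter-cluster-1")
except HostMismatchError as exc:
    print(exc)
```

In this design, the service fails fast with an explanatory message before any instance is touched. It does not try to reconcile the mismatch.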

<br>

We can still do active/passive HA!<br>

<br>

If we configure both nodes in the active/passive cluster identically,<br>

including with the same value of `host`, I don't see why this shouldn't<br>

work today. I don't even think the configuration is onerous. All we<br>

would be doing is preventing the user from accidentally running a<br>

misconfigured HA setup, which leads to inconsistent state and will eventually<br>

require manual cleanup.<br>
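The following is a hedged illustration of the "configure both nodes identically" requirement. Before enabling failover, a deployer could compare the `host` value in the active and passive nova.conf files. The sketch uses only Python's standard configparser. The helper names and host values are hypothetical.<br>

```python
# Hypothetical pre-deployment check for an active/passive pair.
import configparser


def configured_host(nova_conf_text):
    """Return the [DEFAULT] host value from a nova.conf body, or None."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(nova_conf_text)
    return cfg["DEFAULT"].get("host")


def same_host(active_conf, passive_conf):
    """True only if both nodes explicitly set the same `host` value.

    An unset value counts as a mismatch (assumption: each node would then
    fall back to its own default, which is likely to differ).
    """
    a, p = configured_host(active_conf), configured_host(passive_conf)
    return a is not None and a == p


active = "[DEFAULT]\nhost = vmware-compute\n"
passive_ok = "[DEFAULT]\nhost = vmware-compute\n"
passive_bad = "[DEFAULT]\nhost = vmware-compute-backup\n"

print(same_host(active, passive_ok))   # True
print(same_host(active, passive_bad))  # False
```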

<br>

We would still have to be careful that we don't bring up both nova<br>

computes simultaneously. The VMware driver, at least, has hardcoded<br>

assumptions that it is the only writer in certain circumstances. That<br>

problem would have to be handled separately, perhaps at the messaging layer.<br>

<br>

Matt<br>

<span class="HOEnZb"><font color="#888888">--<br>

Matthew Booth<br>

Red Hat Engineering, Virtualisation Team<br>

<br>

Phone: <a href="tel:%2B442070094448" value="+442070094448">+442070094448</a> (UK)<br>

GPG ID:  D33C3490<br>

GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490<br>

<br>

__________________________________________________________________________<br>

OpenStack Development Mailing List (not for usage questions)<br>


<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>

</font></span></blockquote></div><br></div></div>