[openstack-dev] [nova] Nova compute will delete all your instances if you change its hostname

Dan Smith dms at danplanet.com
Fri Feb 27 16:57:53 UTC 2015


Did we really need another top-level thread for this?

> 1. _destroy_evacuated_instances() should do a better job of sanity 
> checking before performing such a drastic action.

I agree, and no amount of hostname checking will actually address this
problem. If we don't have a record of an evacuate having been scheduled
for the host, then there is no legitimate reason to delete data, IMHO.
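
To make that concrete, here is a rough sketch of the check I mean. It is
not nova code: EvacuationRecord, instances_safe_to_destroy() and their
arguments are all names made up for illustration, and it assumes a
record is written somewhere whenever an evacuate is scheduled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvacuationRecord:
    """Hypothetical record written when an evacuate is scheduled."""
    instance_uuid: str
    source_host: str


def instances_safe_to_destroy(local_uuids, instance_hosts, my_host, evacuations):
    """Return the local instances that a recorded evacuation moved away.

    local_uuids:    UUIDs of instances found on this hypervisor
    instance_hosts: mapping of instance UUID -> host recorded in the database
    my_host:        the host name this compute service is running as
    evacuations:    iterable of EvacuationRecord
    """
    # Only instances with an explicit evacuation record for this host qualify.
    evacuated_from_here = {
        rec.instance_uuid for rec in evacuations if rec.source_host == my_host
    }
    # A host mismatch alone is never enough to justify deleting anything.
    return [
        uuid for uuid in local_uuids
        if uuid in evacuated_from_here and instance_hosts.get(uuid) != my_host
    ]
```

With no evacuation records, a compute that comes up under a new hostname
gets an empty list back, even though every instance.host disagrees with
it.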

> 2. The underlying issue is the definition and use of instance.host, 
> instance.node, compute_node.host and compute_node.hypervisor_hostname.

I disagree. I think the underlying issues are:

1. Evacuate assumes too much
2. Nova has a model of one compute per hypervisor and we have some
   drivers that make it easy to violate that in dangerous ways, and
   which don't do their due diligence to avoid catastrophe.

> Note that in the above case the libvirt driver changed the hypervisor
> identifier despite the fact that the hypervisor had not changed, only
> its communication endpoint.

I'd argue they're one and the same, and that's just fine. We just
shouldn't erroneously delete things when that happens unexpectedly.

> VMware[1] and Ironic don't require any changes here.

But they're broken! If they are managing things from another point that
can be duplicated and they don't provide assurances that it's not being
done twice, then that's a problem. I (and others) have argued that
nova's model is one compute per hypervisor. I don't think it should be
up to nova to ensure that, I think it should be up to the driver.

Nova needs to stop deleting things based on cheap guessing. However, if
two hypervisor drivers claim they're different and that they have
deleted running instances (which is what is going on here), I have
little sympathy.

> Other drivers will need to be modified so that get_available_nodes() 
> returns a persistent value rather than just the hostname.

-1 on making the non-problematic drivers (potentially) maintain state
and leaving the problematic ones unchanged.

> A reasonable default implementation of this would be to write a uuid
> to a file which lives with VM data and return its contents. If the 
> hypervisor has a native concept of a globally unique identifier,
> that should be used instead.

Those drivers shouldn't have to maintain state. And they already have a
unique identifier: the hostname.
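
For reference, the state the quoted proposal would require is roughly
the following. This is only a sketch of that idea, not something any
driver does today as far as I know; get_or_create_node_id() and the
file location are hypothetical.

```python
import uuid


def get_or_create_node_id(path):
    """Return the identifier stored at `path`, creating it on first use."""
    try:
        # Mode "x" raises FileExistsError if the file is already there,
        # so an existing identifier is never overwritten.
        with open(path, "x") as f:
            node_id = str(uuid.uuid4())
            f.write(node_id)
            return node_id
    except FileExistsError:
        with open(path) as f:
            return f.read().strip()
```

Small as it is, it is still a file every such driver would have to
create, keep with the VM data, and never lose.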

--Dan
