[openstack-dev] [nova] Nova compute will delete all your instances if you change its hostname
Daniel P. Berrange
berrange at redhat.com
Fri Feb 27 17:16:43 UTC 2015
On Fri, Feb 27, 2015 at 04:24:36PM +0000, Matthew Booth wrote:
> Gary Kotton originally posted this bug against the VMware driver:
>
> https://bugs.launchpad.net/nova/+bug/1419785
>
> I posted a proposed patch to fix this here:
>
> https://review.openstack.org/#/c/158269/1
>
> However, Dan Smith pointed out that the bug can actually be triggered
> against any driver in a manner not addressed by the above patch alone. I
> have confirmed this against a libvirt setup as follows:
>
> 1. Create some instances
> 2. Shutdown n-cpu
> 3. Change hostname
> 4. Restart n-cpu
>
> Nova compute will delete all instances in libvirt, but continue to
> report them as ACTIVE and Running.
>
> There are 2 parts to this issue:
>
> 1. _destroy_evacuated_instances() should do a better job of sanity
> checking before performing such a drastic action.
>
> 2. The underlying issue is the definition and use of instance.host,
> instance.node, compute_node.host and compute_node.hypervisor_hostname.
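For point (1), one possible sanity check is to demand positive evidence of an evacuation, rather than inferring one from a host mismatch alone. Below is a minimal sketch of that idea. It is not Nova's code: `LocalInstance`, `select_evacuated_instances` and the `evacuated_uuids` set (standing in for whatever evacuation records the service can look up) are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class LocalInstance:
    """Hypothetical view of an instance found running on this hypervisor."""
    uuid: str
    host: str  # instance.host as recorded in the Nova database


def select_evacuated_instances(local_instances, my_host, evacuated_uuids):
    """Split host-mismatched local instances into (destroy, keep).

    An instance is returned for destruction only if the database says it
    belongs to another host AND it has an evacuation record. A mismatch
    with no such record is suspicious (e.g. this service's own hostname
    changed), so the instance is kept instead of destroyed.
    """
    foreign = [inst for inst in local_instances if inst.host != my_host]
    to_destroy = [inst for inst in foreign if inst.uuid in evacuated_uuids]
    suspicious = [inst for inst in foreign if inst.uuid not in evacuated_uuids]
    return to_destroy, suspicious


# Renamed host: every instance looks foreign, but none was evacuated,
# so nothing is selected for destruction.
local = [LocalInstance("a", "old-name"), LocalInstance("b", "old-name")]
destroy, keep = select_evacuated_instances(local, "new-name", set())
```

With this rule, the rename scenario from the reproduction steps leaves `destroy` empty. An instance that really was evacuated, and so has a record, is still cleaned up.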
>
> (1) is belt and braces. It's very important, but I want to focus on (2)
> here. You'll immediately notice some inconsistent naming here, so to clarify:
>
> * instance.host == compute_node.host == Nova compute's 'host' value.
> * instance.node == compute_node.hypervisor_hostname == an identifier
> which represents a hypervisor.
>
> Architecturally, I'd argue that these mean:
>
> * Host: A Nova communication endpoint for a hypervisor.
> * Hypervisor: The physical location of a VM.
>
> Note that in the above case the libvirt driver changed the hypervisor
> identifier despite the fact that the hypervisor had not changed, only
> its communication endpoint. I propose the following:
>
> * ComputeNode describes 1 hypervisor.
> * ComputeNode maps 1 hypervisor to 1 compute host.
> * A ComputeNode is identified by a hypervisor_id.
> * hypervisor_id represents the physical location of running VMs,
> independent of a compute host.
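The proposed model can be sketched as a store keyed by hypervisor_id, where the compute host is only a mutable attribute. This is a minimal illustration assuming an in-memory dict. `ComputeNodeRegistry` and its `report` method are hypothetical and do not correspond to Nova's actual ComputeNode object or database layer.

```python
from dataclasses import dataclass


@dataclass
class ComputeNode:
    hypervisor_id: str  # persistent identity: where the VMs physically live
    host: str           # current Nova compute endpoint for this hypervisor


class ComputeNodeRegistry:
    """Hypothetical store of ComputeNodes keyed by hypervisor_id."""

    def __init__(self):
        self._nodes = {}

    def report(self, hypervisor_id, host):
        """Record that `host` is currently managing `hypervisor_id`."""
        node = self._nodes.get(hypervisor_id)
        if node is None:
            node = ComputeNode(hypervisor_id, host)
            self._nodes[hypervisor_id] = node
        else:
            # Same hypervisor, new endpoint: update the mapping, keep identity.
            node.host = host
        return node

    def __len__(self):
        return len(self._nodes)


# Renaming the compute host re-points the existing node
# instead of creating a second, empty one.
registry = ComputeNodeRegistry()
before = registry.report("hv-7f3a", "compute-old")
after = registry.report("hv-7f3a", "compute-new")
```

Because identity comes from the hypervisor and not the endpoint, a hostname change updates one field. The VMs never appear to belong to some other node.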
>
> We rename compute_node.hypervisor_hostname to
> compute_node.hypervisor_id. This resolves some confusion, because it
> asserts that the identity of the hypervisor is tied to the data
> describing VMs, not the host which is running it. In fact, for the
> VMware and Ironic drivers it has never been a hostname.
>
> VMware[1] and Ironic don't require any changes here. Other drivers will
> need to be modified so that get_available_nodes() returns a persistent
> value rather than just the hostname. A reasonable default implementation
> of this would be to write a uuid to a file which lives with VM data and
> return its contents. If the hypervisor has a native concept of a
> globally unique identifier, that should be used instead.
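The "write a uuid to a file which lives with VM data" default could look roughly like the sketch below. It assumes `state_dir` is a directory that travels with the VM data, for example the one configured by Nova's `instances_path` option. The file name `hypervisor_id` and the function name are hypothetical.

```python
import os
import tempfile
import uuid


def get_persistent_node_id(state_dir):
    """Return a hypervisor id stored alongside VM data, creating it once."""
    path = os.path.join(state_dir, "hypervisor_id")  # hypothetical file name
    try:
        with open(path) as f:
            value = f.read().strip()
        if value:
            return value
    except FileNotFoundError:
        pass
    new_id = str(uuid.uuid4())
    # Write to a temporary file, then rename it into place, so a crash
    # cannot leave a half-written id behind.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(new_id)
    os.replace(tmp, path)
    return new_id


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        first = get_persistent_node_id(d)
        # A second call, e.g. after a restart with a new hostname,
        # returns the same value.
        assert get_persistent_node_id(d) == first
```

The id depends only on the contents of `state_dir`, so it survives a hostname change. It would change only if the VM data itself moved without this file.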
I don't think there's any need to write state in that way. Every hypervisor
I've come across has a way to report a globally unique identifier, which is
typically the host UUID coming from the BIOS, or some equivalent. For libvirt
you can get the host UUID from the capabilities XML, so we could pretty
easily handle that.
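Reading the host UUID out of libvirt's capabilities XML takes only a few lines. In the sketch below, the helper name and the sample document are made up for illustration (the UUID is not a real machine's). The element path `<capabilities><host><uuid>` follows libvirt's capabilities format. It also assumes the firmware-reported UUID really is unique across the deployment.

```python
import xml.etree.ElementTree as ET


def host_uuid_from_capabilities(caps_xml):
    """Extract the host UUID from a libvirt capabilities XML document."""
    root = ET.fromstring(caps_xml)
    uuid_text = root.findtext("host/uuid")
    if not uuid_text:
        raise ValueError("capabilities XML has no <host><uuid> element")
    return uuid_text.strip()


# Made-up sample; a real document contains much more (cpu, topology, guests).
SAMPLE_CAPS = """<capabilities>
  <host>
    <uuid>4c4c4544-0042-3010-8052-b3c04f4e4e31</uuid>
    <cpu><arch>x86_64</arch></cpu>
  </host>
</capabilities>"""

# Against a live host with libvirt-python installed, something like:
#   import libvirt
#   conn = libvirt.openReadOnly("qemu:///system")
#   node_id = host_uuid_from_capabilities(conn.getCapabilities())
```

Unlike the file-based approach, this needs no extra state on disk. The id comes from the hardware rather than from the compute service's configuration.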
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|