[openstack-dev] [nova] Interesting bug when unshelving an instance in an AZ and the AZ is gone

Dean Troyer dtroyer at gmail.com
Mon Oct 16 16:00:46 UTC 2017


[not having a dog in this hunt, this is what I would expect as a cloud consumer]

On Mon, Oct 16, 2017 at 10:22 AM, Matt Riedemann <mriedemos at gmail.com> wrote:
> - The user creates an instance in a non-default AZ.
> - They shelve offload the instance.
> - The admin deletes the AZ that the instance was using, for whatever reason.
> - The user unshelves the instance which goes back through scheduling and
> fails with NoValidHost because the AZ on the original request spec no longer
> exists.
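
For anyone following along, the failure in the steps above can be
sketched with a toy model. This is not nova code: Host, RequestSpec,
schedule() and the local NoValidHost class are hypothetical stand-ins
that only mimic "filter hosts by the AZ stored on the request spec".

```python
from dataclasses import dataclass
from typing import List, Optional


class NoValidHost(Exception):
    """Toy stand-in for the scheduler error; not nova's exception class."""


@dataclass
class Host:
    name: str
    availability_zone: str


@dataclass
class RequestSpec:
    # AZ the user asked for at create time; None means "no preference".
    availability_zone: Optional[str] = None


def schedule(spec: RequestSpec, hosts: List[Host]) -> Host:
    """Return the first host matching the AZ on the request spec."""
    candidates = [
        h for h in hosts
        if spec.availability_zone is None
        or h.availability_zone == spec.availability_zone
    ]
    if not candidates:
        raise NoValidHost(
            "no host in availability zone %r" % spec.availability_zone
        )
    return candidates[0]


# Create: the instance lands in the non-default AZ "az2".
hosts = [Host("compute1", "az1"), Host("compute2", "az2")]
spec = RequestSpec(availability_zone="az2")
assert schedule(spec, hosts).name == "compute2"

# While the instance is shelved offloaded, the admin removes "az2".
# (Assumption for this sketch: no host reports that AZ afterwards.)
hosts = [Host("compute1", "az1")]

# Unshelve: the stored request spec still says "az2", so scheduling fails.
try:
    schedule(spec, hosts)
except NoValidHost as exc:
    print("unshelve failed:", exc)
```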

> 1. How reasonable is it for a user to expect in a stable production
> environment that AZs are going to be deleted from under them? We actually
> have a spec related to this but with AZ renames:

Change happens...

> 2. Should we null out the instance.availability_zone when it's shelved
> offloaded like we do for the instance.host and instance.node attributes?
> Similarly, we would not take into account the RequestSpec.availability_zone
> when scheduling during unshelve. I tend to prefer this option because once
> you unshelve offload an instance, it's no longer associated with a host and
> therefore no longer associated with an AZ. However, is it reasonable to
> assume that the user doesn't care that the instance, once unshelved, is no
> longer in the originally requested AZ? Probably not a safe assumption.

Agreed, unless we keep track that the user specified a default or no
AZ at create.

I think nulling the AZ when the original doesn't exist would be
reasonable from a user standpoint, but I'd feel handcuffed if that
happens and I cannot select a new AZ. The alternative is throwing a
specific error and letting the user handle it, as in #3 below:

> 3. When a user unshelves, they can't propose a new AZ (and I don't think we
> want to add that capability to the unshelve API). So if the original AZ is

Here is my question... if I can specify an AZ on create, why not on
unshelve?  Is it the image location movement under the hood?

> gone, should we automatically remove the RequestSpec.availability_zone when
> scheduling? I tend to not like this as it's very implicit and the user could
> see the AZ on their instance change before and after unshelve and be
> confused.

Agreed that explicit is better than implicit.
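
To make the explicit-versus-implicit distinction concrete, here is a
minimal sketch of the two behaviors. Everything in it is hypothetical
(az_for_unshelve, the implicit_fallback flag and AvailabilityZoneGone
are names made up for illustration, not anything in nova): the default
raises a specific error the user can act on, and the opt-in flag
silently drops the AZ instead.

```python
from typing import Iterable, Optional


class AvailabilityZoneGone(Exception):
    """Hypothetical specific error for an AZ that no longer exists."""

    def __init__(self, az: str):
        super().__init__("availability zone %r no longer exists" % az)
        self.az = az


def az_for_unshelve(
    original_az: Optional[str],
    existing_azs: Iterable[str],
    *,
    implicit_fallback: bool = False,
) -> Optional[str]:
    """Pick the AZ constraint to use when rescheduling on unshelve."""
    if original_az is None or original_az in set(existing_azs):
        # Nothing was requested, or the requested AZ is still there.
        return original_az
    if implicit_fallback:
        # Implicit: quietly drop the constraint; the instance may come
        # back in a different AZ than the user originally asked for.
        return None
    # Explicit: tell the user exactly what went wrong.
    raise AvailabilityZoneGone(original_az)


# Explicit behavior (default): the caller sees a specific error.
try:
    az_for_unshelve("az2", ["az1"])
except AvailabilityZoneGone as exc:
    print("explicit:", exc)

# Implicit behavior: the constraint is dropped without telling anyone.
print("implicit:", az_for_unshelve("az2", ["az1"], implicit_fallback=True))
```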

> 4. We could simply do nothing about this specific bug and assert the
> behavior is correct. The user requested an instance in a specific AZ,
> shelved that instance and when they wanted to unshelve it, it's no longer
> available so it fails. The user would have to delete the instance and create
> a new instance from the shelve snapshot image in a new AZ. If we implemented

I do not have the list of things in my head that are preserved in
shelve/unshelve that would be lost in a recreate, but that's where my
worry would come.  Presumably that is why I shelved in the first place
rather than snapshotting the server and removing it.  It depends on the
cost model too: if I lose my grandfathered-in pricing by being forced
to recreate, I may be unhappy.


> Sylvain's spec in #1 above, maybe we don't have this problem going forward
> since you couldn't remove/delete an AZ when there are even shelved offloaded
> instances still tied to it.

As a user I probably would not mind this; as an operator I'd likely be unhappy.

dt

-- 

Dean Troyer
dtroyer at gmail.com


