[openstack-dev] [nova] Interesting bug when unshelving an instance in an AZ and the AZ is gone

Chris Friesen chris.friesen at windriver.com
Mon Oct 16 20:49:26 UTC 2017

On 10/16/2017 09:22 AM, Matt Riedemann wrote:

> 2. Should we null out the instance.availability_zone when it's shelved offloaded
> like we do for the instance.host and instance.node attributes? Similarly, we
> would not take into account the RequestSpec.availability_zone when scheduling
> during unshelve. I tend to prefer this option because once you unshelve offload
> an instance, it's no longer associated with a host and therefore no longer
> associated with an AZ.

That last statement (no longer associated with an AZ) isn't true in the case
where the user specifically requested a non-default AZ at boot time.

> However, is it reasonable to assume that the user doesn't
> care that the instance, once unshelved, is no longer in the originally requested
> AZ? Probably not a safe assumption.

If they didn't request a non-default AZ then I think we could remove it.
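
A minimal sketch of that rule, using stand-in classes rather than nova's real objects. It assumes a RequestSpec.availability_zone of None means the user never asked for a specific AZ; the names `Instance`, `RequestSpec` and `shelve_offload` here are illustrative only:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestSpec:
    # Assumption: None means the user did not request an AZ at boot time.
    availability_zone: Optional[str] = None


@dataclass
class Instance:
    host: Optional[str]
    node: Optional[str]
    availability_zone: Optional[str]


def shelve_offload(instance: Instance, spec: RequestSpec) -> Instance:
    """Clear placement fields once the instance is offloaded from its host."""
    # host and node are always cleared on offload.
    instance.host = None
    instance.node = None
    # Only drop the AZ when the user never pinned the instance to one;
    # an explicitly requested AZ is kept so unshelve can honour it.
    if spec.availability_zone is None:
        instance.availability_zone = None
    return instance
```

With this rule an instance booted into an explicitly requested AZ keeps it across shelve offload, while one that only landed in an AZ by scheduling loses it.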

> 3. When a user unshelves, they can't propose a new AZ (and I don't think we want
> to add that capability to the unshelve API). So if the original AZ is gone,
> should we automatically remove the RequestSpec.availability_zone when
> scheduling? I tend to not like this as it's very implicit and the user could see
> the AZ on their instance change before and after unshelve and be confused.

I think allowing the user to specify an AZ on unshelve might be a reasonable 
option.  Or maybe just allow modifying the AZ of a shelved instance without 
unshelving it via a PUT on /servers/{server_id}.
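
Purely as an illustration of the second idea, the check behind such an update might look like the sketch below. No such API exists today; `retarget_shelved_instance` and its arguments are hypothetical, and restricting it to the "shelved_offloaded" state is an assumption made here (a merely shelved instance still lives on a host):

```python
def retarget_shelved_instance(vm_state, new_az, existing_azs):
    """Validate a requested AZ change and return the AZ to store.

    Hypothetical helper: raises ValueError if the change is not allowed.
    """
    # Assumption: only offloaded instances are free of a host, so only
    # they can be moved to a different AZ without a migration.
    if vm_state != "shelved_offloaded":
        raise ValueError("AZ can only be changed while shelved offloaded")
    # Refuse to point the instance at an AZ that does not exist.
    if new_az not in existing_azs:
        raise ValueError("unknown availability zone: %s" % new_az)
    return new_az
```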

> 4. We could simply do nothing about this specific bug and assert the behavior is
> correct. The user requested an instance in a specific AZ, shelved that instance
> and when they wanted to unshelve it, it's no longer available so it fails. The
> user would have to delete the instance and create a new instance from the shelve
> snapshot image in a new AZ.

I'm inclined to feel that this is operator error.  If they want to delete an AZ 
that has shelved instances then they should talk with their customers and update 
the stored AZ in the DB to a new "valid" one.  (Though currently this would 
require manual DB operations.)

> If we implemented Sylvain's spec in #1 above, maybe
> we don't have this problem going forward since you couldn't remove/delete an AZ
> when there are even shelved offloaded instances still tied to it.

I kind of think it would be okay to disallow deleting AZs with shelved instances 
in them.
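
A rough sketch of such a guard, assuming the caller can list (uuid, vm_state, availability_zone) tuples for the instances it knows about. `shelved_instances_blocking_delete` is a made-up name, not an existing nova function:

```python
# vm_state values treated as "shelved" for the purposes of this check.
SHELVED_STATES = ("shelved", "shelved_offloaded")


def shelved_instances_blocking_delete(az_name, instances):
    """Return the UUIDs of shelved instances still tied to az_name.

    instances is an iterable of (uuid, vm_state, availability_zone) tuples.
    An empty result means the AZ could be deleted under this rule.
    """
    return [
        uuid
        for uuid, vm_state, az in instances
        if az == az_name and vm_state in SHELVED_STATES
    ]
```

The delete operation would be rejected whenever the returned list is non-empty, which also gives the operator the list of instances to discuss with their customers.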

