[nova] Updates about Detaching/Attaching root volumes

Dan Smith dms at danplanet.com
Fri Mar 1 00:02:57 UTC 2019

> We talked about this in the nova meeting today:
> http://eavesdrop.openstack.org/meetings/nova/2019/nova.2019-02-28-21.00.log.html#l-95
> It sounds like we might be leaning to option 4 for a couple of reasons:
> a) It's getting late in stein to be lifting the restriction on
> attaching a volume with a tag to a shelved offloaded server and
> munging that in with the root attach/detach volume API change.
> b) As we got talking I noted that for the same reason as a tag, you
> currently cannot attach a multi-attach volume to a shelved offloaded
> server, because we don't make the "reserve_block_device_name" RPC call
> to the compute service to determine if the driver supports it.
> c) Once we have code in the API to properly translate tags/multiattach
> requests to the scheduler (based on [1]) so that we know when we
> unshelve that we'll pick a host that supports device tags and
> multiattach volumes, we could lift the restriction in the API with a
> new microversion.
> If we go this route, it means the API reference for the root volume
> detach/attach change needs to mention that the tag will be reset and
> cannot be set again on attach of the root volume (at least until we
> lift that restriction).

So, first off, sorry I got pulled away from the meeting and am now
showing up here with alternatives.

My opinion is that it sucks to have to choose between "detaching my root
volume" and "can have tags".

I think I understand the reasoning for doing the RPC call when attaching
to a server on a host to make sure we can honor the tag you want, but I
think the shelved case is a bit weird and shouldn't really be held to
the same standard. It's always a valid outcome when trying to unshelve an
instance to say "sorry, but there's no room for that anymore" or even
"sorry, you've had this shelved since our last hardware refresh and your
instance no longer fits with the group." Thus, attaching a volume with a
tag to a shelved instance making it either unbootable or silently
ignore-able seems like not the worst thing, especially if the user can
just detach and re-attach without a tag to get out of that situation.
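
To make those two outcomes concrete, here is a rough sketch in Python.
None of this is nova code: Host, NoValidHost, and both pick_host_*
functions are hypothetical names made up for illustration, and a real
scheduler looks at far more than tag support.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Host:
    # Hypothetical stand-in for a compute host; not a nova object.
    name: str
    supports_device_tags: bool


class NoValidHost(Exception):
    """Hypothetical error: unshelve could not find a usable host."""


def pick_host_strict(hosts: List[Host], volume_tag: Optional[str]) -> Host:
    """'Unbootable' outcome: a tagged volume requires a tag-capable host."""
    for host in hosts:
        if volume_tag is None or host.supports_device_tags:
            return host
    raise NoValidHost("no host can honor the requested device tag")


def pick_host_lenient(
    hosts: List[Host], volume_tag: Optional[str]
) -> Tuple[Host, Optional[str]]:
    """'Silently ignorable' outcome: drop the tag if no host supports it.

    Returns the chosen host and the tag that will actually be honored.
    """
    if not hosts:
        raise NoValidHost("no hosts available")
    if volume_tag is None:
        return hosts[0], None
    for host in hosts:
        if host.supports_device_tags:
            return host, volume_tag
    # No tag-capable host: boot anyway and ignore the tag.
    return hosts[0], None
```

In the strict variant the user recovers by detaching and re-attaching
without a tag; in the lenient variant nothing fails, but the tag is not
exposed to the guest.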

Further, I also think that wiping the tag on detach is the wrong thing
to do. The point of tags for both volumes and network devices was to
indicate function, not content. Things like "this is my fast SSD-backed
database store" or "this is my slow rotating-rust backup disk" or "this
is the fast unmetered internal network." They don't really make sense to
say "this is ubuntu 16.04" because ... the image (metadata) tells you that.

Thus, I would expect someone with a root volume to use a tag like "root"
or "boot" or "where the virus lives". Detaching the root volume doesn't
likely change that function.

So, IMHO, we should select a path that does not make the user choose
between the "tags or detach" dichotomy above. Some options for that are:

1. Don't wipe the tag on detach. On re-attach, let them specify a tag if
there is already one set on the BDM. Clearly that means tags were
supported before, so it's a reasonable guess that they still are.

2. Specifically for the above case, either (a) require they pass in the
tag they want on the new volume if they want to keep/change it or (b)
assume it stays the same unless they specify a null tag to remove it.

3. Always let them specify a tag on attach, and if that means they're
not un-shelve-able because no compute nodes support tags, then that's
the same case as if you show up with a tag-having boot request. Let them
de-and-re-attach the volume to wipe the tag and then unshelve.

4. Always let them specify a tag on attach and just not freak out on
unshelve if we end up picking a host with no tag support.
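
As a rough illustration of how options 2a and 2b differ on re-attach,
here is a small Python sketch. The names (UNSET, resolve_tag_2a,
resolve_tag_2b) are hypothetical and not part of nova's API; the only
point is the distinction between "no tag given" and "explicit null tag".

```python
from typing import Optional

# Hypothetical sentinel meaning "the request did not mention a tag",
# as opposed to explicitly passing None (a null tag).
UNSET = object()


def resolve_tag_2a(existing_tag: Optional[str], requested_tag=UNSET) -> Optional[str]:
    """Option 2a: the tag survives only if the caller passes it again."""
    if requested_tag is UNSET:
        # Nothing requested, so the new root volume ends up untagged.
        return None
    return requested_tag


def resolve_tag_2b(existing_tag: Optional[str], requested_tag=UNSET) -> Optional[str]:
    """Option 2b: keep the old tag unless the caller overrides it.

    An explicit None (null tag) removes the tag.
    """
    if requested_tag is UNSET:
        return existing_tag
    return requested_tag
```

With an existing tag of "root" and no tag in the request, 2a yields no
tag while 2b keeps "root"; passing an explicit null under 2b clears it.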

Personally, I think 2a or 2b are totally fine, but they might not be as
obvious as #3. Either will avoid the user having to make a hard life
decision about whether they're going to lose their volume tag forever
because they need to detach their root volume. Since someone doing such
an operation on an instance is probably caring for a loved pet, that's a
sucky position to be in, especially if you're reading the docs after
having done so only to realize you've already lost it forever when you
did your detach.

