[Openstack-operators] [nova] Should we make rebuild + new image on a volume-backed instance fail fast?
Matt Riedemann
mriedemos at gmail.com
Fri Oct 6 17:22:43 UTC 2017
This came up in IRC discussion the other day, but we didn't dig into it
much given we were all (2 of us) exhausted talking about rebuild.
But we have had several bugs over the years where people expect the root
disk to change to a newly supplied image during rebuild even if the
instance is volume-backed.
I distilled several of those bugs down to just this one and duplicated
the rest:
https://bugs.launchpad.net/nova/+bug/1482040
I wanted to see whether anything actually fails on the backend when
doing this, and nothing does - there is no instance fault or anything
like that. It's just not what the user expects, and in fact the
instance image_ref is later shown as the image specified during
rebuild, even though that's not the actual image in the root disk (the
volume).
There have been a couple of patches proposed over time to change this:
https://review.openstack.org/#/c/305079/
https://review.openstack.org/#/c/201458/
https://review.openstack.org/#/c/467588/
And Paul Murray had a related (approved) spec at one point for detach
and attach of root volumes:
https://review.openstack.org/#/c/221732/
But the blueprint was never completed.
So with all of this in mind, should we consider, at least until someone
owns supporting this, having the API fail with a 400 response if you try
to rebuild a volume-backed instance with a new image? That would be a
fast failure in the API, similar to how trying to back up a
volume-backed instance fails fast.
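A rough sketch of what such a check might look like, not actual nova
code. The Instance fields, BadRequest, and check_rebuild are all made-up
names for illustration; in particular, how nova would work out which
image the root volume was created from is left out and just modelled
here as a root_image_ref field.

```python
from dataclasses import dataclass


class BadRequest(Exception):
    """Stands in for an HTTP 400 response (hypothetical, not nova's class)."""
    status_code = 400


@dataclass
class Instance:
    # Hypothetical model: whether the root disk is a volume, and which
    # image that root disk was originally created from.
    volume_backed: bool
    root_image_ref: str


def check_rebuild(instance, new_image_ref):
    """Return 202 if the rebuild may proceed, raise BadRequest otherwise."""
    if instance.volume_backed and new_image_ref != instance.root_image_ref:
        # Fail fast in the API instead of accepting a request that would
        # leave the root volume untouched.
        raise BadRequest(
            "Rebuild with a new image is not supported for "
            "volume-backed instances"
        )
    return 202
```

Rebuilding with the same image, or rebuilding an image-backed instance
with a new image, would still get the 202 in this sketch.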
If we did, that would change the API response from a 202 today to a 400,
which is something we normally don't do. I don't think a microversion
would be necessary, however, because what the user is asking for isn't
what we're actually giving them. It's already a failure, just an
unexpected one: no fault is recorded, but the user doesn't get what they
asked for. I might not be thinking of something here though, like
interoperability for example - a cloud without this change would
blissfully return 202 but a cloud with the change would return a 400, so
that should be considered.
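To make the interoperability concern concrete, here is a sketch of how a
client would have to read the rebuild response if both behaviors were
deployed. The function name and the outcome strings are made up for
illustration; only the 202 and 400 status codes come from the discussion
above.

```python
def interpret_rebuild_status(status_code, volume_backed, image_changed):
    """Classify a rebuild response from a cloud with or without the change."""
    if status_code == 400:
        # A cloud with the change rejects the request up front (a 400
        # could also mean some other validation error).
        return "rejected"
    if status_code == 202:
        if volume_backed and image_changed:
            # A cloud without the change accepts the request, but the
            # root volume keeps its old contents.
            return "accepted-root-disk-unchanged"
        return "accepted"
    return "unexpected"
```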
--
Thanks,
Matt