[Openstack-operators] [nova] Should we make rebuild + new image on a volume-backed instance fail fast?

Matt Riedemann mriedemos at gmail.com
Fri Oct 6 17:22:43 UTC 2017


This came up in IRC discussion the other day, but we didn't dig into it 
much given we were all (2 of us) exhausted talking about rebuild.

But we have had several bugs over the years where people expect the root 
disk to change to a newly supplied image during rebuild even if the 
instance is volume-backed.

I distilled several of those bugs down to just this one and marked the 
rest as duplicates of it:

https://bugs.launchpad.net/nova/+bug/1482040

I wanted to see whether there is actually any failure on the backend when 
doing this, and there isn't: no instance fault or anything like that. 
It's just not what the user expects, and the instance image_ref is 
then shown as the image specified during rebuild, even though that's 
not the image actually in the root disk (the volume).

There have been a few patches proposed over time to change this:

https://review.openstack.org/#/c/305079/

https://review.openstack.org/#/c/201458/

https://review.openstack.org/#/c/467588/

And Paul Murray had a related (approved) spec at one point for detach 
and attach of root volumes:

https://review.openstack.org/#/c/221732/

But the blueprint was never completed.

So with all of this in mind, should we consider, at least until someone 
owns supporting this, making the API fail with a 400 response if you try 
to rebuild a volume-backed instance with a new image? That way it's a 
fast failure in the API, similar to how trying to back up a 
volume-backed instance fails fast.
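To make the proposal concrete, here is a minimal sketch of what such a 
fail-fast check could look like. It is not Nova code: the Instance 
dataclass, its volume_backed flag, HTTPBadRequest and validate_rebuild() 
are hypothetical names used only for illustration. It assumes the API 
layer already knows whether the root disk is a volume, and it only 
rejects the request when the requested image differs from the current one.

```python
from dataclasses import dataclass


class HTTPBadRequest(Exception):
    """Hypothetical stand-in for the API layer's 400 Bad Request error."""
    status_code = 400


@dataclass
class Instance:
    uuid: str
    image_ref: str        # image currently recorded on the instance (may be '')
    volume_backed: bool   # hypothetical flag: True when the root disk is a volume


def validate_rebuild(instance: Instance, new_image_ref: str) -> int:
    """Return 202 if the rebuild can proceed; raise HTTPBadRequest otherwise.

    Hypothetical helper illustrating the proposed fail-fast behavior.
    """
    image_changed = new_image_ref != instance.image_ref
    if instance.volume_backed and image_changed:
        # Fail in the API instead of accepting a request whose new image
        # would never actually be written to the root volume.
        raise HTTPBadRequest(
            f"Rebuilding volume-backed instance {instance.uuid} with a new "
            "image is not supported; the root volume would not change.")
    # Same image, or an image-backed instance: accept as today.
    return 202
```

Comparing against the current image_ref, rather than rejecting every 
rebuild of a volume-backed instance, keeps same-image rebuilds working, 
which matches the "new image" scope of the question.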

If we did, that would change the API response from a 202 today to a 400, 
which is something we normally don't do. I don't think a microversion 
would be necessary, however, because what the user is asking for isn't 
what we're actually giving them. Even though no fault is recorded, it's 
a failure in an unexpected way: the user doesn't get what they asked 
for. I might not be thinking of something here, though, like 
interoperability: a cloud without this change would blissfully return a 
202, but a cloud with the change would return a 400, so that should be 
considered.

-- 

Thanks,

Matt