[openstack-dev] [nova] Should we make rebuild + new image on a volume-backed instance fail fast?

Ben Nemec openstack at nemebean.com
Fri Oct 6 19:01:37 UTC 2017

On 10/06/2017 12:22 PM, Matt Riedemann wrote:
> This came up in IRC discussion the other day, but we didn't dig into it 
> much given we were all (2 of us) exhausted talking about rebuild.
> But we have had several bugs over the years where people expect the root 
> disk to change to a newly supplied image during rebuild even if the 
> instance is volume-backed.
> I distilled several of those bugs down to just this one and marked 
> the rest as duplicates:
> https://bugs.launchpad.net/nova/+bug/1482040
> I wanted to see if there is actually any failure on the backend when 
> doing this, and there isn't - there is no instance fault or anything 
> like that. It's just not what the user expects, and actually the 
> instance image_ref is then shown later as the image specified during 
> rebuild, even though that's not the actual image in the root disk (the 
> volume).
> There have been a couple of patches proposed over time to change this:
> https://review.openstack.org/#/c/305079/
> https://review.openstack.org/#/c/201458/
> https://review.openstack.org/#/c/467588/
> And Paul Murray had a related (approved) spec at one point for detach 
> and attach of root volumes:
> https://review.openstack.org/#/c/221732/
> But the blueprint was never completed.
> So with all of this in mind, should we at least consider, until 
> someone owns supporting this, making the API fail with a 400 
> response if you're trying to rebuild with a new image on a volume-backed 
> instance? That way it's a fast failure in the API, similar to how trying 
> to back up a volume-backed instance fails fast.
> If we did, that would change the API response from a 202 today to a 400, 
> which is something we normally don't do. I don't think a microversion 
> would be necessary if we did this, however, because essentially what the 
> user is asking for isn't what we're actually giving them, so it's a 
> failure in an unexpected way; even if there is no fault recorded, it's
> not what the user asked for. I might not be thinking of something here 
> though, like interoperability for example - a cloud without this change 
> would blissfully return 202 but a cloud with the change would return a 
> 400...so that should be considered.

As a user who has been bitten by this behavior in the past, +1.  Yeah, 
it's technically an API change, but I think there's a strong argument 
that what the API is returning now is wrong.
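A minimal sketch of the fail-fast check being proposed, in Python since that is what Nova is written in. Everything in it (`Instance`, `root_is_volume`, `RebuildRejected`, `check_rebuild_request`) is a hypothetical stand-in and not Nova's actual API code. It only illustrates rejecting a new image on a volume-backed server with a 400 up front instead of accepting the request with a 202.

```python
from dataclasses import dataclass
from http import HTTPStatus


@dataclass
class Instance:
    """Hypothetical stand-in for a server record; not Nova's Instance object."""
    uuid: str
    image_ref: str         # image currently recorded for the server
    root_is_volume: bool   # True if the root disk is a volume (volume-backed)


class RebuildRejected(Exception):
    """Hypothetical error carrying the HTTP status the API would return."""

    def __init__(self, status: HTTPStatus, message: str) -> None:
        super().__init__(message)
        self.status = status


def check_rebuild_request(instance: Instance, requested_image: str) -> HTTPStatus:
    """Validate a rebuild request before any work is queued.

    Returns 202 (Accepted) when the rebuild may proceed asynchronously.
    Raises RebuildRejected with 400 (Bad Request) when a new image is
    requested for a volume-backed server, because the root volume would
    not actually be re-imaged.
    """
    if instance.root_is_volume and requested_image != instance.image_ref:
        raise RebuildRejected(
            HTTPStatus.BAD_REQUEST,
            f"Cannot rebuild volume-backed instance {instance.uuid} "
            f"with new image {requested_image}",
        )
    return HTTPStatus.ACCEPTED


if __name__ == "__main__":
    server = Instance(uuid="srv-1", image_ref="img-old", root_is_volume=True)
    try:
        check_rebuild_request(server, "img-new")
    except RebuildRejected as exc:
        print(exc.status.value, exc)
```

Raising before anything is handed off to the compute service is what makes this a fast failure. A client that talks to clouds both with and without such a change would have to treat either a 202 or a 400 as a possible answer to the same rebuild request, which is the interoperability concern raised in the quoted message.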
