[openstack-dev] [nova] Should we make rebuild + new image on a volume-backed instance fail fast?
Ben Nemec
openstack at nemebean.com
Fri Oct 6 19:01:37 UTC 2017
On 10/06/2017 12:22 PM, Matt Riedemann wrote:
> This came up in IRC discussion the other day, but we didn't dig into it
> much given we were all (2 of us) exhausted talking about rebuild.
>
> But we have had several bugs over the years where people expect the root
> disk to change to a newly supplied image during rebuild even if the
> instance is volume-backed.
>
> I distilled several of those bugs down to just this one and duplicated
> the rest:
>
> https://bugs.launchpad.net/nova/+bug/1482040
>
> I wanted to see if there is actually any failure on the backend when
> doing this, and there isn't - there is no instance fault or anything
> like that. It's just not what the user expects, and actually the
> instance image_ref is then shown later as the image specified during
> rebuild, even though that's not the actual image in the root disk (the
> volume).
>
> There have been a couple of patches proposed over time to change this:
>
> https://review.openstack.org/#/c/305079/
>
> https://review.openstack.org/#/c/201458/
>
> https://review.openstack.org/#/c/467588/
>
> And Paul Murray had a related (approved) spec at one point for detach
> and attach of root volumes:
>
> https://review.openstack.org/#/c/221732/
>
> But the blueprint was never completed.
>
> So with all of this in mind, should we at least consider, until someone
> owns supporting this, having the API fail with a 400 response if you're
> trying to rebuild with a new image on a volume-backed instance? That way
> it's a fast failure in the API, similar to how trying to back up a
> volume-backed instance fails fast.
>
> If we did, that would change the API response from a 202 today to a 400,
> which is something we normally don't do. I don't think a microversion
> would be necessary if we did this, however, because essentially what the
> user is asking for isn't what we're actually giving them, so it's already
> a failure, just an unexpected one with no fault recorded. I might not be
> thinking of something here though, like interoperability for example: a
> cloud without this change would blissfully return 202 but a cloud with
> the change would return a 400, so that should be considered.
>
As a user who has been bitten by this behavior in the past, +1. Yeah,
it's technically an API change, but I think there's a strong argument
that what the API is returning now is wrong.
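
For concreteness, the fail-fast check proposed above could look roughly like
the sketch below. This is only an illustration of the idea, not nova's code:
the Instance class, its root_image_ref and volume_backed fields, the
RebuildRejected exception and the rebuild() function are all hypothetical
names made up for this example.

```python
from dataclasses import dataclass


class RebuildRejected(Exception):
    """Hypothetical error that an API layer would map to HTTP 400."""

    status_code = 400


@dataclass
class Instance:
    # Hypothetical stand-in for a server record, not nova's real object.
    root_image_ref: str   # image the root disk (or root volume) came from
    volume_backed: bool   # True when the root disk is a volume


def rebuild(instance: Instance, new_image_ref: str) -> int:
    """Return the status code the API would send for a rebuild request."""
    # Fail fast: the root volume is not re-imaged today, so accepting a
    # different image would silently leave the old contents in place.
    if instance.volume_backed and new_image_ref != instance.root_image_ref:
        raise RebuildRejected(
            "rebuild with a new image is not supported for "
            "volume-backed instances"
        )
    # Otherwise the request is accepted as it is today.
    instance.root_image_ref = new_image_ref
    return 202
```

With a check like this, a rebuild that reuses the current image, or any
rebuild of an image-backed instance, would still get a 202. Only the case
described in the bug, a new image on a volume-backed instance, would get
the 400.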