[openstack-dev] [cinder][nova] proper syncing of cinder volume state

John Griffith john.griffith8 at gmail.com
Mon Dec 1 04:30:01 UTC 2014

On Fri, Nov 28, 2014 at 11:25 AM, D'Angelo, Scott <scott.dangelo at hp.com> wrote:
> A Cinder blueprint has been submitted to allow the python-cinderclient to
> involve the back end storage driver in resetting the state of a cinder
> volume:
> https://blueprints.launchpad.net/cinder/+spec/reset-state-with-driver
> and the spec:
> https://review.openstack.org/#/c/134366
> This blueprint contains various use cases for a volume that may be listed in
> the Cinder database in state detaching|attaching|creating|deleting.
> The proposed solution involves augmenting the python-cinderclient command
> ‘reset-state’, but other options are listed, including some that
> involve Nova, since the state of a volume in the Nova XML found in
> /etc/libvirt/qemu/<instance_id>.xml may also be out of sync with the
> Cinder DB or the storage back end.
> A related proposal for adding a new non-admin API for changing volume status
> from ‘attaching’ to ‘error’ has also been proposed:
> https://review.openstack.org/#/c/137503/
> Some questions have arisen:
> 1) Should ‘reset-state’ command be changed at all, since it was originally
> just to modify the Cinder DB?
> 2) Should ‘reset-state’ be fixed to prevent a naïve admin from putting
> the Cinder DB out of sync with the back-end storage?
> 3) Should ‘reset-state’ be kept the same, but augmented with new options?
> 4) Should a new command be implemented, with possibly a new admin API to
> properly sync state?
> 5) Should Nova be involved? If so, should this be done as a separate body of
> work?
> This has proven to be a complex issue and there seems to be a good bit of
> interest. Please provide feedback, comments, and suggestions.
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Hey Scott,

Thanks for posting this to the ML, I stated my opinion on the spec,
but for completeness:
My feeling is that reset-state has morphed into something entirely
different from what was originally intended.  That's actually great;
nothing wrong there at all.  But I strongly disagree with the claim that
"setting the status in the DB only is almost always the wrong thing to
do".  The whole point was to allow the state to be changed in the DB
so the item could, in most cases, be deleted.  There was never an intent
(that I'm aware of) to make this some sort of uber resync-and-heal API.
All of that history aside, I think it would be great to add some
driver interaction here.  I am however very unclear on what that would
actually include.  For example, would you let a volume's state be
changed from "Error-Attaching" to "In-Use" and just run through the
process of retrying the attach?  To me that seems like a bad idea.  I'm
much happier with the current behavior of changing the state from "Error"
to "Available" (and NOTHING else) so that an operation can be retried,
or the resource can be deleted.  If you start allowing arbitrary state
transitions (which sadly we've started to do) you're almost never going
to get things correct.  The restricted approach also covers almost every
situation; it means you have to explicitly retry operations or steps (I
don't think that's a bad thing), and it makes the code significantly more
robust IMO (we have had some robustness issues lately).
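As a hypothetical sketch (not actual Cinder code), the restricted
transition policy I'm arguing for could be expressed as a simple
whitelist check; the names here are illustrative only:

```python
# Hypothetical sketch of a restricted reset-state transition table.
# Only the single Error -> Available transition is permitted; any
# other reset is rejected, so the admin must retry the operation or
# delete the resource instead.

ALLOWED_RESETS = {
    ("error", "available"),
}


def can_reset(current_status, requested_status):
    """Return True only for explicitly whitelisted status resets."""
    key = (current_status.lower(), requested_status.lower())
    return key in ALLOWED_RESETS
```

With this check, `can_reset("error", "available")` succeeds while a
request like `can_reset("error-attaching", "in-use")` is refused.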

My proposal would be to go back to limiting what you can do with
reset-state (basically, make it so you can only release the resource back
to "Available") and add driver interaction to clean up any mess where
possible.  This could be a simple driver call like
"make_volume_available", whereby the driver just ensures that there are
no attachments and... well, honestly, nothing else comes to mind as
something the driver cares about here.  The final option then would be
to add some more power to force-delete.
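A minimal sketch of what such a hook might look like.  The name
"make_volume_available" is the suggestion above; the backend class,
attachment bookkeeping, and terminate_connection call here are all
hypothetical stand-ins, not a real driver interface:

```python
# Hypothetical sketch: before the DB status is reset to 'available',
# ask the driver to tear down any lingering attachments on the backend.


class FakeBackend:
    """Stand-in for a storage backend that tracks attachments."""

    def __init__(self):
        self.attachments = {"vol-1": ["host-a"]}

    def terminate_connection(self, volume_id, host):
        # Drop the attachment record for this host.
        self.attachments[volume_id].remove(host)


class ExampleDriver:
    def __init__(self, backend):
        self.backend = backend

    def make_volume_available(self, volume_id):
        """Ensure the backend holds no attachments for this volume.

        Returns True when the volume is safely detached everywhere.
        """
        for host in list(self.backend.attachments.get(volume_id, [])):
            self.backend.terminate_connection(volume_id, host)
        return not self.backend.attachments.get(volume_id)
```

The point of the sketch is how little the driver needs to do: clean up
attachments, report success, and nothing more.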

Is there anything other than attach that matters from the driver's
perspective?  If people are talking about error recovery, that to me is a
whole different topic, and frankly I think we need to spend more time
preventing errors rather than trying to recover from them via new API calls.

Curious to see whether other folks have input here.
