[openstack-dev] [cinder][drivers] Backend and volume health reporting

John Griffith john.griffith8 at gmail.com
Sun Aug 14 14:53:16 UTC 2016


On Sun, Aug 14, 2016 at 2:11 AM, Avishay Traeger <avishay at stratoscale.com>
wrote:

> Hi all,
> I would like to propose working on a new feature for Ocata to provide
> health information for Cinder backends and volumes.  Currently, a volume's
> status basically reflects the last management operation performed on it -
> it will be in error state only as a result of a failed management
> operation.  There is no indication as to whether or not a backend or volume
> is "healthy" - i.e., the data exists and is accessible.
>
> The basic idea would be to add a "health" property for both backends and
> volumes.
>
> For backends, this may be something like:
> - "healthy"
> - "warning" (something is wrong and the admin should check the storage)
> - "management unavailable" (there is no management connectivity)
> - "data unavailable" (there is no data path connectivity)
>
> For volumes:
> - "healthy"
> - "degraded" (i.e., not at full redundancy)
> - "error" (in case of a data loss event)
> - "management unavailable" (there is no management connectivity)
> - "data unavailable" (there is no data path connectivity)
>
> Before I start working on a spec, I wanted to get some feedback,
> especially from driver owners:
> 1. What useful information can you provide at the backend level?
> 2. And at the volume level?
> 3. How would you obtain this information?  Querying the storage (poll)?
> Registering for events?  Something else?
> 4. Other feedback?
>
> Thank you,
> Avishay
>
> --
> *Avishay Traeger, PhD*
> *System Architect*
>
> Mobile: +972 54 447 1475
> E-mail: avishay at stratoscale.com
>
>
>
> Web <http://www.stratoscale.com/> | Blog
> <http://www.stratoscale.com/blog/> | Twitter
> <https://twitter.com/Stratoscale> | Google+
> <https://plus.google.com/u/1/b/108421603458396133912/108421603458396133912/posts>
>  | Linkedin <https://www.linkedin.com/company/stratoscale>
>

I'd like to get a more detailed use case and an example of a problem you
want to solve with this.  I have a number of concerns, including those I
raised in your "list manageable volumes" proposal.  Most importantly,
there's really no clear definition of what these fields mean or how they
should be interpreted.

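For concreteness, here's roughly what I understand the proposal to amount
to -- a purely illustrative sketch, since none of these names exist in
Cinder today and their semantics are exactly what's still undefined:

    import enum

    class BackendHealth(enum.Enum):
        HEALTHY = 'healthy'
        WARNING = 'warning'              # admin should check the storage
        MANAGEMENT_UNAVAILABLE = 'management unavailable'
        DATA_UNAVAILABLE = 'data unavailable'

    class VolumeHealth(enum.Enum):
        HEALTHY = 'healthy'
        DEGRADED = 'degraded'            # not at full redundancy
        ERROR = 'error'                  # data loss event
        MANAGEMENT_UNAVAILABLE = 'management unavailable'
        DATA_UNAVAILABLE = 'data unavailable'
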
For backends, I'm not sure what you want to solve that can't already be
handled by the scheduler and the report-capabilities periodic job.  You
can already report back from your backend to the scheduler that it
shouldn't be used for any scheduling activities going forward.  More
detailed info than that might be useful, but I'm not sure it wouldn't be
better suited to an existing OpenStack monitoring project like Monasca.
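
To illustrate what I mean, a driver can already fold this kind of backend
status into the stats it returns on the periodic capabilities update.
This is just a rough, self-contained sketch -- the helper method and the
'backend_state' key are made up for illustration; the rest are the usual
capacity/capability fields drivers report today:

    class ExampleDriver(object):
        def __init__(self):
            self._stats = {}

        def _mgmt_reachable(self):
            # Hypothetical check against the array's management interface.
            return True

        def get_volume_stats(self, refresh=False):
            # Called periodically by the volume manager; the result is
            # pushed to the scheduler and used for placement decisions.
            if refresh or not self._stats:
                self._stats = {
                    'volume_backend_name': 'example_backend',
                    'vendor_name': 'ExampleVendor',
                    'driver_version': '1.0',
                    'storage_protocol': 'iSCSI',
                    'total_capacity_gb': 100,
                    'free_capacity_gb': 40,
                    'reserved_percentage': 0,
                    # Made-up capability a scheduler filter could key on
                    # to stop sending new requests to an unhealthy backend:
                    'backend_state': ('up' if self._mgmt_reachable()
                                      else 'down'),
                }
            return self._stats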

As far as volumes go, I personally don't think volumes should have more
than a few states.  They're either "ok" and available for an operation or
they're not.  The list you have seems ok to me, but I don't see a ton of
value in fault prediction or in going to great lengths to avoid something
failing.  The current model of a volume being "ok" until it's "not" seems
perfectly reasonable to me.  My experience is that trying to be clever
and polling/monitoring to preemptively change the status of a volume does
little more than result in complexity, confusion, and false status
changes of resources.  I'm pretty strongly opposed to that level of
granularity at the volume level here.  At least for now, I'd rather see
what you have in mind for the backend and nail that down to something
solid and basically bulletproof before trying to tackle thousands of
volumes with transient states.  And of course the biggest question I
still have is: what problem do you hope to solve here?

Thanks,
John