[openstack-dev] [glance] Periodically checking Glance image files

Avishay Traeger avishay at stratoscale.com
Tue Sep 13 07:01:37 UTC 2016


On Tue, Sep 13, 2016 at 7:16 AM, Nikhil Komawar <nik.komawar at gmail.com>
wrote:
>     Firstly, I'd like to mention that Glance is built-in (and if deployed
>     correctly) is self-resilient in ensuring that you do NOT need an audit
>     of such files. In fact, if any operator (particularly large scale
>     operator) needs such a system we have a serious issue where potentially
>     important /user/ data is likely to be lost resulting in legal issues
>     (so please beware).

Can you please elaborate on how Glance is self-resilient?

> Hey Sergio,
>
>
> Glad to know that you're not having any feature related issues (to me
> this is a good sign). Based on your answers, it makes sense to require a
> reliability solution for backend data (or some sort of health monitoring
> for the user data).
>

All backends will at some point lose some data.  The ask is to reflect the
image's "health" back to the user.


> So, I wonder what your thoughts are for such an audit system. At a first
> glance, this looks rather not scalable, at least if you plan to do the
> audit on all of the active images. Consider a deployment trying to run
> this for around 100-500K active image records. This will need to be run
> in batches, thus completing the list of records and saying that you've
> done a full audit of the active image -- is a NP-complete problem (new
> images can be introduced, some images can be updated in the meantime, etc.)
>

NP-complete?  Really?  Every storage system scrubs all data periodically to
protect from disk errors.  Glance images should be relatively static anyway.
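
To make "run it in batches" concrete, here is a minimal sketch of what a full
audit pass could look like (Python; list_active_image_ids and verify_image are
hypothetical placeholders for a DB query and a checksum check, not real Glance
calls).  The pass works over a snapshot of the active image IDs taken up
front; anything created or updated mid-pass is simply picked up on the next
cycle, so there is nothing intractable about it:

def audit_active_images(list_active_image_ids, verify_image, batch_size=1000):
    """One full scrub pass over a snapshot of the active image list."""
    # Fix the set of images to audit up front; later additions/updates
    # are covered by the next scheduled pass.
    snapshot = list(list_active_image_ids())
    failed = []
    for start in range(0, len(snapshot), batch_size):
        # Process a bounded batch at a time to limit DB/storage load.
        for image_id in snapshot[start:start + batch_size]:
            if not verify_image(image_id):
                failed.append(image_id)
    return failed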


> The failure rate is low, so a random (sparse check) on the image data
> won't help either. Would a cron job setup to do the audit for smaller
> deployments work? May be we can look into some known cron solutions to
> do the trick?
>

How about letting the backend report the health?  S3, for example, reports
an event on object loss
<http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html#supported-notification-event-types>.
The S3 driver could monitor those events and update the image status.  Swift
performs scrubbing to determine object health - I haven't checked whether it
reports an event on object loss, but I don't see any reason it couldn't.  For
the local filesystem store, Glance would need its own scrubbing process (e.g.,
recalculate the hash of each image every N days).  On the other hand, if the
store is a mount from some filer, the filer should be able to report on health.
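
For the local-filesystem case, a scrubbing pass could look roughly like this -
a sketch only, assuming images sit as flat files named by image ID under the
store directory and that the expected md5 checksum per image is available;
expected_checksums below is just a dict standing in for that lookup, not a
real Glance API:

import hashlib
import os


def compute_md5(path, chunk_size=1024 * 1024):
    # Recompute the md5 of an image file in chunks to keep memory bounded.
    md5 = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            md5.update(chunk)
    return md5.hexdigest()


def scrub_store(store_dir, expected_checksums):
    # expected_checksums: image ID -> md5 hex digest recorded at upload time
    # (in a real deployment this would come from the Glance database).
    unhealthy = []
    for image_id, expected in expected_checksums.items():
        path = os.path.join(store_dir, image_id)
        if not os.path.exists(path) or compute_md5(path) != expected:
            unhealthy.append(image_id)  # missing or corrupted data
    return unhealthy

Run from cron every N days, anything this returns could be flagged or emit a
notification so the image's reported status reflects reality.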

Thanks,
Avishay

-- 
*Avishay Traeger, PhD*
*System Architect*

Mobile: +972 54 447 1475
E-mail: avishay at stratoscale.com




