[openstack-dev] [glance] Periodically checking Glance image files
Nikhil Komawar
nik.komawar at gmail.com
Tue Sep 13 04:16:53 UTC 2016
Hey Sergio,
Glad to know that you're not having any feature-related issues (to me
this is a good sign). Based on your answers, it makes sense to require a
reliability solution for backend data (or some sort of health monitoring
for the user data).
So, I wonder what your thoughts are on such an audit system. At first
glance, this does not look very scalable, at least if you plan to audit
all of the active images. Consider a deployment trying to run this for
around 100-500K active image records. It would need to run in batches,
so getting through the whole list and claiming a full audit of the
active images is a moving target (new images can be introduced, some
images can be updated in the meantime, etc.).
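To make the batching concern concrete, here is a minimal sketch of one way to bound an audit pass: fix a cutoff timestamp up front and page through only the records created at or before it. Everything in it (the `ImageRecord` type, the `fetch_page` callable, the field names) is hypothetical and is not Glance's API. Images updated while the pass is running would still need a follow-up run.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterator, List, Optional


@dataclass(frozen=True)
class ImageRecord:
    # Hypothetical, simplified stand-in for an image row; not Glance's model.
    id: str
    created_at: datetime
    status: str


# Hypothetical pager: returns up to `limit` records sorted by id, starting
# strictly after `marker` (None means "from the beginning").
FetchPage = Callable[[Optional[str], int], List[ImageRecord]]


def iter_audit_batches(
    fetch_page: FetchPage, cutoff: datetime, batch_size: int = 1000
) -> Iterator[List[ImageRecord]]:
    """Yield batches of active images created at or before `cutoff`."""
    marker: Optional[str] = None
    while True:
        page = fetch_page(marker, batch_size)
        if not page:
            return  # no more records: this pass is finished
        marker = page[-1].id
        batch = [
            record
            for record in page
            if record.status == "active" and record.created_at <= cutoff
        ]
        if batch:
            yield batch
```

The cutoff gives the pass a well-defined end, at the cost of ignoring anything uploaded after it started.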
The failure rate is low, so a random (sparse) check on the image data
won't help either. Would a cron job set up to do the audit work for
smaller deployments? Maybe we can look into some known cron solutions to
do the trick?
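For what it's worth, a cron-driven check for a small deployment could be as simple as the sketch below. It assumes the filesystem backend keeps each image in a file named after the image ID under a single data directory, and that the stored checksum is an MD5 hex digest. The `(image_id, checksum)` pairs would have to come from the Glance database or API, which is not shown. The function names and problem labels are made up for illustration.

```python
import hashlib
import os
from typing import Dict, Iterable, Tuple


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so a large image is never fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_images(datadir: str, images: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Return {image_id: problem} for every image that fails a check.

    `images` yields (image_id, expected_md5) pairs; how they are obtained
    is left to the caller (assumption: one file per image, named by ID).
    """
    problems: Dict[str, str] = {}
    for image_id, expected_md5 in images:
        path = os.path.join(datadir, image_id)
        if not os.path.isfile(path):
            problems[image_id] = "missing"
        elif os.path.getsize(path) == 0:
            problems[image_id] = "empty"
        elif md5_of_file(path) != expected_md5:
            problems[image_id] = "checksum mismatch"
    return problems
```

Run nightly from cron, something like this would catch both a missing file and a zero-byte file behind an active record, though reading every file in full is itself I/O-heavy on a large store.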
On 9/12/16 4:18 PM, Sergio A. de Carvalho Jr. wrote:
> Hi Nikhil,
>
> Thanks so much for your response.
>
> 1) No, this is a private cloud.
> 2) Glance v1 (this problem has manifested itself in one of our oldest
> deployments, which is running Icehouse).
> 3) No, location is not exposed.
> 4) Glance is set up with the filesystem backend driver, using a Gluster
> volume mounted on the host.
> 5.1) Images were in active state, even though the image file had zero
> bytes.
> 5.2) very low, it may have happened only twice in the last year.
>
> Even if the location is not exposed, there are a number of things that
> can happen to the actual image files after they've been uploaded to
> Glance, without Glance noticing, depending on how reliable your storage
> backend is. That's why I thought, in some circumstances, it would be
> useful to have some sort of background service checking that image
> files haven't been corrupted or gone missing altogether.
>
> Sergio
>
>
> On Mon, Sep 12, 2016 at 7:27 PM, Nikhil Komawar <nik.komawar at gmail.com> wrote:
>
>
> Hi Sergio,
>
> Thanks for reaching out. And this is an excellent question.
>
> Firstly, I'd like to mention that Glance, if deployed correctly, is
> built to be self-resilient, so you should NOT need an audit of such
> files. In fact, if any operator (particularly a large-scale operator)
> needs such a system, we have a serious issue where potentially
> important /user/ data is likely to be lost, resulting in legal issues
> (so please beware).
>
> Having said that, I'd like to start investigating your particular
> issue and see where we may be falling short in ensuring data
> integrity in Glance. Let me ask you a first set of questions that
> will help us get an initial understanding:
>
> 1) Are you a public cloud vendor; in particular, have you deployed
> Glance to potentially non-trusted users? Or is the case otherwise?
> 2) Are you deploying Glance v1 or Glance v2?
> 3) Have you exposed the "location" feature set (CRUD) to regular users?
> (If using API v2, have you enabled the ``show_multiple_locations``
> configuration option?)
> 4) What backends have you configured Glance with, and who has access to
> them? What is the resiliency or disk rotation (say, for capacity
> management) of your backend store system?
> 5) Sanity check on your issue:-
> 5.1) What are the image statuses for which the image data files are
> missing?
> 5.2) What is the approximate error rate? (If you don't have
> specifics, info like rare, medium, often will help.)
>
>
> We may have to dig a bit further into the issue but this set of info
> should help us narrow down the issue and determine if there are any
> gaps in Glance.
>
> P.S. Please use the tag "[glance]" in the subject line to help us get
> to your email faster.
>
> On 9/12/16 12:48 PM, Sergio A. de Carvalho Jr. wrote:
> > Hi all,
> >
> > Are there (or were there ever) any plans to implement in Glance a
> > service that would periodically check that the image files are still
> > available on the file system (or in whatever storage system is being
> > used) and have the correct checksum?
> >
> > We had a few issues where an image file was removed from the
> > filesystem and that can go undetected for a long time until someone
> > tries to access that image, so we were wondering if it would be
> > possible (and if it would make sense) to implement some sort of
> > background service to periodically check if all images found in the
> > database can be retrieved successfully.
> >
> > Thoughts?
> >
> > Sergio
>
>
> --
>
> Thanks,
> Nikhil
>
>
>
--
Thanks,
Nikhil