[openstack-dev] [glance] Periodically checking Glance image files

Nikhil Komawar nik.komawar at gmail.com
Tue Sep 13 04:16:53 UTC 2016


Hey Sergio,


Glad to know that you're not having any feature-related issues (to me
this is a good sign). Based on your answers, it makes sense to want a
reliability solution for backend data (or some sort of health monitoring
for the user data).


So, I wonder what your thoughts are for such an audit system. At first
glance, this does not look very scalable, at least if you plan to audit
all of the active images. Consider a deployment trying to run this for
around 100-500K active image records. It would need to be run in
batches, so completing the list of records and claiming that you've done
a full audit of the active images is a moving target (new images can be
introduced, some images can be updated in the meantime, etc.).
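
To make the batching concern concrete, here is a rough sketch (not Glance
code) of walking image records in fixed-size batches while bounding the
audit to records created before the run started, so that a pass can
finish even while new images keep arriving. The ImageRecord shape, the
fetch_page callable and the cutoff parameter are all hypothetical names
introduced only for illustration.

```python
from typing import Callable, Iterator, List, NamedTuple, Optional


class ImageRecord(NamedTuple):
    # Hypothetical record shape; not Glance's actual schema.
    image_id: str
    created_at: float  # seconds since the epoch


def iter_batches(
    fetch_page: Callable[[Optional[str], int], List[ImageRecord]],
    cutoff: float,
    batch_size: int = 1000,
) -> Iterator[List[ImageRecord]]:
    """Yield batches of records created at or before `cutoff`.

    `fetch_page(marker, limit)` is assumed to return up to `limit` records
    sorted by image_id, strictly after `marker` (None means from the start).
    """
    marker = None
    while True:
        page = fetch_page(marker, batch_size)
        if not page:
            return  # no more records: the bounded pass is complete
        marker = page[-1].image_id
        # Skip records created after the audit started; a later run
        # would pick them up.
        batch = [r for r in page if r.created_at <= cutoff]
        if batch:
            yield batch
```

Images updated during the run are still not covered by this, so it only
narrows the problem rather than solving it.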

The failure rate is low, so a random (sparse) check on the image data
won't help either. Would a cron job set up to do the audit work for
smaller deployments? Maybe we can look into some known cron solutions to
do the trick?
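
For a small deployment, the per-image check that such a cron job would
run could look roughly like the sketch below. It assumes the caller
already has the path of the image file on the store plus the size and
MD5 checksum recorded for the image; how those are obtained (database,
API, store layout) is left out, and check_image_file is a hypothetical
helper, not an existing Glance function.

```python
import hashlib
import os
from typing import Optional


def check_image_file(path: str, expected_size: int,
                     expected_md5: Optional[str] = None) -> str:
    """Return "ok", "missing", "size_mismatch" or "checksum_mismatch"."""
    if not os.path.isfile(path):
        return "missing"
    # Cheap check first: this catches zero-byte files without reading data.
    if os.path.getsize(path) != expected_size:
        return "size_mismatch"
    if expected_md5 is not None:
        digest = hashlib.md5()
        with open(path, "rb") as f:
            # Read in 1 MiB chunks so large images are not loaded into memory.
            for chunk in iter(lambda: f.read(1024 * 1024), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected_md5:
            return "checksum_mismatch"
    return "ok"
```

Passing expected_md5=None limits the run to the existence and size
checks, which is much cheaper than re-reading every image file.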


On 9/12/16 4:18 PM, Sergio A. de Carvalho Jr. wrote:
> Hi Nikhil,
>
> Thanks so much for your response.
>
> 1) No, this is a private cloud.
> 2) Glance v1 (this problem has manifested itself in one of our oldest
> deployments, which is running Icehouse).
> 3) No, location is not exposed.
> 4) Glance is set up with the filesystem backend driver, using a Gluster
> volume mounted on the host.
> 5.1) Images were in active state, even though the image file had zero
> bytes.
> 5.2) very low, it may have happened only twice in the last year.
>
> Even if the location is not exposed, there are a number of things that
> can happen to the actual image files after they've been uploaded to
> Glance, without Glance noticing, depending on how reliable your storage
> backend is. That's why I thought, in some circumstances, it would be
> useful to have some sort of background service checking that image
> files haven't been corrupted or gone missing altogether.
>
> Sergio
>
>
> On Mon, Sep 12, 2016 at 7:27 PM, Nikhil Komawar <nik.komawar at gmail.com
> <mailto:nik.komawar at gmail.com>> wrote:
>
>
>     Hi Sergio,
>
>     Thanks for reaching out. And this is an excellent question.
>
>     Firstly, I'd like to mention that Glance, if deployed correctly, is
>     built to be self-resilient, so you should NOT need an audit of such
>     files. In fact, if any operator (particularly a large-scale operator)
>     needs such a system, we have a serious issue where potentially
>     important /user/ data is likely to be lost, resulting in legal issues
>     (so please beware).
>
>     Having said that, I'd like to start investigating your particular
>     issue and see where we may be falling short in ensuring data
>     integrity in Glance. Let me ask you a first set of questions that
>     will help us get an initial understanding:
>
>     1) Are you a public cloud vendor; in particular, have you deployed
>     Glance to potentially non-trusted users, or is that not the case?
>     2) Are you deploying Glance v1 or Glance v2?
>     3) Have you exposed the "location" feature set (CRUD) to regular
>     users? (If using API v2, have you enabled the
>     ``show_multiple_locations`` configuration option?)
>     4) What backends have you configured Glance with and who has access to
>     them? What is the resiliency or rotation (of disks) (for, say, capacity
>     management) of your backend store system?
>     5) Sanity check on your issue:
>     5.1) What are the image statuses for which the image data files are
>     missing?
>     5.2) What is the approximate error rate? (If you don't have
>     specifics, info like rare, medium, often will help.)
>
>
>     We may have to dig a bit further into the issue but this set of info
>     should help us narrow down the issue and determine if there are any
>     gaps in Glance.
>
>     P.S. Please use the tag "[glance]" in the subject line to help us get
>     to your email faster.
>
>     On 9/12/16 12:48 PM, Sergio A. de Carvalho Jr. wrote:
>     > Hi all,
>     >
>     > Are there (or were there ever) any plans to implement in Glance a
>     > service that would periodically check that the image files are still
>     > available on the file system (or in whatever storage system is being
>     > used) and have the correct checksum?
>     >
>     > We had a few issues where an image file was removed from the
>     > filesystem and that can go undetected for a long time until someone
>     > tries to access that image, so we were wondering if it would be
>     > possible (and if it would make sense) to implement some sort of
>     > background service to periodically check if all images found in the
>     > database can be retrieved successfully.
>     >
>     > Thoughts?
>     >
>     > Sergio
>     >
>
>
>     --
>
>     Thanks,
>     Nikhil
>
>
>

-- 

Thanks,
Nikhil



