[oslo][cinder] Lots of leftover files in /var/lib/cinder
geguileo at redhat.com
Fri Mar 29 09:11:52 UTC 2019
On 26/03, Ben Nemec wrote:
> On 3/26/19 11:27 AM, Herve Beraud wrote:
> > Hello,
> > With this thread I want to discuss a known issue that affects
> > several OpenStack projects.
> > Projects that use oslo.concurrency lockutils for interprocess locking
> > end up with leftover lock files
> > that are never removed automatically.
> > You can find a related issue on the Red Hat Bugzilla.
> This bug doesn't seem to be public. Fortunately it is a pretty well-known
> thing so I doubt the interested parties need it. :-)
> > It's not really an oslo.concurrency issue; it's a known fasteners
> > issue that has not been fixed yet on the fasteners side, although
> > some related changes are currently under review.
> Here's the thing: Somebody reports this "bug" about once every six months,
> but I have yet to see a report where anything is actually breaking. In my
> experience it is exclusively a cosmetic thing.
> Furthermore, past attempts to fix this behavior have always resulted in
> actual problems because it turns out that interprocess locking on Linux is a
> bit of a disaster. I've become rather hesitant to mess with this code over
> the years because of all the edge cases we keep running across. For example,
> looking at the proposed fixes in fasteners, I can tell you the lack of
> Windows support for offset locks is an issue. We can obviously fall back to
> file locks there, but it's one more code path to maintain, and one that is
> untested in the gate at that. I'm also curious if Victor's O_TMPFILE option
> works on NFS because I know we ran into an issue with that in the past too.
> So I guess what I'm saying is that "fixing" this "problem" is trickier than
> it might appear and I'm dubious of the value.
> > oslo.concurrency already provides a workaround that all projects can
> > use to mitigate this temporarily until the official fasteners fix
> > is released.
> > I volunteer to help people and projects use the oslo.concurrency
> > cleanup method, but I'm not sure where the changes need to go (refer
> > to ) outside the oslo scope.
> > Also I guess other projects (nova, etc.) have the same issue.
> I believe Nova was actually the original consumer of the
> remove_external_lock API:
> It's possible we could do something similar for Cinder, but I have to admit
> that at first glance the locking strategy there doesn't quite make sense to
> me. Apparently each operation on a volume gets its own lock? That seems like
> it opens up the possibility of one process trying to update a volume while
> another deletes it.
Minor clarification about Cinder and locks.
In Cinder we prevent undesired concurrent access in two ways: with locks,
and with volume states that are enforced through conditional DB changes.
In the case of locks, like you say, we have a lock per volume, and we
use the same lock on the appropriate methods.
For example in the delete_volume we have
And in the clone operation we construct the same lock for the source:
locked_action = "%s-%s" % (source_volid, 'delete_volume')
And use it to run the creation flow:
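A minimal sketch of that pattern, not the actual Cinder code: the point is only that delete and clone build the same lock name from the volume ID, so they serialize against each other. The in-process lock registry and the helper names (get_lock, lock_name, clone_volume) are hypothetical stand-ins for whatever the real coordination layer provides.

```python
import threading
from collections import defaultdict

# Hypothetical in-process registry of named locks; a stand-in for the
# real coordination/locking backend, used only to illustrate the naming.
_locks = defaultdict(threading.Lock)
_registry_guard = threading.Lock()


def get_lock(name):
    """Return the lock object registered under `name`, creating it once."""
    with _registry_guard:
        return _locks[name]


def lock_name(volume_id, action="delete_volume"):
    """Build the per-volume lock name, e.g. '<volume id>-delete_volume'."""
    return "%s-%s" % (volume_id, action)


def delete_volume(volume_id, log):
    # Delete takes the per-volume lock for the volume being removed.
    with get_lock(lock_name(volume_id)):
        log.append(("delete", volume_id))


def clone_volume(source_volid, log):
    # Clone constructs the *same* name for its source volume, so the
    # source cannot be deleted while the creation flow is running.
    locked_action = "%s-%s" % (source_volid, 'delete_volume')
    with get_lock(locked_action):
        log.append(("clone", source_volid))
```

Because both operations resolve to one lock per volume, a clone in progress blocks a concurrent delete of its source, while operations on other volumes are unaffected.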
> > I need help from the experts on these projects to find out where the
> > changes need to go (using oslo.concurrency
> > remove_external_lock_file_with_prefix).
> This is part of the problem. You have to be very careful about removing lock
> files (note that this would apply to offset locks too) because if there's
> any chance another process would still try to use it you may create a race
> condition. Some lock files just can't be removed safely.
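To make that race concrete, here is a small Linux-only sketch. It uses fcntl.flock rather than the fcntl.lockf call that, as far as I know, fasteners relies on, purely because flock locks conflict between two descriptors in one process, which lets the problem be shown without forking. The file name and the helper names are made up for the illustration.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Open (creating if needed) `path` and try a non-blocking exclusive
    flock. Returns the file descriptor on success, None if already locked."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def demo_unlink_race():
    lock_dir = tempfile.mkdtemp()
    path = os.path.join(lock_dir, "demo-lock")  # hypothetical lock file

    holder = try_lock(path)    # first "process" takes the lock
    blocked = try_lock(path)   # second attempt is refused, as expected

    os.unlink(path)            # someone "cleans up" the lock file

    # The path now names a brand new file, so this attempt succeeds even
    # though `holder` has never released its lock.
    intruder = try_lock(path)

    result = {
        "blocked_before_unlink": blocked is None,
        "second_holder_after_unlink": intruder is not None,
        "different_inodes": (
            os.fstat(holder).st_ino != os.fstat(intruder).st_ino
        ),
    }

    os.close(holder)
    os.close(intruder)
    os.unlink(path)
    os.rmdir(lock_dir)
    return result
```

After the unlink, two callers each believe they hold the lock, because they are locking two different inodes that happened to share a path.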
> I know at one point we had looked into deleting lock files instead of
> unlocking them, on the assumption that then any waiting locks would just
> fight over who gets to re-create the file. I don't think it actually worked
> though. Maybe the waiting locks didn't recognize that the file had gone away
> and waited indefinitely?
> My memory is pretty hazy though so it might be something to investigate
> again (and write down the results this time ;-).
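For whoever investigates this again: one technique I have seen for the delete-instead-of-unlock idea is to re-check, after acquiring the lock, that the path still points at the file that was locked, and to retry if it does not. The sketch below is an illustration under assumptions, not what oslo.concurrency or fasteners does today. It uses fcntl.flock, the function names are hypothetical, and I have not checked how it behaves on NFS or what the Windows equivalent would be.

```python
import fcntl
import os


def acquire_checked(path):
    """Block until we hold an exclusive lock on the file that `path`
    names *right now*, retrying if the file was removed while we waited."""
    while True:
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        fcntl.flock(fd, fcntl.LOCK_EX)
        try:
            on_disk = os.stat(path)
        except FileNotFoundError:
            on_disk = None
        locked = os.fstat(fd)
        if on_disk is not None and (
            (on_disk.st_dev, on_disk.st_ino) == (locked.st_dev, locked.st_ino)
        ):
            return fd
        # We locked a file that has since been unlinked (or replaced);
        # drop it and start over against the current path.
        os.close(fd)


def release_and_remove(path, fd):
    """Remove the lock file while still holding the lock, then release.
    Waiters that wake up on the old file notice the mismatch and retry."""
    os.unlink(path)
    os.close(fd)
```

The ordering in release_and_remove matters: unlinking before releasing means any waiter that wakes up holds a lock on a file that is no longer reachable by the path, which the check in acquire_checked detects.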
> > Otherwise, if projects want to introduce these changes, I can help by
> > double-checking them with my oslo hat on.
> > Also, I guess some projects reimplement the same approach as the
> > oslo.concurrency module by using fasteners directly for locking; in
> > that case I think they need to use oslo.concurrency to avoid the problem
> > too.
> > Do not hesitate to reply on this thread to record useful information and
> > to add me to project reviews if you decide to introduce these changes on
> > your side.
> >  https://bugzilla.redhat.com/show_bug.cgi?id=1647469
> >  https://github.com/harlowja/fasteners/issues/26
> >  https://github.com/harlowja/fasteners/pull/10
> >  https://docs.openstack.org/oslo.concurrency/latest/reference/lockutils.html#oslo_concurrency.lockutils.remove_external_lock_file_with_prefix
> > Thank you for your attention.
> > --
> > Hervé Beraud
> > Senior Software Engineer
> > Red Hat - Openstack Oslo
> > irc: hberaud