[oslo][cinder] Lots of leftover files in /var/lib/cinder
Ben Nemec
openstack at nemebean.com
Tue Mar 26 18:23:46 UTC 2019
On 3/26/19 11:27 AM, Herve Beraud wrote:
> Hello,
>
> By starting this thread I want to discuss a known issue that impacts
> several OpenStack projects.
>
> Projects that use oslo.concurrency lockutils for interprocess locking end
> up with many leftover lock files that are not automatically removed.
>
> You can find a related issue on the Red Hat Bugzilla[1].
This bug doesn't seem to be public. Fortunately it is a pretty
well-known thing so I doubt the interested parties need it. :-)
>
> It's not really an oslo.concurrency issue; it's a known fasteners
> issue[2] that is not fixed yet on the fasteners side, although some
> related changes[3] are currently under review.
Here's the thing: Somebody reports this "bug" about once every six
months, but I have yet to see a report where anything is actually
breaking. In my experience it is exclusively a cosmetic thing.
Furthermore, past attempts to fix this behavior have always resulted in
actual problems because it turns out that interprocess locking on Linux
is a bit of a disaster. I've become rather hesitant to mess with this
code over the years because of all the edge cases we keep running
across. For example, looking at the proposed fixes in fasteners, I can
tell you the lack of Windows support for offset locks is an issue. We
can obviously fall back to file locks there, but it's one more code path
to maintain, and one that is untested in the gate at that. I'm also
curious if Victor's O_TMPFILE option works on NFS because I know we ran
into an issue with that in the past too.
So I guess what I'm saying is that "fixing" this "problem" is trickier
than it might appear and I'm dubious of the value.
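To make the "cosmetic" point concrete, here is a minimal sketch of why lock files pile up. It assumes a plain fcntl.flock() lock on Linux/POSIX; it is not the fasteners or oslo.concurrency code, and the external_lock helper and "cinder-" prefix are hypothetical. The file only exists so there is something to lock, and releasing the lock deliberately leaves the file in place:

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def external_lock(lock_dir, name):
    """Hold an exclusive flock() on <lock_dir>/<name> for the duration of the block.

    The file is created on demand and is NOT removed on release, which is
    why lock files accumulate in the lock directory over time.
    """
    path = os.path.join(lock_dir, name)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield path
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


lock_dir = tempfile.mkdtemp()
for volume in ("vol-1", "vol-2", "vol-3"):
    with external_lock(lock_dir, f"cinder-{volume}"):
        pass  # critical section

print(sorted(os.listdir(lock_dir)))  # ['cinder-vol-1', 'cinder-vol-2', 'cinder-vol-3']
```

Every lock is released correctly; the only symptom is the three empty files that remain afterwards.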
>
> oslo.concurrency already provides a workaround[4] that all projects can
> use as a temporary fix while waiting for the official fasteners fix
> to be released.
>
> I volunteer to help people and projects use the oslo.concurrency
> cleanup method, but I'm not sure where the changes (refer to [1])
> need to go outside the oslo scope.
>
> Also I guess other projects (nova, etc...) have the same issue.
I believe Nova was actually the original consumer of the
remove_external_lock API:
https://review.openstack.org/#/c/144891/1/nova/virt/libvirt/imagecache.py
It's possible we could do something similar for Cinder, but I have to
admit that at first glance the locking strategy there doesn't quite make
sense to me. Apparently each operation on a volume gets its own lock?
That seems like it opens up the possibility of one process trying to
update a volume while another deletes it.
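As a sketch of that concern, the example below assumes that lock names are built from the volume ID plus the operation name. This sketch does not show how Cinder actually names its locks. The named_lock helper and the "update_volume"/"delete_volume" names are hypothetical, and threading.Lock stands in for an interprocess lock purely to show which names collide:

```python
import threading
from collections import defaultdict

# Hypothetical in-memory registry of named locks; threading.Lock stands in
# for an interprocess (file-based) lock purely to show which names collide.
_locks = defaultdict(threading.Lock)


def named_lock(name):
    """Return the lock registered under `name`, creating it on first use."""
    return _locks[name]


volume_id = "vol-1"

# Scheme 1 (assumed): one lock per (volume, operation) pair.
update_lock = named_lock(f"{volume_id}-update_volume")
delete_lock = named_lock(f"{volume_id}-delete_volume")
update_lock.acquire()                              # an update is in progress...
delete_ran = delete_lock.acquire(blocking=False)   # ...yet a delete is not blocked
update_lock.release()
if delete_ran:
    delete_lock.release()

# Scheme 2: one lock per volume, shared by every operation on that volume.
volume_lock = named_lock(volume_id)
volume_lock.acquire()                              # an update is in progress...
delete_ran_per_volume = volume_lock.acquire(blocking=False)  # ...so the delete must wait
volume_lock.release()
```

The per-volume scheme trades some concurrency (two unrelated operations on the same volume also serialize) for actually excluding update-versus-delete.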
>
> I need help from the experts on these projects to know exactly where we
> need to make changes (using oslo.concurrency
> remove_external_lock_file_with_prefix).
This is part of the problem. You have to be very careful about removing
lock files (note that this would apply to offset locks too) because if
there's any chance another process would still try to use it you may
create a race condition. Some lock files just can't be removed safely.
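Here is a minimal sketch of that race, assuming Linux flock() semantics, where two separate open() calls get independent locks even within one process. The try_lock helper is hypothetical, and the "processes" are simulated with separate file descriptors:

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Open (creating if needed) and try a non-blocking exclusive flock(); return fd or None."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


path = os.path.join(tempfile.mkdtemp(), "volume-lock")

holder = try_lock(path)      # "process A" takes the lock
contender = try_lock(path)   # "process B" is correctly refused -> None

os.unlink(path)              # A "cleans up" the lock file while still holding it

intruder = try_lock(path)    # "process C" creates a brand-new file at the same path
# C succeeds because it locked a different inode: A and C now both "hold" the lock.
same_inode = os.fstat(holder).st_ino == os.fstat(intruder).st_ino  # False

os.close(holder)
os.close(intruder)
```

The lock protects an inode, not a path name, so removing the name while anyone might still look it up silently splits the lock in two.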
I know at one point we had looked into deleting lock files instead of
unlocking them, on the assumption that then any waiting locks would just
fight over who gets to re-create the file. I don't think it actually
worked though. Maybe the waiting locks didn't recognize that the file
had gone away and waited indefinitely?
My memory is pretty hazy though so it might be something to investigate
again (and write down the results this time ;-).
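A waiter could end up locking the old, already-deleted file. That is one plausible hazard with delete-instead-of-unlock, offered as an assumption rather than what was actually observed back then. A common mitigation is to re-check after acquiring that the path still names the same inode, and retry if it does not.

The sketch below assumes flock() on a local filesystem; NFS and Windows may behave differently. It is not what fasteners or oslo.concurrency implement, and the acquire/release helpers are hypothetical:

```python
import fcntl
import os
import tempfile


def acquire(path):
    """Acquire an exclusive flock() on path, retrying if the file was deleted underneath us.

    A waiter may block on a descriptor whose file the previous holder has since
    unlinked; once it gets the lock it must confirm the path still refers to the
    same inode, otherwise it locked a dead file and has to start over.
    """
    while True:
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        fcntl.flock(fd, fcntl.LOCK_EX)
        try:
            on_disk = os.stat(path)
        except FileNotFoundError:
            os.close(fd)  # file removed after we opened it; try again
            continue
        if os.path.samestat(os.fstat(fd), on_disk):
            return fd
        os.close(fd)  # path now points at a newer file; try again


def release(path, fd):
    """Delete-then-unlock: remove the file while still holding the lock."""
    os.unlink(path)
    os.close(fd)  # closing the last descriptor drops the flock()


path = os.path.join(tempfile.mkdtemp(), "volume-lock")
fd = acquire(path)
stale = os.open(path, os.O_RDWR)   # a "waiter" opened the file before release
release(path, fd)
leftover = os.path.exists(path)    # False: nothing left behind

# The stale waiter can still lock the unlinked inode; the re-check catches it.
fcntl.flock(stale, fcntl.LOCK_EX | fcntl.LOCK_NB)
stale_is_current = os.path.exists(path) and os.path.samestat(os.fstat(stale), os.stat(path))
os.close(stale)

fd2 = acquire(path)                # a fresh acquire recreates the file and passes the check
fresh_ok = os.path.samestat(os.fstat(fd2), os.stat(path))
release(path, fd2)
```

Whether this survives all the edge cases mentioned earlier (NFS, Windows, offset locks) is untested here.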
>
> Otherwise, if projects want to introduce these changes themselves, I can
> help them by reviewing with my oslo hat on.
>
> Also, I guess some projects reimplement the same process-locking approach
> as the oslo.concurrency module by using fasteners directly; in that case
> I think they need to switch to oslo.concurrency to avoid the problem
> too.
>
> Do not hesitate to reply on this thread to record useful information, and
> to add me to project reviews if you decide to introduce these changes on
> your side.
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1647469
> [2] https://github.com/harlowja/fasteners/issues/26
> [3] https://github.com/harlowja/fasteners/pull/10
> [4] https://docs.openstack.org/oslo.concurrency/latest/reference/lockutils.html#oslo_concurrency.lockutils.remove_external_lock_file_with_prefix
>
> Thank you for your attention.
> --
> Hervé Beraud
> Senior Software Engineer
> Red Hat - Openstack Oslo
> irc: hberaud
>
More information about the openstack-discuss mailing list