[oslo][cinder] Lots of leftover files in /var/lib/cinder

Ben Nemec openstack at nemebean.com
Tue Mar 26 18:23:46 UTC 2019



On 3/26/19 11:27 AM, Herve Beraud wrote:
> Hello,
> 
> With this thread I want to discuss a known issue that impacts 
> several OpenStack projects.
> 
> Projects that use oslo.concurrency lockutils for process locking end up 
> with several leftover lock files
> that are not automatically removed.
> 
> You can find a related issue on the Red Hat Bugzilla[1].

This bug doesn't seem to be public. Fortunately it is a pretty 
well-known thing so I doubt the interested parties need it. :-)

> 
> It's not really an oslo.concurrency issue; it's a known fasteners 
> issue[2] that is not fixed yet on the fasteners side, though some related 
> changes[3] are currently under review.

Here's the thing: Somebody reports this "bug" about once every six 
months, but I have yet to see a report where anything is actually 
breaking. In my experience it is exclusively a cosmetic thing.

Furthermore, past attempts to fix this behavior have always resulted in 
actual problems because it turns out that interprocess locking on Linux 
is a bit of a disaster. I've become rather hesitant to mess with this 
code over the years because of all the edge cases we keep running 
across. For example, looking at the proposed fixes in fasteners, I can 
tell you the lack of Windows support for offset locks is an issue. We 
can obviously fall back to file locks there, but it's one more code path 
to maintain, and one that is untested in the gate at that. I'm also 
curious if Victor's O_TMPFILE option works on NFS because I know we ran 
into an issue with that in the past too.

So I guess what I'm saying is that "fixing" this "problem" is trickier 
than it might appear and I'm dubious of the value.

> 
> oslo.concurrency already provides a workaround[4] that all projects can 
> use as a temporary fix until the official fasteners fix 
> is released.
> 
> I'm volunteering to help people and projects use the oslo.concurrency 
> cleanup method, but I'm not sure where the changes (refer 
> to [1]) need to go outside the oslo scope.
> 
> Also I guess other projects (nova, etc...) have the same issue.

I believe Nova was actually the original consumer of the 
remove_external_lock API: 
https://review.openstack.org/#/c/144891/1/nova/virt/libvirt/imagecache.py

It's possible we could do something similar for Cinder, but I have to 
admit that at first glance the locking strategy there doesn't quite make 
sense to me. Apparently each operation on a volume gets its own lock? 
That seems like it opens up the possibility of one process trying to 
update a volume while another deletes it.

> 
> I need help from the experts on these projects to know exactly where we 
> need to put the changes (using oslo.concurrency 
> remove_external_lock_file_with_prefix).

This is part of the problem. You have to be very careful about removing 
lock files (note that this would apply to offset locks too) because if 
there's any chance another process would still try to use it you may 
create a race condition. Some lock files just can't be removed safely.
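To make the race concrete, here is a small stdlib-only sketch. It uses fcntl.flock so that both "holders" can live in one process for demonstration purposes; real implementations may use a different locking call, and the file name is hypothetical. The point is only that a lock is tied to the inode, not the path, so once the file is unlinked a second party can lock a brand new file at the same path while the first still believes it holds the lock.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Open path (creating it if needed) and try a non-blocking flock.

    Returns the open file object on success, None if the lock is held.
    """
    handle = open(path, "a")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    return handle


def demonstrate_unlink_race(lock_dir=None):
    lock_dir = lock_dir or tempfile.mkdtemp()
    path = os.path.join(lock_dir, "demo-lock")  # hypothetical lock name

    holder_a = try_lock(path)   # A takes the lock
    refused = try_lock(path)    # same file: refused while A holds it
    os.unlink(path)             # "cleanup" runs while A still holds the lock
    holder_b = try_lock(path)   # B re-creates the path and locks a new inode

    inode_a = os.fstat(holder_a.fileno()).st_ino
    inode_b = os.fstat(holder_b.fileno()).st_ino
    result = {
        "refused_before_unlink": refused is None,
        "both_hold_after_unlink": holder_a is not None and holder_b is not None,
        "same_inode": inode_a == inode_b,
    }
    holder_a.close()
    holder_b.close()
    os.unlink(path)
    return result
```

On Linux I would expect the second attempt to be refused before the unlink, and both holders to succeed afterwards on different inodes, which is exactly the situation a lock is supposed to prevent.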

I know at one point we had looked into deleting lock files instead of 
unlocking them, on the assumption that then any waiting locks would just 
fight over who gets to re-create the file. I don't think it actually 
worked though. Maybe the waiting locks didn't recognize that the file 
had gone away and waited indefinitely?

My memory is pretty hazy though so it might be something to investigate 
again (and write down the results this time ;-).
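If someone does pick this up again, one common pattern for the "waiter doesn't notice the file went away" problem is to re-check, after acquiring, that the path still points at the file that was locked, and retry if it doesn't. The sketch below is stdlib-only and is not what oslo.concurrency or fasteners do today; the function names are hypothetical, it uses flock, and I have not tried it on NFS or Windows.

```python
import fcntl
import os


def acquire_revalidated(path):
    """Block until we hold an exclusive flock on the file currently at path.

    Returns the open file descriptor holding the lock.
    """
    while True:
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        fcntl.flock(fd, fcntl.LOCK_EX)  # may wait on a file that gets unlinked
        held = os.fstat(fd)
        try:
            on_disk = os.stat(path)
        except FileNotFoundError:
            on_disk = None
        # Only trust the lock if the path still refers to the inode we locked.
        if on_disk is not None and (
                (on_disk.st_dev, on_disk.st_ino) == (held.st_dev, held.st_ino)):
            return fd
        os.close(fd)  # stale file: drop it and start over


def release_by_unlink(path, fd):
    """Release by deleting the lock file while still holding the lock."""
    os.unlink(path)  # any waiter that wakes up will fail the re-check
    os.close(fd)     # closing the descriptor drops the flock
```

Because the unlink happens while the lock is still held, a waiter that wakes up on the old inode fails the re-check and loops around to create or open the new file, so no lock file is left behind after the last release.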

> 
> Otherwise, if projects want to introduce these changes themselves, I can 
> help by double checking with my oslo hat on.
> 
> Also, I guess some projects reimplement the same approach as the 
> oslo.concurrency module by using fasteners directly to lock processes; in 
> that case I think they need to use oslo.concurrency to avoid the problem 
> too.
> 
> Do not hesitate to reply on this thread to record useful information and 
> to add me to project reviews if you decide to introduce these changes on 
> your side.
> 
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1647469
> [2] https://github.com/harlowja/fasteners/issues/26
> [3] https://github.com/harlowja/fasteners/pull/10
> [4] 
> https://docs.openstack.org/oslo.concurrency/latest/reference/lockutils.html#oslo_concurrency.lockutils.remove_external_lock_file_with_prefix
> 
> Thank you for your attention.
> -- 
> Hervé Beraud
> Senior Software Engineer
> Red Hat - Openstack Oslo
> irc: hberaud
> 


