[Manila] CephFS deferred deletion

Tom Barron tpb at dyncloud.net
Fri Jul 12 13:15:40 UTC 2019


On 12/07/19 13:03 +0000, Jose Castro Leon wrote:
>Dear all,
>
>Lately, one of our clients stored 300k files in a Manila CephFS share.
>Then he deleted the share in Manila. This made the driver
>unresponsive for several hours, until all the data was removed from the
>cluster.
>
>We had a quick look at the code in Manila [1]. The deletion is done by
>calling two functions in the Ceph bindings: first delete_volume [2],
>then purge_volume [3]. The first call moves the share's directory into
>a volumes_deleted directory. The second call recursively deletes all
>the contents of that directory.
>
>The second call is the one that triggers the issue.
>
>We had a similar issue with Cinder in the past. There, Arne proposed
>deferring the deletion of volumes. I think we could do the same in
>Manila for the CephFS driver.
>
>The idea is to keep calling delete_volume as we do now, and then, in
>a periodic task in the driver, asynchronously list the contents of
>that directory and trigger the purge.
>
>I can propose the change and contribute the code, but before going
>too deep I would like to know if there is a reason for having a
>singleton for the volume_client connection. In the Cinder code, by
>comparison, the connection is established and closed for each
>operation with the backend.
>
>If you are not the maintainer, could you please point me to them?
>I can also post this to the mailing list if you prefer.
>
>Cheers
>Jose Castro Leon
>CERN Cloud Infrastructure
>
>[1]
>https://github.com/openstack/manila/blob/master/manila/share/drivers/cephfs/driver.py#L260-L267
>
>[2]
>https://github.com/ceph/ceph/blob/master/src/pybind/ceph_volume_client.py#L700-L734
>
>[3]
>https://github.com/ceph/ceph/blob/master/src/pybind/ceph_volume_client.py#L736-L790
>
>
>PS: The issue was triggered by one of our clients in Kubernetes using
>the Manila CSI driver.
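
A minimal sketch of the deferred deletion idea, assuming a local POSIX filesystem stands in for CephFS. The delete path does only the cheap step (a single rename into a trash directory, analogous to delete_volume), and a background periodic task does the expensive recursive removal (analogous to purge_volume). The class DeferredPurger, its method names, the volumes_deleted layout and the 60-second default interval are hypothetical illustrations; this does not call the Manila driver or ceph_volume_client.

```python
import os
import shutil
import tempfile
import threading
import uuid


class DeferredPurger:
    """Hypothetical move-then-purge deleter on a local filesystem.

    Not the Manila or ceph_volume_client API: a stand-in that shows the
    cheap rename on the request path and the slow purge in the background.
    """

    def __init__(self, root, interval=60.0):
        self.root = root
        # Assumed layout: deleted shares are parked under <root>/volumes_deleted.
        self.trash = os.path.join(root, "volumes_deleted")
        os.makedirs(self.trash, exist_ok=True)
        self.interval = interval
        self._stop = threading.Event()
        self._thread = None

    def delete_share(self, share_dir):
        # Cheap step: one rename, regardless of how many files the share holds.
        target = os.path.join(self.trash, uuid.uuid4().hex)
        os.rename(share_dir, target)
        return target

    def purge_once(self):
        # Expensive step: recursively remove every entry queued in the trash.
        purged = 0
        for name in os.listdir(self.trash):
            try:
                shutil.rmtree(os.path.join(self.trash, name))
            except OSError:
                # Leave the entry in place; the next periodic pass retries it.
                continue
            purged += 1
        return purged

    def _run(self):
        # Event.wait returns True once stop() is called, ending the loop.
        while not self._stop.wait(self.interval):
            self.purge_once()

    def start(self):
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    share = os.path.join(root, "share-1")
    os.makedirs(share)
    for i in range(1000):
        open(os.path.join(share, "f%d" % i), "w").close()
    purger = DeferredPurger(root)
    purger.delete_share(share)  # returns after a single rename
    print(purger.purge_once())  # prints 1: one queued entry removed
```

The design choice is that the caller only pays for a directory rename, while removing hundreds of thousands of files happens off the request path. The trade-off is that space is reclaimed later rather than immediately, and a failed purge simply stays queued for the next pass.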

Hi Jose,

Let's get this fixed, since there's a lot of interest in the Manila CSI
driver and I think we can expect more batched deletes with it than we
have had historically.

I've copied Ramana Raja and Patrick Donnelly since they will be able 
to answer your question about the singleton volume_client connection 
more authoritatively than I can.

Thanks for volunteering to propose a review to deal with this issue!

-- Tom Barron




More information about the openstack-discuss mailing list