Dear all,
Recently, one of our clients stored 300k files in a Manila CephFS share and then deleted the share in Manila. This made the driver unresponsive for several hours, until all the data had been removed from the cluster.
We had a quick look at the code in Manila [1]: the deletion is done by two calls into the Ceph bindings, delete_volume [2] followed by purge_volume [3]. The first call moves the share's directory into a volumes_deleted directory; the second recursively deletes all of that directory's contents. It is this last operation that triggers the issue.
We had a similar issue in Cinder in the past. There, Arne proposed doing a deferred deletion of volumes. I think we could do the same in Manila for the CephFS driver.
The idea is to keep calling delete_volume as today, and then have a periodic task in the driver asynchronously list the contents of that directory and trigger the purge.
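To make the idea concrete, here is a rough sketch of what I have in mind. The class, method names and directory layout below are purely illustrative stand-ins (using a local filesystem instead of the real ceph_volume_client), not the actual Manila driver code:

```python
import os
import shutil

# Illustrative sketch of deferred deletion. delete_share() only does the
# cheap rename (as delete_volume does today), while purge_deleted_shares()
# would run later from a periodic task and do the expensive recursive
# removal (as purge_volume does today). All names here are hypothetical.

class DeferredDeletionDriver:
    def __init__(self, root):
        self.volumes = os.path.join(root, "volumes")
        self.trash = os.path.join(root, "volumes_deleted")
        os.makedirs(self.volumes, exist_ok=True)
        os.makedirs(self.trash, exist_ok=True)

    def delete_share(self, name):
        # Fast path, called synchronously from the API: just move the
        # share into the trash directory and return immediately.
        os.rename(os.path.join(self.volumes, name),
                  os.path.join(self.trash, name))

    def purge_deleted_shares(self):
        # Slow path, meant to run from a periodic task: recursively
        # remove everything queued in the trash directory.
        for entry in os.listdir(self.trash):
            shutil.rmtree(os.path.join(self.trash, entry))
```

This way delete_share returns as soon as the rename completes, no matter how many files the share contains, and the heavy recursive removal happens in the background without blocking the driver.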
I can propose the change and contribute the code, but before going too deep I would like to know whether there is a reason for having a singleton for the volume_client connection. By comparison, in the Cinder code the connection is established and closed on each operation against the backend.
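For clarity, the per-operation pattern I am referring to looks roughly like the following. CephConnection is a dummy stand-in for the real binding; this is only meant to illustrate the pattern, not any actual Cinder or Ceph API:

```python
import contextlib

# Hypothetical illustration of the per-operation connection pattern,
# as opposed to a long-lived singleton client held by the driver.
# CephConnection is a placeholder, not a real Ceph binding.

class CephConnection:
    def __init__(self):
        self.connected = False

    def connect(self):
        self.connected = True

    def disconnect(self):
        self.connected = False

@contextlib.contextmanager
def ceph_connection():
    # Open a fresh connection for a single backend operation and
    # always close it afterwards, even if the operation raises.
    conn = CephConnection()
    conn.connect()
    try:
        yield conn
    finally:
        conn.disconnect()

# Usage: each operation gets its own short-lived connection.
# with ceph_connection() as conn:
#     ...perform one backend operation...
```

With a singleton, a long-running purge ties up the shared client; with per-operation connections, other requests can proceed independently, which is why I am asking whether the singleton is a deliberate choice.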
If you are not the maintainer, could you please point me to the right person? I can post this on the mailing list if you prefer.
Cheers,
Jose Castro Leon
CERN Cloud Infrastructure
[1] https://github.com/openstack/manila/blob/master/manila/share/drivers/cephfs/...
[2] https://github.com/ceph/ceph/blob/master/src/pybind/ceph_volume_client.py#L7...
[3] https://github.com/ceph/ceph/blob/master/src/pybind/ceph_volume_client.py#L7...
PS: The issue was triggered by one of our clients on Kubernetes using the Manila CSI driver.