[cinder][dev] Bug for deferred deletion in RBD

Arne Wiebalck arne.wiebalck at cern.ch
Tue Feb 12 06:55:57 UTC 2019


Jae,

One other setting that caused trouble when bulk deleting Cinder volumes was the
DB connection string: we had not configured a driver and hence used the default
MySQL-Python wrapper, a C extension that blocks eventlet's green threads during
DB calls … essentially, changing

connection = mysql://cinder:<pw>@<host>:<port>/cinder 

to

connection = mysql+pymysql://cinder:<pw>@<host>:<port>/cinder 

solved the parallel deletion issue for us.
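
As a minimal sketch (with placeholder credentials, and assuming both the
MySQL-Python/mysqlclient and PyMySQL packages are installed), you can check
which DBAPI driver SQLAlchemy resolves for each URL form:

    # Show which DBAPI driver SQLAlchemy picks for each URL form.
    from sqlalchemy import create_engine

    for url in ("mysql://cinder:pw@host:3306/cinder",
                "mysql+pymysql://cinder:pw@host:3306/cinder"):
        print(url.split("://")[0].ljust(13), "->",
              create_engine(url).dialect.driver)
    # mysql         -> mysqldb  (C extension, blocks green threads)
    # mysql+pymysql -> pymysql  (pure Python, cooperates with eventlet)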

All the details are in the last paragraph of [1].

HTH!
 Arne

[1] https://techblog.web.cern.ch/techblog/post/experiences-with-cinder-in-production/



> On 12 Feb 2019, at 01:07, Jae Sang Lee <hyangii at gmail.com> wrote:
> 
> Hello, 
> 
> I tested today by increasing EVENTLET_THREADPOOL_SIZE to 100. I was hoping for good results, 
> but this time cinder-volume stopped responding after removing 41 volumes. This environment variable did not fix 
> the cinder-volume hang.
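> 
> (For reference, a minimal sketch of why this variable must be set before the service starts, e.g. in the init script or systemd unit: eventlet reads it when the tpool module is first imported.)
> 
>     import os
>     # Must be set before eventlet's tpool module is imported;
>     # eventlet's default native thread pool size is 20.
>     os.environ["EVENTLET_THREADPOOL_SIZE"] = "100"
>     from eventlet import tpool  # the pool size is read at import time here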
> 
> Restarting the stopped cinder-volume deletes all volumes that are in the 'deleting' state while it runs its clean-up function. 
> Only one volume stayed in the 'deleting' state; I forced that volume's state back to 'available' and then deleted it, after which all volumes were gone.
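> 
> (For reference, the manual step can be done with the cinder client; <vol-uuid> is a placeholder:)
> 
>     cinder reset-state --state available <vol-uuid>
>     cinder delete <vol-uuid>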
> 
> This result was the same three consecutive times: after removing dozens of volumes, cinder-volume went down, 
> and after a restart of the service, 199 volumes were deleted and one volume was erased manually.
> 
> If you have a different approach to solving this problem, please let me know.
> 
> Thanks.
> 
> On Mon, 11 Feb 2019 at 21:40, Arne Wiebalck <Arne.Wiebalck at cern.ch> wrote:
> Jae,
> 
>> On 11 Feb 2019, at 11:39, Jae Sang Lee <hyangii at gmail.com> wrote:
>> 
>> Arne,
>> 
>> I saw messages like "moving volume to trash" in the cinder-volume logs, and the periodic task also reports 
>> messages like "Deleted <vol-uuid> from trash for backend '<backends-name>'".
>> 
>> The patch worked well when clearing a small number of volumes. This happens only when I am deleting a large 
>> number of volumes.
> 
> Hmm, from cinder’s point of view, the deletion should be more or less instantaneous, so it should be able to “delete”
> many more volumes before getting stuck.
> 
> The periodic task, however, will go through the volumes one by one, so if you delete many at the same time,
> volumes may pile up in the trash (for some time) before the task gets round to deleting them. This should not affect
> c-vol, though.
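> 
> (A minimal sketch of the underlying idea, not Cinder's actual code, assuming a pool named 'volumes' and a placeholder image name: the "delete" only moves the image to the RBD trash and returns quickly, while the slow removal happens later in the periodic task.)
> 
>     import rados
>     import rbd
> 
>     with rados.Rados(conffile="/etc/ceph/ceph.conf") as cluster:
>         with cluster.open_ioctx("volumes") as ioctx:
>             # "deleting" the volume: fast, just moves it to the trash
>             rbd.RBD().trash_move(ioctx, "volume-<uuid>", delay=0)
>             # periodic task: the slow, synchronous removal happens here
>             for img in rbd.RBD().trash_list(ioctx):
>                 rbd.RBD().trash_remove(ioctx, img["id"])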
> 
>> I will try to adjust the size of the thread pool via the environment variable, following your advice.
>> 
>> Do you know why the cinder-volume hang does not occur when creating a volume, but only when deleting one?
> 
> Deleting a volume ties up a thread for the duration of the deletion (which is synchronous and can hence take very
> long for large volumes). If you have too many deletions going on at the same time, you run out of threads and c-vol
> will eventually time out. FWIU, creation basically works the same way, but it is almost instantaneous, hence the risk
> of using up all threads is simply lower (Gorka may correct me here :-).
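> 
> (A hedged illustration of that failure mode, not Cinder code: blocking backend calls run on eventlet's pool of native threads, and once every thread is occupied by a slow deletion, all further work queues behind them.)
> 
>     import time
>     import eventlet
>     from eventlet import tpool
> 
>     def slow_delete(i):
>         time.sleep(5)  # stand-in for a long, synchronous RBD delete
>         return i
> 
>     # With the default pool of 20 native threads, calls 21..25 must wait
>     # for a free thread, just as c-vol stalls when too many deletes overlap.
>     gts = [eventlet.spawn(tpool.execute, slow_delete, i) for i in range(25)]
>     print([gt.wait() for gt in gts])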
> 
> Cheers,
>  Arne
> 
>> 
>> 
>> Thanks.
>> 
>> 
>> On Mon, 11 Feb 2019 at 18:14, Arne Wiebalck <Arne.Wiebalck at cern.ch> wrote:
>> Jae,
>> 
>> To make sure deferred deletion is working properly: when you delete individual large volumes
>> with data in them, do you see that
>> - the volume is fully "deleted" within a few seconds, i.e. not staying in 'deleting' for a long time?
>> - the volume shows up in the trash (with "rbd trash ls", see the example below)?
>> - the periodic task reports it is deleting volumes from the trash?
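>> 
>> (For example, assuming the images live in a pool called 'volumes':)
>> 
>>     rbd trash ls -p volumes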
>> 
>> Another option to look at is "backend_native_threads_pool_size": this increases the number
>> of native threads available to work on volume deletions (see the snippet below). It is independent of deferred
>> deletion, but can also help in situations where Cinder has more work to do than it can cope with at the moment.
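>> 
>> (A minimal sketch for cinder.conf; the backend section name 'rbd-1' is an assumption, and the default pool size is 20:)
>> 
>>     [rbd-1]
>>     backend_native_threads_pool_size = 100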
>> 
>> Cheers,
>>  Arne
>> 
>> 
>> 
>>> On 11 Feb 2019, at 09:47, Jae Sang Lee <hyangii at gmail.com> wrote:
>>> 
>>> Yes, I added your code to the Pike release manually.
>>> 
>>> 
>>> 
>>> On Mon, 11 Feb 2019 at 16:39, Arne Wiebalck <Arne.Wiebalck at cern.ch> wrote:
>>> Hi Jae,
>>> 
>>> You backported the deferred deletion patch to Pike?
>>> 
>>> Cheers,
>>>  Arne
>>> 
>>> > On 11 Feb 2019, at 07:54, Jae Sang Lee <hyangii at gmail.com> wrote:
>>> > 
>>> > Hello,
>>> > 
>>> > I recently ran a volume deletion test with deferred deletion enabled on the Pike release.
>>> > 
>>> > We experienced a cinder-volume hang when we deleted a large number of volumes that actually had data written to them (I created a 15GB file in every volume), and we thought deferred deletion would solve it.
>>> > 
>>> > However, while deleting 200 volumes, cinder-volume went down after 50 volumes, as before. In my opinion, the trash_move API does not seem to work properly when removing multiple volumes, just like the remove API.
>>> > 
>>> > If these test results come from a mistake on my side, please let me know the correct test method.
>>> > 
>>> 
>>> --
>>> Arne Wiebalck
>>> CERN IT
>>> 
>> 
>> --
>> Arne Wiebalck
>> CERN IT
>> 
> 
> --
> Arne Wiebalck
> CERN IT
> 



