[cinder][dev] Bug for deferred deletion in RBD

Jae Sang Lee hyangii at gmail.com
Wed Feb 13 08:35:47 UTC 2019


As Gorka mentioned, the sql connection is already using pymysql.

I also increased max_pool_size to 50 (I think Gorka meant max_pool_size
rather than max_retries), but the result was the same: cinder-volume got
stuck once about 40~50 volumes had been deleted.
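
For reference, the relevant [database] section of our cinder.conf now looks
roughly like this (host, port and password are placeholders, as in Arne's
example):

[database]
connection = mysql+pymysql://cinder:<pw>@<host>:<port>/cinder
max_pool_size = 50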

Suspecting a problem in the cinder rbd volume driver itself, I tested
deleting 200 volumes continuously using only RBDClient and RBDProxy (a rough
sketch of the test is below), and there was no problem in that case.
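
Roughly, the direct test looked like the standalone sketch below. Note this
is only an approximation: it uses the raw rados/rbd Python bindings instead
of Cinder's RBDClient/RBDProxy wrappers, and the pool name, ceph.conf path
and volume naming scheme are assumptions, not necessarily what we used.

import rados
import rbd

POOL = 'volumes'                  # assumed Cinder RBD pool name
CEPH_CONF = '/etc/ceph/ceph.conf' # assumed ceph config path
COUNT = 200                       # number of volumes deleted in the test

client = rados.Rados(conffile=CEPH_CONF)
client.connect()
try:
    ioctx = client.open_ioctx(POOL)
    try:
        rbd_inst = rbd.RBD()
        for i in range(COUNT):
            # Assumed naming scheme for the pre-created test images.
            name = 'volume-test-%03d' % i
            # Deferred-deletion path: move the image to the RBD trash
            # instead of removing it synchronously.
            rbd_inst.trash_move(ioctx, name, 0)
    finally:
        ioctx.close()
finally:
    client.shutdown()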

I think there is some code in cinder-volume that causes the hang, but it is
hard to pinpoint right now.

Thanks.

On Tue, Feb 12, 2019 at 6:24 PM, Gorka Eguileor <geguileo at redhat.com> wrote:

> On 12/02, Arne Wiebalck wrote:
> > Jae,
> >
> > One other setting that caused trouble when bulk deleting cinder volumes
> was the
> > DB connection string: we did not configure a driver and hence used the
> Python
> > mysql wrapper instead … essentially changing
> >
> > connection = mysql://cinder:<pw>@<host>:<port>/cinder
> >
> > to
> >
> > connection = mysql+pymysql://cinder:<pw>@<host>:<port>/cinder
> >
> > solved the parallel deletion issue for us.
> >
> > All details in the last paragraph of [1].
> >
> > HTH!
> >  Arne
> >
> > [1]
> https://techblog.web.cern.ch/techblog/post/experiences-with-cinder-in-production/
> >
>
> Good point, using a C mysql connection library will induce thread
> starvation.  This was thoroughly discussed, and the default changed,
> like 2 years ago...  So I assumed we all changed that.
>
> Something else that could be problematic when receiving many concurrent
> requests on any Cinder service is the number of concurrent DB
> connections, although we also changed this a while back to 50.  This is
> set as sql_max_retries or max_retries (depending on the version) in the
> "[database]" section.
>
> Cheers,
> Gorka.
>
>
> >
> >
> > > On 12 Feb 2019, at 01:07, Jae Sang Lee <hyangii at gmail.com> wrote:
> > >
> > > Hello,
> > >
> > > I tested today by increasing EVENTLET_THREADPOOL_SIZE to 100. I hoped
> > > for good results, but this time I did not get a response after removing
> > > 41 volumes. Setting this environment variable did not stop cinder-volume
> > > from hanging.
> > >
> > > When the stopped cinder-volume is restarted, its clean_up function
> > > deletes all volumes that are in the deleting state. For the one volume
> > > that remained in the deleting state, I forced its state to available and
> > > then deleted it, after which all volumes were gone.
> > >
> > > This result was the same three consecutive times: after removing dozens
> > > of volumes, cinder-volume went down, and after the service was restarted,
> > > 199 volumes were deleted and one volume had to be erased manually.
> > >
> > > If you have a different approach to solving this problem, please let
> me know.
> > >
> > > Thanks.
> > >
> > > On Mon, Feb 11, 2019 at 9:40 PM, Arne Wiebalck <Arne.Wiebalck at cern.ch> wrote:
> > > Jae,
> > >
> > >> On 11 Feb 2019, at 11:39, Jae Sang Lee <hyangii at gmail.com> wrote:
> > >>
> > >> Arne,
> > >>
> > >> I saw messages like "moving volume to trash" in the cinder-volume logs,
> > >> and the periodic task also reports messages like
> > >> "Deleted <vol-uuid> from trash for backend '<backends-name>'".
> > >>
> > >> The patch worked well when clearing a small number of volumes. The hang
> > >> happens only when I am deleting a large number of volumes.
> > >
> > > Hmm, from cinder’s point of view, the deletion should be more or less
> instantaneous, so it should be able to “delete”
> > > many more volumes before getting stuck.
> > >
> > > The periodic task, however, will go through the volumes one by one, so
> > > if you delete many at the same time, volumes may pile up in the trash
> > > (for some time) before the task gets round to deleting them. This should
> > > not affect c-vol, though.
> > >
> > >> I will try to adjust the thread pool size via the environment variables,
> > >> as you advised.
> > >>
> > >> Do you know why the cinder-volume hang does not occur when creating a
> > >> volume, but only when deleting one?
> > >
> > > Deleting a volume ties up a thread for the duration of the deletion
> > > (which is synchronous and can hence take very long for large volumes).
> > > If you have too many deletions going on at the same time, you run out of
> > > threads and c-vol will eventually time out. FWIU, creation basically
> > > works the same way, but it is almost instantaneous, hence the risk of
> > > using up all threads is simply lower (Gorka may correct me here :-).
> > >
> > > Cheers,
> > >  Arne
> > >
> > >>
> > >>
> > >> Thanks.
> > >>
> > >>
> > >> On Mon, Feb 11, 2019 at 6:14 PM, Arne Wiebalck <Arne.Wiebalck at cern.ch> wrote:
> > >> Jae,
> > >>
> > >> To make sure deferred deletion is properly working: when you delete
> > >> individual large volumes with data in them, do you see that
> > >> - the volume is fully "deleted" within a few seconds, i.e. not staying
> > >>   in 'deleting' for a long time?
> > >> - the volume shows up in the trash (with "rbd trash ls")?
> > >> - the periodic task reports it is deleting volumes from the trash?
> > >>
> > >> Another option to look at is "backend_native_threads_pool_size": this
> > >> will increase the number of threads working on deleting volumes. It is
> > >> independent from deferred deletion, but can also help in situations
> > >> where Cinder has more work to do than it can cope with at the moment.
> > >>
> > >> Cheers,
> > >>  Arne
> > >>
> > >>
> > >>
> > >>> On 11 Feb 2019, at 09:47, Jae Sang Lee <hyangii at gmail.com> wrote:
> > >>>
> > >>> Yes, I added your code to the Pike release manually.
> > >>>
> > >>>
> > >>>
> > >>> On Mon, Feb 11, 2019 at 4:39 PM, Arne Wiebalck <Arne.Wiebalck at cern.ch> wrote:
> > >>> Hi Jae,
> > >>>
> > >>> You backported the deferred deletion patch to Pike?
> > >>>
> > >>> Cheers,
> > >>>  Arne
> > >>>
> > >>> > On 11 Feb 2019, at 07:54, Jae Sang Lee <hyangii at gmail.com> wrote:
> > >>> >
> > >>> > Hello,
> > >>> >
> > >>> > I recently ran a volume deletion test with deferred deletion enabled
> > >>> > on the Pike release.
> > >>> >
> > >>> > We experienced a cinder-volume hang when we were deleting a large
> > >>> > number of volumes to which data had actually been written (I created
> > >>> > a 15GB file in every volume), and we thought deferred deletion would
> > >>> > solve it.
> > >>> >
> > >>> > However, while deleting 200 volumes, cinder-volume went down after
> > >>> > about 50 volumes, as before. In my opinion, the trash_move API does
> > >>> > not seem to work properly when removing multiple volumes, just like
> > >>> > the remove API.
> > >>> >
> > >>> > If these test results are due to a mistake on my part, please let me
> > >>> > know the correct test method.
> > >>> >
> > >>>
> > >>> --
> > >>> Arne Wiebalck
> > >>> CERN IT
> > >>>
> > >>
> > >> --
> > >> Arne Wiebalck
> > >> CERN IT
> > >>
> > >
> > > --
> > > Arne Wiebalck
> > > CERN IT
> > >
> >
>