[openstack-dev] [neutron][oslo.db] db retry madness

Michel Peterson mpeterso at redhat.com
Sun Sep 10 13:11:11 UTC 2017

Thanks for this most interesting and informative exchange.

On Fri, Sep 8, 2017 at 5:20 PM, Michael Bayer <mbayer at redhat.com> wrote:
> On Fri, Sep 8, 2017 at 8:53 AM, Kevin Benton <kevin at benton.pub> wrote:
> > Since the goal of that patch is to deal with deadlocks, the retry can't
> > happen down at that level, it will need to be somewhere further up the call
> > stack since the deadlock will put the session in a rollback state.
> OK when I was talking to Michel about this, I first mentioned that to
> get retry within this in-a-transaction block would require adding a
> savepoint to the mix, but he seemed to indicate that the method can
> also run by itself, but since there's no session.begin() there then
> that matches with my initial impression that the retry has to be at
> the level which the transaction starts, and that's not here.

I think there is also a deal of madness on how things are implemented
in this particular project. I'll review the way it's handled and
adjust accordingly, and then implement the right retries in the
correct places. I think my strategy will be to first start with using
a decorator based on wrap_db_retry to fix the bug. After that I'll
start, in another set of patches, the migration from the "legacy"
pattern to the "new" enginefacade. What do you think? Or given that
enginefacade already diminishes the possibilities of deadlocks I
should already start with the migration to enginefacade and then with
that already in place

> > For the functions being called outside of an active session, they should
> > switch to accepting a context to handle the enginefacade switch.


> >>  3a.  Is this a temporary measure to work around legacy patterns?
> >
> > Yes. When that went in we had very little adoption of the enginefacade. Even
> > now we haven't fully completed the transition so checking for is_active
> > covers the old code paths.
> OK, so this is where they are still calling session.begin()
> themselves.    Are all these session.begin() calls in projects under
> the "openstack" umbrella so I can use codesearch ?

So, to make sure I understand correctly. With enginefacade instead of
calling session.begin() to handle the transactions, when using
'reader' o 'writer' we are already inside a transaction, right? If
that's correct, no matter if decorating or using a context manager, in
both cases the transaction lives for the context of the decorated
function or the with-stmt context, right?

> > As we continue migrating to the enginefacade, the cases where you can get a
> > reference to a session that actually has 'is_active==False' are going to
> > keep dwindling. Once we complete the switch and remove the legacy session
> > constructor, the decorator will just be adjusted to check if there is a
> > session attached to the context to determine if retries are safe since
> > is_active will always be True.

Therefore, on implementation of enginefacade retry_if_session_inactive
is rendered useless and because of the deepcopy retry_db_errors in our
usecase is already useless. As a result we are better off with using
wrap_db_retry to create a decorator parametrised according to our
needs (ie: max retries, exception checker, etc), right? And for
exception checker I should use the is_retriable from neutron-lib
instead of the neutron db api, right? (it's a shame that MAX_RETRIES
is not also on neutron-lib and only in neutron)

Thanks to you both.

More information about the OpenStack-dev mailing list