[openstack-dev] [Neutron] DB: transaction isolation and related questions

Ian Wells ijw.ubuntu at cack.org.uk
Wed Nov 19 20:33:40 UTC 2014

On 19 November 2014 11:58, Jay Pipes <jaypipes at gmail.com> wrote:

>> Some code paths that used locking in the past were rewritten to retry
>> the operation if they detect that an object was modified concurrently.
>> The problem here is that all DB operations (CRUD) are performed in the
>> scope of some transaction that makes complex operations to be executed
>> in atomic manner.
> Yes. The root of the problem in Neutron is that the session object is
> passed through all of the various plugin methods and the
> session.begin(subtransactions=True) is used all over the place, when in
> reality many things should not need to be done in long-lived transactional
> containers.

I think the issue is one of design, and it's possible what we discussed at
the summit may address some of this.

At the moment, Neutron's a bit confused about what it is.  Some plugins
treat a call to Neutron as the period of time in which an action should be
completed - the 'atomicity' thing.  This is not really compatible with a
distributed system and it's certainly not compatible with the principle of
eventual consistency that OpenStack is supposed to follow.  Some plugins
treat the call as a change to desired networking state, and the action on
the network is performed asynchronously to bring the network state into
alignment with the state of the database.  (Many plugins do a bit of both.)

When you have a plugin that's decided to be synchronous, then there are
cases where the DB lock is held for a technically indefinite period of
time.  This is basically broken.

What we said at the summit is that we should move to an entirely async
model for the API, which in turn gets us to the 'desired state' model for
the DB.  DB writes would take one of two forms:

- An API call has requested that the data be updated, which it can do
immediately - the DB transaction takes as long as it takes to write the DB
consistently, and can hold locks on referenced rows to maintain consistency,
provided the whole operation remains brief
- A network change has completed and the plugin wants to update an object's
state - again, the DB transaction contains only DB ops and nothing else and
should be quick.
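
To make the two forms of write concrete, here is a minimal sketch in Python.
It is not Neutron code: it uses the standard-library sqlite3 module rather
than Neutron's SQLAlchemy session, and the ports table, its columns, the
status values and every function name are assumptions made up for
illustration. The point is only that each transaction wraps DB statements
and nothing else, while the slow network operation runs with no transaction
open.

```python
import sqlite3


def make_db():
    """Create an in-memory DB with a hypothetical 'ports' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ports ("
        "id TEXT PRIMARY KEY, desired TEXT NOT NULL, status TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def api_update_port(conn, port_id, desired):
    """Form 1: an API call records the desired state and returns at once."""
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "INSERT OR REPLACE INTO ports (id, desired, status) "
            "VALUES (?, ?, 'PENDING')",
            (port_id, desired),
        )


def pending_ports(conn):
    """List ports whose network state has not yet caught up with the DB."""
    return conn.execute(
        "SELECT id, desired FROM ports WHERE status = 'PENDING'"
    ).fetchall()


def mark_port_done(conn, port_id, applied):
    """Form 2: the plugin reports that a network change has completed.

    Assumption: the row is only marked ACTIVE if the desired state is still
    the one that was applied, so a completion that arrives after a newer
    API call does not hide the newer request.
    """
    with conn:
        cur = conn.execute(
            "UPDATE ports SET status = 'ACTIVE' WHERE id = ? AND desired = ?",
            (port_id, applied),
        )
    return cur.rowcount == 1


def reconcile(conn, apply_to_network):
    """Bring the network into line with the DB, one pending port at a time."""
    for port_id, desired in pending_ports(conn):
        # The slow backend call happens here, with no transaction open.
        apply_to_network(port_id, desired)
        mark_port_done(conn, port_id, desired)
```

A caller would invoke api_update_port() from the API handler and run
reconcile() from a separate worker, passing in whatever function actually
talks to the backend. Neither transaction spans the backend call, so any
locks are held only for the duration of a single statement.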

Now, if we moved to that model, DB locks would be very very brief for the
sort of queries we'd need to do.  Setting aside the joys of Galera (and I
believe we identified that using one Galera node and doing all writes
through it worked just fine, though we could probably distribute read-only
transactions across all of them in the future), would there be any need for
transaction retries in that scenario?  I would have thought that DB locking
would be just fine as long as there was nothing but DB operations for the
period a transaction was open, and thus significantly changing the DB
lock/retry model now is a waste of time because it's a problem that will go
away.

Does that theory hold water?
