<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On 19 November 2014 11:58, Jay Pipes <span dir="ltr"><<a href="mailto:jaypipes@gmail.com" target="_blank">jaypipes@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Some code paths that used locking in the past were rewritten to retry<br><span class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
the operation if they detect that an object was modified concurrently.<br>
The problem here is that all DB operations (CRUD) are performed in the<br>
scope of a transaction so that complex operations are executed<br>
atomically.<br>
</blockquote>
<br></span>
Yes. The root of the problem in Neutron is that the session object is passed through all of the various plugin methods and session.begin(subtransactions=True) is called all over the place, when in reality many of these operations should not need to run inside long-lived transactions.<span class=""><br></span></blockquote><div><br></div><div>I think the issue is one of design, and it's possible that what we discussed at the summit will address some of this.<br></div><div><br></div><div>At the moment, Neutron is a bit confused about what it is. Some plugins treat a call to Neutron as the period of time in which an action should be completed: the 'atomicity' thing. This is not really compatible with a distributed system, and it's certainly not compatible with the principle of eventual consistency that OpenStack is supposed to follow. Other plugins treat the call as a change to the desired networking state, and the action on the network is performed asynchronously to bring the network state into alignment with the state of the database. (Many plugins do a bit of both.)<br><br></div><div>When a plugin has chosen to be synchronous, there are cases where a DB lock is held for an unbounded period of time, because the transaction stays open while the plugin waits for the backend to act. This is basically broken.<br><br>What we said at the summit is that we should move to an entirely async model for the API, which in turn gets us to the 'desired state' model for the DB. 
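<br><br>A rough sketch of that split, in Python with the standard-library <code>sqlite3</code> module. The <code>ports</code> table, its <code>desired_state</code>/<code>actual_state</code> columns and the functions <code>api_update_port</code>, <code>reconcile</code> and <code>apply_to_backend</code> are hypothetical illustrations, not Neutron code. The API call only records the requested state in a short transaction and returns; a separate worker later pushes that state to the backend with no transaction open, then records the result in a second short transaction.<br>
<pre>
```python
import sqlite3

# Hypothetical schema: desired_state is what the API asked for,
# actual_state is what the backend last confirmed. Not Neutron's schema.


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ports (id TEXT PRIMARY KEY,"
        " desired_state TEXT NOT NULL, actual_state TEXT)"
    )
    with conn:
        conn.execute("INSERT INTO ports VALUES ('p1', 'DOWN', 'DOWN')")
    return conn


def api_update_port(conn, port_id, state):
    # API call: record the requested state in a short, DB-only transaction
    # and return without waiting for the network to change.
    with conn:
        conn.execute(
            "UPDATE ports SET desired_state = ? WHERE id = ?", (state, port_id)
        )


def reconcile(conn, apply_to_backend):
    # Async worker: find ports whose actual state lags the desired state.
    pending = conn.execute(
        "SELECT id, desired_state FROM ports"
        " WHERE actual_state IS NULL OR actual_state != desired_state"
    ).fetchall()
    for port_id, desired in pending:
        # The (possibly slow) backend call runs with no DB transaction open.
        apply_to_backend(port_id, desired)
        # Network change done: record it in another short transaction.
        with conn:
            conn.execute(
                "UPDATE ports SET actual_state = ? WHERE id = ?",
                (desired, port_id),
            )
    return len(pending)


def demo():
    conn = make_db()
    api_update_port(conn, "p1", "UP")
    before = conn.execute(
        "SELECT actual_state FROM ports WHERE id = 'p1'"
    ).fetchone()[0]
    calls = []
    handled = reconcile(conn, lambda pid, st: calls.append((pid, st)))
    after = conn.execute(
        "SELECT desired_state, actual_state FROM ports WHERE id = 'p1'"
    ).fetchone()
    return before, handled, calls, after
```
</pre>
In <code>demo()</code> the API update returns while the actual state is still the old value, and the worker catches up afterwards. Because the backend call sits outside both transactions in this sketch, a slow backend cannot keep a DB lock open.<br><br>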
DB writes would take one of two forms:<br><br></div><div>- An API call has requested that the data be updated, which can be done immediately: the DB transaction takes only as long as it takes to write the DB consistently, and can hold locks on referenced rows to maintain consistency, provided the whole operation remains brief.<br></div><div>- A network change has completed and the plugin wants to update an object's state: again, the DB transaction contains only DB operations and nothing else, and should be quick.<br><br></div><div>Now, if we moved to that model, DB locks would be very brief for the sort of queries we'd need to do. Setting aside the joys of Galera (and I believe we identified that using one Galera node and doing all writes through it worked just fine, though we could probably distribute read-only transactions across all of the nodes in the future), would there be any need for transaction retries in that scenario? I would have thought that DB locking would be just fine as long as a transaction contains nothing but DB operations while it is open, and thus significantly changing the DB lock/retry model now is a waste of time, because it's a problem that will go away.<br><br></div><div>Does that theory hold water?<br><br>-- <br></div><div>Ian.<br></div></div></div></div>
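<div><br>The question above comes down to two ways of handling concurrent writers. A minimal sketch of both, using Python's standard-library <code>sqlite3</code> module and a hypothetical <code>ports</code> table with a <code>version</code> column (none of these names come from Neutron): <code>update_with_lock</code> takes the write lock up front and runs only DB statements before committing, while <code>update_with_retry</code> holds no lock and instead re-reads and retries when a compare-and-swap on <code>version</code> finds the row was changed concurrently. SQLite has no <code>SELECT ... FOR UPDATE</code>, so <code>BEGIN IMMEDIATE</code>, which takes SQLite's database-wide write lock, stands in for a row lock here; <code>before_write</code> is a made-up test hook used only to simulate a competing writer.<br></div>
<pre>
```python
import sqlite3

# Hypothetical schema for illustration only; this is not Neutron's schema.
# isolation_level=None puts sqlite3 in autocommit mode, so the explicit
# BEGIN/COMMIT statements below are what define the transactions.


def make_db():
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE ports (id TEXT PRIMARY KEY,"
        " desired_state TEXT NOT NULL, version INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute("INSERT INTO ports (id, desired_state) VALUES ('p1', 'DOWN')")
    return conn


def update_with_lock(conn, port_id, new_state):
    # Pessimistic: take the write lock first, run only DB statements, commit.
    # BEGIN IMMEDIATE stands in for SELECT ... FOR UPDATE, which SQLite lacks.
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute(
            "UPDATE ports SET desired_state = ?, version = version + 1"
            " WHERE id = ?",
            (new_state, port_id),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


def update_with_retry(conn, port_id, new_state, max_attempts=5,
                      before_write=None):
    # Optimistic: read the version, then write only if it is still unchanged.
    # No lock is held between the SELECT and the UPDATE (autocommit mode).
    for attempt in range(1, max_attempts + 1):
        (version,) = conn.execute(
            "SELECT version FROM ports WHERE id = ?", (port_id,)
        ).fetchone()
        if before_write is not None:
            before_write()  # hypothetical hook to simulate a concurrent writer
        cur = conn.execute(
            "UPDATE ports SET desired_state = ?, version = version + 1"
            " WHERE id = ? AND version = ?",
            (new_state, port_id, version),
        )
        if cur.rowcount == 1:
            return attempt  # number of attempts it took
    raise RuntimeError("row kept changing; giving up")


def demo():
    conn = make_db()
    update_with_lock(conn, "p1", "UP")  # bumps version from 0 to 1

    raced = []

    def other_writer():
        # Bump the version once, as if another request committed in between.
        if not raced:
            raced.append(True)
            conn.execute("UPDATE ports SET version = version + 1 WHERE id = 'p1'")

    attempts = update_with_retry(conn, "p1", "DOWN", before_write=other_writer)
    state, version = conn.execute(
        "SELECT desired_state, version FROM ports WHERE id = 'p1'"
    ).fetchone()
    return attempts, state, version
```
</pre>
<div>In <code>demo()</code> the first optimistic attempt loses the race and the second succeeds. If every transaction is as short as the locked version here, a writer only ever waits for other brief DB-only transactions; the retry version avoids waiting altogether but pays for it with repeated reads under contention.<br></div>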