[openstack-dev] [Neutron] Race condition between DB layer and plugin back-end implementation

Isaku Yamahata isaku.yamahata at gmail.com
Wed Nov 20 15:25:23 UTC 2013


On Tue, Nov 19, 2013 at 11:22:46PM +0100,
Salvatore Orlando <sorlando at nicira.com> wrote:

> For what it's worth, we considered this aspect from the perspective of
> the Neutron plugin my team maintains (NVP) during the past release cycle.
> 
> The synchronous model that most plugins with a controller on the backend
> currently implement is simple and convenient, but has some flaws:
> 
> - reliability: the current approach where the plugin orchestrates the
> backend is not really optimal when it comes to ensuring your running
> configuration (backend/control plane) is in sync with your desired
> configuration (neutron/mgmt plane); moreover, in some cases, due to neutron
> internals, API calls to the backend are wrapped in a transaction too,
> leading to very long SQL transactions, which are quite dangerous indeed. It
> is not easy to recover from a failure due to an eventlet thread deadlocking
> with a mysql transaction, where by 'recover' I mean ensuring neutron and
> backend state are in sync.
> 
> - maintainability: since handling rollback in case of failures on the
> backend and/or the db is cumbersome, this often leads to spaghetti code
> which is very hard to maintain regardless of the effort (ok, I agree here
> that this also depends on how good the devs are - most of the guys in my
> team are very good, but unfortunately they have me too...).
> 
> - performance & scalability:
>     - roundtrips to the backend take a non-negligible toll on the duration
> of an API call, whereas most Neutron API calls should probably just
> terminate at the DB, just like a nova boot call does not wait for the VM to
> be ACTIVE before returning.
>     - we need to keep some operations serialized in order to avoid the
> mentioned race issues.
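
As an illustration of the reliability point, one common way to keep the running configuration in sync with the desired one is a periodic reconciliation pass that diffs the two. This is a minimal Python sketch, not the NVP plugin's actual code: `plan_sync` is a hypothetical helper, and it assumes both states can be fetched as dicts mapping resource id to attributes.

```python
def plan_sync(desired, actual):
    """Diff desired state (Neutron DB) against running state (back-end).

    Both arguments map resource id -> dict of attributes (an assumption
    made for this sketch). Returns three sorted lists of ids: to create,
    to update and to delete on the back-end.
    """
    to_create = sorted(desired.keys() - actual.keys())
    to_delete = sorted(actual.keys() - desired.keys())
    # Resources present on both sides whose attributes have drifted.
    to_update = sorted(rid for rid in desired.keys() & actual.keys()
                       if desired[rid] != actual[rid])
    return to_create, to_update, to_delete


desired = {"p1": {"admin_state_up": True}, "p2": {"admin_state_up": False}}
actual = {"p2": {"admin_state_up": True}, "p9": {"admin_state_up": True}}
print(plan_sync(desired, actual))  # (['p1'], ['p2'], ['p9'])
```

Because the pass only compares end states, it can be rerun safely after a crash, which is the property the synchronous "orchestrate and roll back" approach struggles to provide.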
> 
> For this reason we're progressively moving toward a change in the NVP
> plugin with a series of patches under this umbrella-blueprint [1].
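
A rough sketch of the asynchronous model, assuming a plain dict stands in for the Neutron DB and a `queue.Queue` with a worker thread stands in for the task library (celery or taskflow in a real plugin). The function names and the PENDING_CREATE/ACTIVE/ERROR status values are illustrative assumptions, not the blueprint's design: the API handler only writes to the DB and returns, and the back-end call happens later.

```python
import queue
import threading
import uuid

# Stand-ins (assumptions): a dict for the Neutron DB, a queue for the
# task library that runs back-end work outside the API call.
DB = {}
DB_LOCK = threading.Lock()
TASKS = queue.Queue()


def create_port_async(port):
    """API handler: persist the port as PENDING_CREATE and return at once."""
    port_id = str(uuid.uuid4())
    with DB_LOCK:
        DB[port_id] = dict(port, id=port_id, status="PENDING_CREATE")
        result = dict(DB[port_id])
    TASKS.put(port_id)  # back-end work is deferred to the worker
    return result


def backend_worker(backend_call):
    """Worker: push each queued port to the back-end, then record the outcome."""
    while True:
        port_id = TASKS.get()
        try:
            if port_id is None:  # sentinel: stop the worker
                return
            with DB_LOCK:
                port = dict(DB[port_id])
            try:
                backend_call(port)
                status = "ACTIVE"
            except Exception:
                status = "ERROR"
            with DB_LOCK:
                DB[port_id]["status"] = status
        finally:
            TASKS.task_done()


worker = threading.Thread(target=backend_worker, args=(lambda port: None,))
worker.start()
p = create_port_async({"name": "p1"})
print(p["status"])            # PENDING_CREATE: returned before the back-end ran
TASKS.join()                  # demo only: wait for the queue to drain
print(DB[p["id"]]["status"])  # ACTIVE
TASKS.put(None)
worker.join()
```

The API latency no longer includes the back-end round trip, at the cost of the client having to observe the status change afterwards.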

Interesting. A question out of curiosity: a
successful return of POST/PUT doesn't necessarily mean that the
creation/update has completed.
So the client side needs to poll to wait for completion, right?
Or is there some kind of callback? The VIF creation case especially would matter.
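
If the API returns before the back-end work completes, a client could poll the resource status. A minimal sketch, where `get_resource` is a hypothetical callable (for example, something wrapping a GET on the port) that returns a dict with a "status" key, and the status names are assumptions:

```python
import time


def wait_for_status(get_resource, resource_id, target="ACTIVE",
                    error_states=("ERROR",), timeout=30.0, interval=1.0):
    """Poll get_resource(resource_id) until its status reaches target.

    Raises RuntimeError if an error state is seen and TimeoutError if the
    resource is still pending when the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_resource(resource_id)["status"]
        if status == target:
            return status
        if status in error_states:
            raise RuntimeError(f"{resource_id} went to {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{resource_id} still {status} after {timeout}s")
        time.sleep(interval)


states = iter(["PENDING_CREATE", "PENDING_CREATE", "ACTIVE"])
print(wait_for_status(lambda rid: {"status": next(states)}, "port-1",
                      interval=0.01))  # ACTIVE
```

A callback or notification would avoid the polling loop entirely; this only illustrates the polling option.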


> To address the issues mentioned by Isaku, we've been looking at a task
> management library with an efficient and reliable set of abstractions for
> ensuring operations are properly ordered, thus avoiding those races (I agree
> with the observation about the pre/post commit solution).

This discussion started with the core plugin, but other resources such as
services (lbaas, fw, vpn...) have similar race conditions, I think.

Thanks,
Isaku Yamahata

> We are currently looking at using celery [2] rather than taskflow, mostly
> because we already have expertise in using it in our
> applications, and it has very easy abstractions for workflow design, as well
> as for handling task failures.
> That said, I think we're still open to switching to taskflow should we become
> aware of some very good reason for using it.
> 
> Regards,
> Salvatore
> 
> [1] https://blueprints.launchpad.net/neutron/+spec/nvp-async-backend-communication
> [2] http://docs.celeryproject.org/en/master/index.html
> 
> 
> 
> On 19 November 2013 19:42, Joshua Harlow <harlowja at yahoo-inc.com> wrote:
> 
> > And of course, I nearly forgot a similar situation/review in heat.
> >
> > https://review.openstack.org/#/c/49440/
> >
> > Except theirs was/is dealing with stack locking (a heat concept).
> >
> > On 11/19/13 10:33 AM, "Joshua Harlow" <harlowja at yahoo-inc.com> wrote:
> >
> > >If you start adding these states you might really want to consider the
> > >following work that is going on in other projects.
> > >
> > >It surely appears that everyone is starting to hit the same problem (and
> > >joining efforts would produce a more beneficial result).
> > >
> > >Relevant icehouse etherpads:
> > >- https://etherpad.openstack.org/p/CinderTaskFlowFSM
> > >- https://etherpad.openstack.org/p/icehouse-oslo-service-synchronization
> > >
> > >And of course my obvious plug for taskflow (which is designed to be a
> > >useful library to help in all these usages).
> > >
> > >- https://wiki.openstack.org/wiki/TaskFlow
> > >
> > >The states you just mentioned start to line up with
> > >https://wiki.openstack.org/wiki/TaskFlow/States_of_Task_and_Flow
> > >
> > >If this sounds like a useful way to go (joining efforts), then let's see how
> > >we can make it possible.
> > >
> > >IRC: #openstack-state-management is where I am usually at.
> > >
> > >On 11/19/13 3:57 AM, "Isaku Yamahata" <isaku.yamahata at gmail.com> wrote:
> > >
> > >>On Mon, Nov 18, 2013 at 03:55:49PM -0500,
> > >>Robert Kukura <rkukura at redhat.com> wrote:
> > >>
> > >>> On 11/18/2013 03:25 PM, Edgar Magana wrote:
> > >>> > Developers,
> > >>> >
> > >>> > This topic has been discussed before but I do not remember if we have a
> > >>> > good solution or not.
> > >>>
> > >>> The ML2 plugin addresses this by calling each MechanismDriver twice. The
> > >>> create_network_precommit() method is called as part of the DB
> > >>> transaction, and the create_network_postcommit() method is called after
> > >>> the transaction has been committed. Interactions with devices or
> > >>> controllers are done in the postcommit methods. If the postcommit method
> > >>> raises an exception, the plugin deletes that partially-created resource
> > >>> and returns the exception to the client. You might consider a similar
> > >>> approach in your plugin.
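
A simplified sketch of the precommit/postcommit flow described above, using `sqlite3` as a stand-in for the Neutron DB. The driver method names mirror the ones mentioned for ML2, but `create_network`, the `networks` table, `LoggingDriver` and `MechanismDriverError` here are hypothetical, and the real plugin's cleanup path is more involved. Note that this sketch does not by itself close the race window after the commit.

```python
import sqlite3


class MechanismDriverError(Exception):
    """Hypothetical error returned to the API client."""


def create_network(conn, drivers, name):
    # Phase 1: DB transaction. "with conn" commits on success and rolls
    # back if any precommit hook raises.
    with conn:
        cur = conn.execute("INSERT INTO networks (name) VALUES (?)", (name,))
        net = {"id": cur.lastrowid, "name": name}
        for driver in drivers:
            driver.create_network_precommit(net)
    # Phase 2: after commit. Talk to devices/controllers here, so no DB
    # transaction is held open during the remote call.
    try:
        for driver in drivers:
            driver.create_network_postcommit(net)
    except Exception as exc:
        # Undo: delete the partially created network, then report failure.
        with conn:
            conn.execute("DELETE FROM networks WHERE id = ?", (net["id"],))
        raise MechanismDriverError(str(exc)) from exc
    return net


class LoggingDriver:
    """Hypothetical driver that just records the calls it receives."""

    def __init__(self):
        self.calls = []

    def create_network_precommit(self, net):
        self.calls.append(("precommit", net["name"]))

    def create_network_postcommit(self, net):
        self.calls.append(("postcommit", net["name"]))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE networks (id INTEGER PRIMARY KEY, name TEXT)")
driver = LoggingDriver()
create_network(conn, [driver], "net1")
print(driver.calls)  # [('precommit', 'net1'), ('postcommit', 'net1')]
```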
> > >>
> > >>Splitting the work into two phases, pre/post, is a good approach.
> > >>But a race window still remains.
> > >>Once the transaction is committed, the result is visible to the outside,
> > >>so concurrent requests to the same resource will be racy.
> > >>There is a window after pre_xxx_yyy() and before post_xxx_yyy() where
> > >>other requests can be handled.
> > >>
> > >>The state machine needs to be enhanced, I think (plugins need
> > >>modification), for example by adding more states like
> > >>pending_{create, delete, update}.
> > >>We would also like to consider serializing operations on ports
> > >>and subnets, or on subnets and networks, depending on
> > >>performance requirements.
> > >>(Or carefully audit complex status changes, e.g.
> > >>changing a port during a subnet/network update/deletion.)
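
One way to implement pending states so that a second request cannot start while one is in flight is an atomic compare-and-set on the status column. This is a minimal sketch with `sqlite3` standing in for the Neutron DB. The `ports` table, the upper-case status names and `ResourceBusy` are hypothetical, and new rows would be inserted directly as PENDING_CREATE (not shown).

```python
import sqlite3

# Transitions allowed out of ACTIVE (assumption for this sketch).
TRANSITION_STATES = ("PENDING_UPDATE", "PENDING_DELETE")


class ResourceBusy(Exception):
    """Hypothetical error: another operation on the resource is in flight."""


def begin_operation(conn, port_id, pending_state):
    """Atomically move a port from ACTIVE into a PENDING_* state.

    The "AND status = 'ACTIVE'" clause makes the UPDATE a compare-and-set:
    if a concurrent request already moved the port out of ACTIVE, no row
    matches and this request is rejected instead of racing.
    """
    if pending_state not in TRANSITION_STATES:
        raise ValueError(pending_state)
    with conn:
        cur = conn.execute(
            "UPDATE ports SET status = ? WHERE id = ? AND status = 'ACTIVE'",
            (pending_state, port_id))
    if cur.rowcount != 1:
        raise ResourceBusy(port_id)


def finish_operation(conn, port_id, succeeded):
    """Record the back-end outcome once the operation completes."""
    with conn:
        (status,) = conn.execute(
            "SELECT status FROM ports WHERE id = ?", (port_id,)).fetchone()
        if succeeded and status == "PENDING_DELETE":
            conn.execute("DELETE FROM ports WHERE id = ?", (port_id,))
        else:
            conn.execute("UPDATE ports SET status = ? WHERE id = ?",
                         ("ACTIVE" if succeeded else "ERROR", port_id))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ports (id TEXT PRIMARY KEY, status TEXT)")
with conn:
    conn.execute("INSERT INTO ports VALUES ('p1', 'ACTIVE')")
begin_operation(conn, "p1", "PENDING_UPDATE")
try:
    begin_operation(conn, "p1", "PENDING_DELETE")  # concurrent request
except ResourceBusy:
    print("rejected: p1 is busy")
finish_operation(conn, "p1", succeeded=True)
```

Rejecting (or queueing) the conflicting request keeps the check and the state change in a single statement, so no lock has to be held across the back-end call.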
> > >>
> > >>I think it would be useful to establish a reference locking policy
> > >>for the ML2 plugin for SDN controllers.
> > >>Thoughts or comments? If this is considered useful and acceptable,
> > >>I'm willing to help.
> > >>
> > >>thanks,
> > >>Isaku Yamahata
> > >>
> > >>> -Bob
> > >>>
> > >>> > Basically, if concurrent API calls are sent to Neutron, all of them are
> > >>> > sent to the plug-in level, where two actions have to be performed:
> > >>> >
> > >>> > 1. DB transaction: not just for data persistence but also to collect the
> > >>> > information needed for the next action
> > >>> > 2. Plug-in back-end implementation: in our case, a call to the Python
> > >>> > library that in turn calls the PLUMgrid REST GW (soon SAL)
> > >>> >
> > >>> > For instance:
> > >>> >
> > >>> >     def create_port(self, context, port):
> > >>> >         with context.session.begin(subtransactions=True):
> > >>> >             # Plugin DB - Port Create and Return port
> > >>> >             port_db = super(NeutronPluginPLUMgridV2,
> > >>> >                             self).create_port(context, port)
> > >>> >             device_id = port_db["device_id"]
> > >>> >             if port_db["device_owner"] == "network:router_gateway":
> > >>> >                 router_db = self._get_router(context, device_id)
> > >>> >             else:
> > >>> >                 router_db = None
> > >>> >             try:
> > >>> >                 LOG.debug(_("PLUMgrid Library: create_port() called"))
> > >>> >                 # Back-end implementation
> > >>> >                 self._plumlib.create_port(port_db, router_db)
> > >>> >             except Exception:
> > >>> >                 ...
> > >>> >
> > >>> > The way we have implemented it at the plugin level in Havana (and even in
> > >>> > Grizzly) is that both actions are wrapped in the same "transaction", which
> > >>> > automatically rolls back any operation to its original state,
> > >>> > mostly protecting the DB from any inconsistent state or leftover
> > >>> > data if the back-end part fails.
> > >>> > The problem we are experiencing is that when concurrent calls to the
> > >>> > same API are sent, the operations at the plug-in back-end take
> > >>> > long enough to make the next concurrent API call get stuck at the DB
> > >>> > transaction level, which creates a hung state for the Neutron server,
> > >>> > to the point that all concurrent API calls fail.
> > >>> >
> > >>> > This can be fixed if we include some "locking" system, such as calling:
> > >>> >
> > >>> > from neutron.common import utils
> > >>> > ...
> > >>> >
> > >>> > @utils.synchronized('any-name', external=True)
> > >>> > def create_port(self, context, port):
> > >>> >     ...
> > >>> >
> > >>> > Obviously, this will serialize all concurrent calls, which
> > >>> > ends up giving really bad performance. Does anyone have a
> > >>> > better solution?
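
A common middle ground between no locking and one global lock is a lock per resource, for example per network, so that only operations touching the same network are serialized. The sketch below is an in-process illustration only (threads within one server process); it does not coordinate across processes the way an external lock would, and `resource_lock` and this simplified `create_port` are hypothetical helpers, not the plugin's API.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager

# One lock per (kind, id); a guard protects creation of new entries.
_locks = defaultdict(threading.Lock)
_locks_guard = threading.Lock()


@contextmanager
def resource_lock(kind, resource_id):
    """Serialize only the operations that touch the same resource."""
    with _locks_guard:
        lock = _locks[(kind, resource_id)]
    with lock:
        yield


def create_port(port, backend_call):
    # Ports on different networks no longer wait for each other.
    with resource_lock("network", port["network_id"]):
        backend_call(port)


create_port({"network_id": "net-1"},
            lambda port: print("back-end call for", port["network_id"]))
```

Lock objects are never removed from `_locks`, so a long-running server would want to bound or prune that dict; and the lock still has to be taken outside the DB transaction, otherwise waiting requests keep holding DB resources.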
> > >>> >
> > >>> > Thanks,
> > >>> >
> > >>> > Edgar
> > >>> >
> > >>> >
> > >>> > _______________________________________________
> > >>> > OpenStack-dev mailing list
> > >>> > OpenStack-dev at lists.openstack.org
> > >>> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> > >>> >
> > >>>
> > >>>
> > >>
> > >>--
> > >>Isaku Yamahata <isaku.yamahata at gmail.com>
> > >>
> > >
> > >
> >
> >
> >



-- 
Isaku Yamahata <isaku.yamahata at gmail.com>


