[openstack-dev] [Neutron] Race condition between DB layer and plugin back-end implementation

Sukhdev Kapur sukhdevkapur at gmail.com
Tue Nov 19 07:35:16 UTC 2013


There are a few examples of ML2 mechanism drivers. You can look at the
Arista ML2 driver. We deal with DB synchronization as well as back-end
provisioning, and we handle back-end failures and rollbacks as well as
synchronization between Neutron and the back-end.

best of luck
-Sukhdev



On Mon, Nov 18, 2013 at 5:27 PM, Robert Kukura <rkukura at redhat.com> wrote:

> On 11/18/2013 05:21 PM, Edgar Magana wrote:
> > Hi All,
> >
> > Thank you everybody for your input. It is clear that any solution
> > requires changes at the plugin level (we were trying to avoid that). So,
> > I am wondering whether a re-factor of this code is needed or not (maybe
> > not). The ML2 solution is probably the best alternative right now, so we
> > may go for it.
>
> Could be a good time to consider converting the plugin to an ML2
> MechanismDriver. I'm happy to help work through the details of that if
> you are interested.
>
> -Bob
>
> >
> > Any extra input is welcome!
> >
> > Thanks,
> >
> > Edgar
> >
> > On 11/18/13 12:55 PM, "Robert Kukura" <rkukura at redhat.com> wrote:
> >
> >> On 11/18/2013 03:25 PM, Edgar Magana wrote:
> >>> Developers,
> >>>
> >>> This topic has been discussed before but I do not remember if we have a
> >>> good solution or not.
> >>
> >> The ML2 plugin addresses this by calling each MechanismDriver twice. The
> >> create_network_precommit() method is called as part of the DB
> >> transaction, and the create_network_postcommit() method is called after
> >> the transaction has been committed. Interactions with devices or
> >> controllers are done in the postcommit methods. If the postcommit method
> >> raises an exception, the plugin deletes that partially-created resource
> >> and returns the exception to the client. You might consider a similar
> >> approach in your plugin.
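> >>
> >> For reference, a bare-bones driver following that pattern might look
> >> roughly like this (a sketch only; the _make_backend_client() helper and
> >> the back-end client's create_network() call are hypothetical
> >> placeholders for whatever talks to your controller):
> >>
> >> from neutron.plugins.ml2 import driver_api as api
> >>
> >> class SketchMechanismDriver(api.MechanismDriver):
> >>
> >>     def initialize(self):
> >>         # Set up the (hypothetical) client for the back-end.
> >>         self._backend = self._make_backend_client()
> >>
> >>     def create_network_precommit(self, context):
> >>         # Called inside the plugin's DB transaction: validate and
> >>         # record driver state here, but never call the back-end.
> >>         pass
> >>
> >>     def create_network_postcommit(self, context):
> >>         # Called after the transaction commits, so talking to the
> >>         # controller is safe. If this raises, ML2 deletes the
> >>         # partially-created network and returns the error.
> >>         self._backend.create_network(context.current)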
> >>
> >> -Bob
> >>
> >>> Basically, if concurrent API calls are sent to Neutron, all of them are
> >>> sent to the plug-in level, where two actions have to be performed:
> >>>
> >>> 1. DB transaction - not just for data persistence but also to collect
> >>> the information needed for the next action
> >>> 2. Plug-in back-end implementation - in our case, a call to the Python
> >>> library that in turn calls the PLUMgrid REST GW (soon SAL)
> >>>
> >>> For instance:
> >>>
> >>> def create_port(self, context, port):
> >>>     with context.session.begin(subtransactions=True):
> >>>         # Plugin DB - Port Create and Return port
> >>>         port_db = super(NeutronPluginPLUMgridV2,
> >>>                         self).create_port(context, port)
> >>>         device_id = port_db["device_id"]
> >>>         if port_db["device_owner"] == "network:router_gateway":
> >>>             router_db = self._get_router(context, device_id)
> >>>         else:
> >>>             router_db = None
> >>>         try:
> >>>             LOG.debug(_("PLUMgrid Library: create_port() called"))
> >>>             # Back-end implementation
> >>>             self._plumlib.create_port(port_db, router_db)
> >>>         except Exception:
> >>>             ...
> >>>
> >>> The way we have implemented this at the plugin level in Havana (and
> >>> even in Grizzly) is that both actions are wrapped in the same
> >>> "transaction", which automatically rolls back any operation to its
> >>> original state, protecting the DB from ending up in an inconsistent
> >>> state or holding leftover data if the back-end part fails.
> >>> The problem we are experiencing is that when concurrent calls to the
> >>> same API are sent, the back-end operations take long enough that the
> >>> next concurrent API call gets stuck at the DB transaction level. This
> >>> creates a hung state for the Neutron server, to the point that all
> >>> concurrent API calls will fail.
> >>>
> >>> This can be fixed if we include some "locking" system such as calling:
> >>>
> >>> from neutron.common import utils
> >>> ...
> >>>
> >>> @utils.synchronized('any-name', external=True)
> >>> def create_port(self, context, port):
> >>>     ...
> >>>
> >>> Obviously, this will serialize all concurrent calls, which will end up
> >>> causing really bad performance. Does anyone have a better solution?
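> >>>
> >>> (For illustration, a rough sketch of a precommit/postcommit-style
> >>> split applied directly at the plugin level: only DB work inside the
> >>> transaction, the back-end call after the commit, and a compensating
> >>> delete if the back-end call fails. The compensating delete_port() on
> >>> the DB base class is illustrative, not tested code:)
> >>>
> >>> def create_port(self, context, port):
> >>>     with context.session.begin(subtransactions=True):
> >>>         # DB-only work inside the transaction
> >>>         port_db = super(NeutronPluginPLUMgridV2,
> >>>                         self).create_port(context, port)
> >>>         router_db = None
> >>>         if port_db["device_owner"] == "network:router_gateway":
> >>>             router_db = self._get_router(context,
> >>>                                          port_db["device_id"])
> >>>     try:
> >>>         # Back-end call happens after the commit, so concurrent
> >>>         # API calls no longer block on this transaction
> >>>         self._plumlib.create_port(port_db, router_db)
> >>>     except Exception:
> >>>         # Compensate: remove the partially-created port, then
> >>>         # re-raise so the client sees the failure
> >>>         super(NeutronPluginPLUMgridV2,
> >>>               self).delete_port(context, port_db["id"])
> >>>         raise
> >>>     return port_db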
> >>>
> >>> Thanks,
> >>>
> >>> Edgar
> >>>
> >>>