[openstack-dev] [Neutron] Race condition between DB layer and plugin back-end implementation

Tomoe Sugihara tomoe at midokura.com
Wed Nov 20 02:51:40 UTC 2013


Hi Edgar,

We had a similar issue and worked around it with something like the following
(which I believe is similar to what Aaron said); a stripped-down sketch of the
same idea follows the diff:

@@ -45,6 +45,8 @@ SNAT_RULE_PROPERTY = {OS_TENANT_ROUTER_RULE_KEY: SNAT_RULE}
 class MidonetResourceNotFound(q_exc.NotFound):
     message = _('MidoNet %(resource_type)s %(id)s could not be found')

+from eventlet.semaphore import Semaphore
+PORT_ALLOC_SEM = Semaphore()

 class MidonetPluginV2(db_base_plugin_v2.QuantumDbPluginV2,
                       l3_db.L3_NAT_db_mixin):
@@ -428,21 +430,31 @@ class MidonetPluginV2(db_base_plugin_v2.QuantumDbPluginV2,
             # set midonet port id to quantum port id and create a DB record.
             port_data['id'] = bridge_port.get_id()

-        session = context.session
-        with session.begin(subtransactions=True):
-            qport = super(MidonetPluginV2, self).create_port(context, port)
-            if is_compute_interface:
-                # get ip and mac from DB record.
-                fixed_ip = qport['fixed_ips'][0]['ip_address']
-                mac = qport['mac_address']
-
+        qport = None
+        with PORT_ALLOC_SEM:
+            session = context.session
+            with session.begin(subtransactions=True):
+                qport = super(MidonetPluginV2, self).create_port(context, port)
+                if is_compute_interface:
+                    # get ip and mac from DB record.
+                    id = qport['id']
+                    fixed_ip = qport['fixed_ips'][0]['ip_address']
+                    mac = qport['mac_address']
+
+        if qport and is_compute_interface:
+            try:
                 # create dhcp host entry under the bridge.
                 dhcp_subnets = bridge.get_dhcp_subnets()
                 if len(dhcp_subnets) > 0:
                     dhcp_subnets[0].add_dhcp_host().ip_addr(fixed_ip)\
                                                    .mac_addr(mac)\
                                                    .create()
-        return qport
+                return qport
+            except Exception:
+                self.delete_port(context, id)
+                return None
+        else:
+            return qport

     def update_port(self, context, id, port):
         """


We are also looking to fix this for upstream Icehouse.
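
The main difference from the @utils.synchronized('...', external=True) approach
Edgar describes below is the scope of the lock: we hold it only around the short
DB transaction, while the slow back-end call runs outside it. The coarse-grained
version would serialize each call end to end, roughly like this (decorator and
arguments as in Edgar's snippet below):

from neutron.common import utils

@utils.synchronized('any-name', external=True)
def create_port(self, context, port):
    # Both the DB transaction and the back-end call run under the lock,
    # so concurrent requests queue for the full duration of the call.
    ...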

Also, I have just submitted a (regression) test for this in tempest:
https://review.openstack.org/#/c/57355

Hope the test makes sense.

Thanks,
Tomoe



On Tue, Nov 19, 2013 at 5:25 AM, Edgar Magana <emagana at plumgrid.com> wrote:

> Developers,
>
> This topic has been discussed before but I do not remember if we have a
> good solution or not.
> Basically, if concurrent API calls are sent to Neutron, all of them are
> sent to the plug-in level, where two actions have to be performed:
>
> 1. DB transaction – not just for data persistence but also to collect the
> information needed for the next action
> 2. Plug-in back-end implementation – in our case this is a call to the
> Python library that in turn calls the PLUMgrid REST GW (soon SAL)
>
> For instance:
>
> def create_port(self, context, port):
>     with context.session.begin(subtransactions=True):
>         # Plugin DB - Port Create and Return port
>         port_db = super(NeutronPluginPLUMgridV2,
>                         self).create_port(context, port)
>         device_id = port_db["device_id"]
>         if port_db["device_owner"] == "network:router_gateway":
>             router_db = self._get_router(context, device_id)
>         else:
>             router_db = None
>         try:
>             LOG.debug(_("PLUMgrid Library: create_port() called"))
>             # Back-end implementation
>             self._plumlib.create_port(port_db, router_db)
>         except Exception:
>             [...]
> The way we have implemented this at the plugin level in Havana (and even in
> Grizzly) is that both actions are wrapped in the same "transaction", which
> automatically rolls back any operation to its original state, mostly
> protecting the DB from being left in an inconsistent state or with
> left-over data if the back-end part fails.
> The problem we are experiencing is that when concurrent calls to the same
> API are sent, the operations at the plug-in back-end take long enough that
> the next concurrent API call gets stuck at the DB transaction level, which
> creates a hung state for the Neutron server to the point that all
> concurrent API calls will fail.
>
> This can be fixed if we include some "locking" system, such as calling:
>
> from neutron.common import utils
>
> @utils.synchronized('any-name', external=True)
> def create_port(self, context, port):
>
> Obviously, this will serialize all concurrent calls, which will end up
> giving really bad performance. Does anyone have a better solution?
>
> Thanks,
>
> Edgar
>