[openstack-dev] [Neutron] Race condition between DB layer and plugin back-end implementation
Tomoe Sugihara
tomoe at midokura.com
Wed Nov 20 02:51:40 UTC 2013
Hi Edgar,
We had a similar issue and worked around it with something like the following
(which I believe is similar to what Aaron said):
@@ -45,6 +45,8 @@ SNAT_RULE_PROPERTY = {OS_TENANT_ROUTER_RULE_KEY: SNAT_RULE}
 class MidonetResourceNotFound(q_exc.NotFound):
     message = _('MidoNet %(resource_type)s %(id)s could not be found')

+from eventlet.semaphore import Semaphore
+PORT_ALLOC_SEM = Semaphore()

 class MidonetPluginV2(db_base_plugin_v2.QuantumDbPluginV2,
                       l3_db.L3_NAT_db_mixin):
@@ -428,21 +430,31 @@ class MidonetPluginV2(db_base_plugin_v2.QuantumDbPluginV2,

         # set midonet port id to quantum port id and create a DB record.
         port_data['id'] = bridge_port.get_id()
-        session = context.session
-        with session.begin(subtransactions=True):
-            qport = super(MidonetPluginV2, self).create_port(context, port)
-            if is_compute_interface:
-                # get ip and mac from DB record.
-                fixed_ip = qport['fixed_ips'][0]['ip_address']
-                mac = qport['mac_address']
-
+        qport = None
+        with PORT_ALLOC_SEM:
+            session = context.session
+            with session.begin(subtransactions=True):
+                qport = super(MidonetPluginV2, self).create_port(context, port)
+                if is_compute_interface:
+                    # get ip and mac from DB record.
+                    id = qport['id']
+                    fixed_ip = qport['fixed_ips'][0]['ip_address']
+                    mac = qport['mac_address']
+
+        if qport and is_compute_interface:
+            try:
                 # create dhcp host entry under the bridge.
                 dhcp_subnets = bridge.get_dhcp_subnets()
                 if len(dhcp_subnets) > 0:
                     dhcp_subnets[0].add_dhcp_host().ip_addr(fixed_ip)\
                         .mac_addr(mac)\
                         .create()
-        return qport
+                return qport
+            except Exception:
+                self.delete_port(context, id)
+                return None
+        else:
+            return qport

     def update_port(self, context, id, port):
         """
We are also looking to fix this for upstream Icehouse.
Also, I have just submitted a (regression) test for this in tempest:
https://review.openstack.org/#/c/57355
Hope the test makes sense.
Thanks,
Tomoe
On Tue, Nov 19, 2013 at 5:25 AM, Edgar Magana <emagana at plumgrid.com> wrote:
> Developers,
>
> This topic has been discussed before but I do not remember if we have a
> good solution or not.
> Basically, if concurrent API calls are sent to Neutron, all of them are
> sent to the plug-in level where two actions have to be performed:
>
> 1. DB transaction – Not just for data persistence but also to collect the
> information needed for the next action
> 2. Plug-in back-end implementation – In our case this is a call to the Python
> library that in turn calls the PLUMgrid REST GW (soon SAL)
>
> For instance:
>
> def create_port(self, context, port):
>     with context.session.begin(subtransactions=True):
>         # Plugin DB - Port Create and Return port
>         port_db = super(NeutronPluginPLUMgridV2,
>                         self).create_port(context, port)
>         device_id = port_db["device_id"]
>         if port_db["device_owner"] == "network:router_gateway":
>             router_db = self._get_router(context, device_id)
>         else:
>             router_db = None
>         try:
>             LOG.debug(_("PLUMgrid Library: create_port() called"))
>             # Back-end implementation
>             self._plumlib.create_port(port_db, router_db)
>         except Exception:
>             …
>
> The way we have implemented it at the plug-in level in Havana (and even in
> Grizzly) is that both actions are wrapped in the same "transaction", which
> automatically rolls back any operation to its original state, mostly
> protecting the DB from being left in an inconsistent state or with leftover
> data if the back-end part fails.
> The problem we are experiencing is that when concurrent calls to the same
> API are sent, the operations at the plug-in back-end take long enough that
> the next concurrent API call gets stuck at the DB transaction level, which
> creates a hung state for the Neutron server to the point that all
> concurrent API calls will fail.
>
> This can be fixed if we include some "locking" mechanism, such as calling:
>
> from neutron.common import utils
> …
>
> @utils.synchronized('any-name', external=True)
> def create_port(self, context, port):
> …
>
> Obviously, this will serialize all concurrent calls, which will end up
> giving really bad performance. Does anyone have a better solution?
>
> Thanks,
>
> Edgar
>
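
For reference, the fully serialized approach described above would look
roughly like the sketch below. I am using a plain eventlet semaphore instead
of utils.synchronized, and _create_port_db() / _backend_create_port() are
just placeholders, so this is only an illustration of the trade-off, not
working plugin code:

from eventlet.semaphore import Semaphore

CREATE_PORT_SEM = Semaphore()

def create_port(self, context, port):
    # The lock is held across both the DB transaction and the slow
    # back-end call, so concurrent create_port requests are processed
    # strictly one at a time.
    with CREATE_PORT_SEM:
        with context.session.begin(subtransactions=True):
            port_db = self._create_port_db(context, port)   # placeholder, DB work
            self._backend_create_port(port_db)              # placeholder, slow REST call
        return port_db

Every request queues on the lock for the whole duration of the back-end call.
Keeping the lock and the transaction around only the DB allocation, and
cleaning up on back-end failure afterwards (as in the diff above), avoids
paying that cost.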