[openstack-dev] [Neutron] Race condition between DB layer and plugin back-end implementation

Edgar Magana emagana at plumgrid.com
Mon Nov 18 22:21:34 UTC 2013


Hi All,

Thank you everybody for your input. It is clear that any solution requires
changes at the plugin level (we were trying to avoid that). So, I am
wondering whether a re-factor of this code is needed or not (maybe not).
The ML2 solution is probably the best alternative right now, so we may go
for it.
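
As a rough, untested sketch, applying Bob's precommit/postcommit split to
our create_port() could look something like the following (the
delete-on-failure cleanup and the use of excutils are my assumptions, not
code we have written):

from neutron.openstack.common import excutils

def create_port(self, context, port):
    # "Precommit" phase: DB work only, inside the transaction.
    with context.session.begin(subtransactions=True):
        port_db = super(NeutronPluginPLUMgridV2,
                        self).create_port(context, port)
        device_id = port_db["device_id"]
        if port_db["device_owner"] == "network:router_gateway":
            router_db = self._get_router(context, device_id)
        else:
            router_db = None

    # "Postcommit" phase: the transaction has committed and its DB
    # locks are released before the potentially slow back-end call.
    try:
        LOG.debug(_("PLUMgrid Library: create_port() called"))
        self._plumlib.create_port(port_db, router_db)
    except Exception:
        # Undo the partially created resource, as ML2 does, and
        # re-raise the original exception to the client.
        with excutils.save_and_reraise_exception():
            super(NeutronPluginPLUMgridV2,
                  self).delete_port(context, port_db["id"])
    return port_db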

Any extra input is welcome!

Thanks,

Edgar

On 11/18/13 12:55 PM, "Robert Kukura" <rkukura at redhat.com> wrote:

>On 11/18/2013 03:25 PM, Edgar Magana wrote:
>> Developers,
>> 
>> This topic has been discussed before but I do not remember if we have a
>> good solution or not.
>
>The ML2 plugin addresses this by calling each MechanismDriver twice. The
>create_network_precommit() method is called as part of the DB
>transaction, and the create_network_postcommit() method is called after
>the transaction has been committed. Interactions with devices or
>controllers are done in the postcommit methods. If the postcommit method
>raises an exception, the plugin deletes that partially-created resource
>and returns the exception to the client. You might consider a similar
>approach in your plugin.
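>
>As a minimal sketch of that split (the driver class and its back-end
>client here are hypothetical; the precommit/postcommit hook names are
>the actual ML2 MechanismDriver interface):
>
>from neutron.plugins.ml2 import driver_api as api
>
>class ExampleMechanismDriver(api.MechanismDriver):
>    def initialize(self):
>        # Hypothetical client for a back-end device or controller.
>        self.client = ExampleBackendClient()
>
>    def create_network_precommit(self, context):
>        # Called inside the DB transaction: validate and persist any
>        # driver-private state, but do not touch the back-end here.
>        pass
>
>    def create_network_postcommit(self, context):
>        # Called after the transaction has committed, so the slow call
>        # to the back-end no longer holds DB locks. Raising here makes
>        # the ML2 plugin delete the partially-created network.
>        self.client.create_network(context.current)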
>
>-Bob
>
>> Basically, if concurrent API calls are sent to Neutron, all of them are
>> sent to the plug-in level, where two actions have to be performed:
>> 
>> 1. DB transaction - not just for data persistence but also to collect
>> the information needed for the next action
>> 2. Plug-in back-end implementation - in our case, a call to the Python
>> library that in turn calls the PLUMgrid REST GW (soon SAL)
>> 
>> For instance:
>> 
>> def create_port(self, context, port):
>>     with context.session.begin(subtransactions=True):
>>         # Plugin DB - Port Create and Return port
>>         port_db = super(NeutronPluginPLUMgridV2,
>>                         self).create_port(context, port)
>>         device_id = port_db["device_id"]
>>         if port_db["device_owner"] == "network:router_gateway":
>>             router_db = self._get_router(context, device_id)
>>         else:
>>             router_db = None
>>         try:
>>             # Back-end implementation
>>             LOG.debug(_("PLUMgrid Library: create_port() called"))
>>             self._plumlib.create_port(port_db, router_db)
>>         except Exception:
>>             ...
>> 
>> The way we have implemented this at the plugin level in Havana (and even
>> in Grizzly) is that both actions are wrapped in the same "transaction",
>> which automatically rolls back any operation to its original state,
>> mostly protecting the DB from ending up in an inconsistent state or with
>> leftover data if the back-end part fails.
>> The problem we are experiencing is that when concurrent calls to the
>> same API are sent, the operations at the plug-in back-end take long
>> enough that the next concurrent API call gets stuck at the DB
>> transaction level, which creates a hung state for the Neutron Server to
>> the point that all concurrent API calls will fail.
>> 
>> This can be fixed if we include some "locking" system such as calling:
>> 
>> from neutron.common import utils
>> ...
>> 
>> @utils.synchronized('any-name', external=True)
>> def create_port(self, context, port):
>> ...
>> 
>> Obviously, this will serialize all concurrent calls, which will end up
>> in really bad performance. Does anyone have a better solution?
>> 
>> Thanks,
>> 
>> Edgar