[openstack-dev] [Neutron] Race condition between DB layer and plugin back-end implementation

Edgar Magana emagana at plumgrid.com
Tue Nov 19 16:59:38 UTC 2013


Isaku,

Do you have in mind any implementation, any BP?
We could actually work on this together, all plugins will get the benefits
of a better implementation.

Thanks,

Edgar

On 11/19/13 3:57 AM, "Isaku Yamahata" <isaku.yamahata at gmail.com> wrote:

>On Mon, Nov 18, 2013 at 03:55:49PM -0500,
>Robert Kukura <rkukura at redhat.com> wrote:
>
>> On 11/18/2013 03:25 PM, Edgar Magana wrote:
>> > Developers,
>> > 
>> > This topic has been discussed before but I do not remember if we
>> > have a good solution or not.
>> 
>> The ML2 plugin addresses this by calling each MechanismDriver twice. The
>> create_network_precommit() method is called as part of the DB
>> transaction, and the create_network_postcommit() method is called after
>> the transaction has been committed. Interactions with devices or
>> controllers are done in the postcommit methods. If the postcommit method
>> raises an exception, the plugin deletes that partially-created resource
>> and returns the exception to the client. You might consider a similar
>> approach in your plugin.
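>> 
>> For illustration, here is a minimal sketch of that two-phase shape
>> (untested; MyBackendClient is a made-up stand-in for a real
>> controller client, not part of Neutron):
>> 
>> from neutron.plugins.ml2 import driver_api as api
>> 
>> class MyBackendClient(object):
>>     # Purely illustrative stand-in for a controller client.
>>     def create_network(self, network):
>>         pass  # e.g. a REST call to the controller
>> 
>> class SketchMechanismDriver(api.MechanismDriver):
>>     def initialize(self):
>>         self.client = MyBackendClient()
>> 
>>     def create_network_precommit(self, context):
>>         # Runs inside the DB transaction: validate and stage only;
>>         # never call out to the device or controller here.
>>         pass
>> 
>>     def create_network_postcommit(self, context):
>>         # Runs after the commit: safe to talk to the back-end. If
>>         # this raises, the plugin deletes the partially-created
>>         # network and returns the exception to the client.
>>         self.client.create_network(context.current)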
>
>Splitting the work into two phases, pre/post, is a good approach,
>but there still remains a race window.
>Once the transaction is committed, the result is visible to the
>outside, so concurrent requests to the same resource will be racy.
>There is a window after pre_xxx_yyy() and before post_xxx_yyy() where
>other requests can be handled.
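>
>To make the window concrete, consider this interleaving of two API
>workers (a hypothetical trace, not taken from the thread):
>
>  worker A: create precommit, commit   -> resource becomes visible
>  worker B: sees it, runs delete precommit, commit
>  worker B: delete postcommit reaches the controller first
>  worker A: create postcommit reaches the controller afterwards,
>            leaving the controller and the DB out of sync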
>
>The state machine needs to be enhanced, I think (plugins need
>modification). For example, adding more states like
>pending_{create, delete, update}.
>We would also want to consider serializing between operations on ports
>and subnets, or between operations on subnets and networks, depending
>on performance requirements.
>(Or carefully audit complex status changes, e.g.
>changing a port during subnet/network update/deletion.)
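>
>As a rough sketch of what the pending states could look like at the
>plugin level (all names and helpers below are made up, not an existing
>Neutron API):
>
>PENDING_CREATE, ACTIVE, ERROR = "PENDING_CREATE", "ACTIVE", "ERROR"
>
>def create_port(plugin, context, port):
>    with context.session.begin(subtransactions=True):
>        # DB phase: persist the port in a transitional state so any
>        # request committed after this point sees it is in flight.
>        port_db = plugin.create_port_db(context, port)
>        plugin.set_status(context, port_db["id"], PENDING_CREATE)
>    # Committed: the record is visible, but PENDING_CREATE tells
>    # concurrent requests to wait or fail fast instead of racing.
>    try:
>        plugin.backend_create_port(port_db)  # no DB lock held here
>        plugin.set_status(context, port_db["id"], ACTIVE)
>    except Exception:
>        plugin.set_status(context, port_db["id"], ERROR)
>        raise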
>
>I think it would be useful to establish a reference locking policy
>for the ML2 plugin for SDN controllers.
>Thoughts or comments? If this is considered useful and acceptable,
>I'm willing to help.
>
>thanks,
>Isaku Yamahata
>
>> -Bob
>> 
>> > Basically, if concurrent API calls are sent to Neutron, all of them
>> > are sent to the plug-in level, where two actions have to be made:
>> > 
>> > 1. DB transaction - not just for data persistence but also to
>> > collect the information needed for the next action
>> > 2. Plug-in back-end implementation - in our case a call to the
>> > Python library that consequently calls the PLUMgrid REST GW (soon
>> > SAL)
>> > 
>> > For instance:
>> > 
>> > def create_port(self, context, port):
>> >     with context.session.begin(subtransactions=True):
>> >         # Plugin DB - Port Create and Return port
>> >         port_db = super(NeutronPluginPLUMgridV2,
>> >                         self).create_port(context, port)
>> >         device_id = port_db["device_id"]
>> >         if port_db["device_owner"] == "network:router_gateway":
>> >             router_db = self._get_router(context, device_id)
>> >         else:
>> >             router_db = None
>> >         try:
>> >             LOG.debug(_("PLUMgrid Library: create_port() called"))
>> >             # Back-end implementation
>> >             self._plumlib.create_port(port_db, router_db)
>> >         except Exception:
>> >             ...
>> > 
>> > The way we have implemented this at the plugin level in Havana (even
>> > in Grizzly) is that both actions are wrapped in the same
>> > "transaction", which automatically rolls back any operation to its
>> > original state, mostly protecting the DB from being left in an
>> > inconsistent state or with leftover data if the back-end part fails.
>> > The problem that we are experiencing is that when concurrent calls
>> > to the same API are sent, the operations at the plug-in back-end
>> > take long enough for the next concurrent API call to get stuck at
>> > the DB transaction level, which creates a hung state for the Neutron
>> > server to the point that all concurrent API calls will fail.
>> > 
>> > This can be fixed if we include some "locking" system such as calling:
>> > 
>> > from neutron.common import utils
>> > ...
>> > 
>> > @utils.synchronized('any-name', external=True)
>> > def create_port(self, context, port):
>> >     ...
>> > 
>> > Obviously, this will create a serialization of all concurrent
>> > calls, which will end up in really bad performance. Does anyone
>> > have a better solution?
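>> > 
>> > For reference, moving the back-end call outside the transaction
>> > (instead of locking) could look like the following; a sketch only,
>> > untested, and the delete-on-failure cleanup here is illustrative:
>> > 
>> > def create_port(self, context, port):
>> >     with context.session.begin(subtransactions=True):
>> >         # DB phase only: persist the port and gather what the
>> >         # back-end call will need.
>> >         port_db = super(NeutronPluginPLUMgridV2,
>> >                         self).create_port(context, port)
>> >         device_id = port_db["device_id"]
>> >         if port_db["device_owner"] == "network:router_gateway":
>> >             router_db = self._get_router(context, device_id)
>> >         else:
>> >             router_db = None
>> >     # Back-end phase, after commit: concurrent calls no longer
>> >     # block at the DB level while the REST call is in flight.
>> >     try:
>> >         LOG.debug(_("PLUMgrid Library: create_port() called"))
>> >         self._plumlib.create_port(port_db, router_db)
>> >     except Exception:
>> >         # Undo the partially-created resource on back-end failure.
>> >         super(NeutronPluginPLUMgridV2,
>> >               self).delete_port(context, port_db["id"])
>> >         raise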
>> > 
>> > Thanks,
>> > 
>> > Edgar
>> > 
>> > 
>
>-- 
>Isaku Yamahata <isaku.yamahata at gmail.com>