[openstack-dev] [Neutron] Race condition between DB layer and plugin back-end implementation

Salvatore Orlando sorlando at nicira.com
Tue Nov 19 23:39:05 UTC 2013


Thanks for your reply Joshua,

some more comments inline; however I think I'm probably going off topic
here given the initial subject of this thread.

Talking about Taskflow, and setting aside the plugins, which do not have a
lot of community-wide interest, do you reckon Taskflow might be something
we could look at to improve reliability and efficiency in the
nova/neutron interface? I have barely started to analyse how that could be
done, but it is something which might deserve another thread of discussion.

Regards,
Salvatore


On 19 November 2013 23:56, Joshua Harlow <harlowja at yahoo-inc.com> wrote:

>  Can u explain a little how using celery achieves workflow reliability
> and avoids races (or mitigates spaghetti code)?
>
>
Celery surely does not do that. I have probably not been precise enough in
my previous post. I meant to say that celery is merely the tool we are
going to use for managing a distributed task queue.


>  To me celery acts as a way to distribute tasks, but does not deal with
> actually forming an easily understandable way of knowing that a piece of
> code that u design is actually going to go through the various state
> transitions (or states & workflows) that u expect (this is a higher level
> mechanism that u can build on-top of a distribution system). So this means
> that NVP (or neutron or other?) must be maintaining an orchestration/engine
> layer on-top of celery to add on this additional set of code that 'drives'
> celery to accomplish a given workflow in a reliable manner.
>

That's pretty much correct. The neutron plugin will define workflows and
manage state transitions.
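
For illustration only, here is a minimal sketch of what dispatching a backend
call as a celery task could look like once the DB transaction has already been
committed; the broker URL, task and helper names below are placeholders for
this example, not actual NVP plugin code:

from celery import Celery

app = Celery('neutron_backend_tasks', broker='amqp://guest@localhost//')


class BackendError(Exception):
    """Placeholder for an error raised by the backend/controller client."""


def _backend_create_port(port_data):
    # Stand-in for the real controller client call.
    pass


@app.task(bind=True, max_retries=3)
def create_port_on_backend(self, port_data):
    # The port row is assumed to be already committed (e.g. with a PENDING
    # status) before this task is queued, so no SQL transaction is held
    # open while we talk to the controller.
    try:
        _backend_create_port(port_data)
    except BackendError as exc:
        # Retry with exponential backoff instead of failing the API call.
        raise self.retry(exc=exc, countdown=2 ** self.request.retries)

The plugin would then queue the task after the commit with something like
create_port_on_backend.delay(port_data), keeping the API call itself short.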


>
>  This starts to sound pretty similar to what taskflow is doing, not being
> a direct competitor to a distributed task queue such as celery but
> providing this higher level mechanism which adds on these benefits, since
> they are needed anyway.
>

>  To me these benefits currently are (may get bigger in the future):
>
>  1. A way to define a workflow (in a way that is not tied to celery,
> since celery's '@task' decorator ties u to celery's internal
> implementation).
>      - This includes ongoing work to determine how to easily define a
> state-machine in a way that is relevant to cinder (and other projects).
> 2. A way to keep track of the state that the workflow goes through (this
> brings along resumption, progress information… when u track at the right
> level).
> 3. A way to execute that workflow reliably (potentially using celery, rpc,
> local threads, other future hotness)
>      - This becomes important when u ask yourself how did u plan on
> testing celery in the gate/jenkins/CI?
> 4. A way to guarantee that the workflow upon failure is *automatically*
> resumed by some other entity.
>

Thanks for this clarification. I will certainly look at how the NVP plugin
can leverage taskflow; I don't want to reinvent the wheel - or, in other
words, I don't want to write code that somebody else has already written
for me.
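
Purely as an illustration of the points above, a minimal sketch of the same
create-port workflow expressed with taskflow's task/flow abstractions; the
class names, returned values and flow name are made up for the example:

from taskflow import engines, task
from taskflow.patterns import linear_flow


class PersistPort(task.Task):
    # Writes the port to the Neutron DB; the returned record is made
    # available to later tasks under the name 'port_db'.
    default_provides = 'port_db'

    def execute(self, port):
        return {'id': 'fake-port-id', 'status': 'PENDING_CREATE'}


class PushPortToBackend(task.Task):
    # Sends the committed port to the backend controller.
    def execute(self, port_db):
        pass  # the controller call would go here

    def revert(self, port_db, **kwargs):
        pass  # e.g. mark the port as ERROR if the backend call failed


flow = linear_flow.Flow('create_port').add(
    PersistPort(),
    PushPortToBackend(),
)
engines.run(flow, store={'port': {'name': 'demo-port'}})

The appeal relative to driving celery directly is that the workflow
definition, its state transitions and the revert path all live in one place,
which maps onto points 1-4 above.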


>  More details @ http://www.slideshare.net/harlowja/taskflow-27820295
>
>   From: Salvatore Orlando <sorlando at nicira.com>
> Date: Tuesday, November 19, 2013 2:22 PM
>
> To: "OpenStack Development Mailing List (not for usage questions)" <
> openstack-dev at lists.openstack.org>
> Cc: Joshua Harlow <harlowja at yahoo-inc.com>, Isaku Yamahata <
> isaku.yamahata at gmail.com>, Robert Kukura <rkukura at redhat.com>
> Subject: Re: [openstack-dev] [Neutron] Race condition between DB layer
> and plugin back-end implementation
>
>   For what it's worth, we have considered this aspect from the perspective
> of the Neutron plugin my team maintains (NVP) during the past release
> cycle.
>
> The synchronous model that most plugins with a controller on the backend
> currently implement is simple and convenient, but has some flaws:
>
>  - reliability: the current approach where the plugin orchestrates the
> backend is not really optimal when it comes to ensuring your running
> configuration (backend/control plane) is in sync with your desired
> configuration (neutron/mgmt plane); moreover in some cases, due to neutron
> internals, API calls to the backend are wrapped in a transaction too,
> leading to very long SQL transactions, which are quite dangerous indeed. It
> is not easy to recover from a failure due to an eventlet thread deadlocking
> with a mysql transaction, where by 'recover' I mean ensuring neutron and
> backend state are in sync.
>
>  - maintainability: since handling rollback in case of failures on the
> backend and/or the db is cumbersome, this often leads to spaghetti code
> which is very hard to maintain regardless of the effort (ok, I agree here
> that this also depends on how good the devs are - most of the guys in my
> team are very good, but unfortunately they have me too...).
>
>  - performance & scalability:
>     -  roundtrips to the backend take a non-negligible toll on the
> duration of an API call, whereas most Neutron API calls should probably
> just terminate at the DB just like a nova boot call does not wait for the
> VM to be ACTIVE to return.
>     - we need to keep some operations serialized in order to avoid the
> mentioned race issues
>
>  For this reason we're progressively moving toward a change in the NVP
> plugin with a series of patches under this umbrella-blueprint [1].
>
>  To address the issues mentioned by Isaku, we've been looking at a
> task management library with an efficient and reliable set of abstractions
> for ensuring operations are properly ordered, thus avoiding those races (I
> agree with the observation on the pre/post commit solution).
> We are currently looking at using celery [2] rather than taskflow, mostly
> because we already have expertise on how to use it in our
> applications, and it has very easy abstractions for workflow design, as well
> as for handling task failures.
> That said, I think we're still open to switching to taskflow should we become
> aware of some very good reason for using it.
>
>  Regards,
> Salvatore
>
>  [1]
> https://blueprints.launchpad.net/neutron/+spec/nvp-async-backend-communication
> [2] http://docs.celeryproject.org/en/master/index.html
>
>
>
> On 19 November 2013 19:42, Joshua Harlow <harlowja at yahoo-inc.com> wrote:
>
>> And also of course, nearly forgot a similar situation/review in heat.
>>
>> https://review.openstack.org/#/c/49440/
>>
>> Except theirs was/is dealing with stack locking (a heat concept).
>>
>> On 11/19/13 10:33 AM, "Joshua Harlow" <harlowja at yahoo-inc.com> wrote:
>>
>> >If you start adding these states you might really want to consider the
>> >following work that is going on in other projects.
>> >
>> >It surely appears that everyone is starting to hit the same problem (and
>> >joining efforts would produce a more beneficial result).
>> >
>> >Relevant icehouse etherpads:
>> >- https://etherpad.openstack.org/p/CinderTaskFlowFSM
>> >- https://etherpad.openstack.org/p/icehouse-oslo-service-synchronization
>> >
>> >And of course my obvious plug for taskflow (which is designed to be a
>> >useful library to help in all these usages).
>> >
>> >- https://wiki.openstack.org/wiki/TaskFlow
>> >
>> >The states u just mentioned start to line-up with
>> >https://wiki.openstack.org/wiki/TaskFlow/States_of_Task_and_Flow
>> >
>> >If this sounds like a useful way to go (joining efforts) then let's see
>> >how we can make it possible.
>> >
>> >IRC: #openstack-state-management is where I am usually at.
>> >
>> >On 11/19/13 3:57 AM, "Isaku Yamahata" <isaku.yamahata at gmail.com> wrote:
>> >
>> >>On Mon, Nov 18, 2013 at 03:55:49PM -0500,
>> >>Robert Kukura <rkukura at redhat.com> wrote:
>> >>
>> >>> On 11/18/2013 03:25 PM, Edgar Magana wrote:
>> >>> > Developers,
>> >>> >
>> >>> > This topic has been discussed before but I do not remember if we have
>> >>> > a good solution or not.
>> >>>
>> >>> The ML2 plugin addresses this by calling each MechanismDriver twice.
>> >>> The create_network_precommit() method is called as part of the DB
>> >>> transaction, and the create_network_postcommit() method is called after
>> >>> the transaction has been committed. Interactions with devices or
>> >>> controllers are done in the postcommit methods. If the postcommit method
>> >>> raises an exception, the plugin deletes that partially-created resource
>> >>> and returns the exception to the client. You might consider a similar
>> >>> approach in your plugin.
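
(For illustration, a minimal sketch of the two-phase pattern described above;
the manager attribute and error handling are simplified for the example and
this is not the actual ML2 code:)

def create_network(self, context, network):
    session = context.session
    with session.begin(subtransactions=True):
        # DB work and driver precommit validation happen inside the
        # transaction.
        result = super(Ml2Plugin, self).create_network(context, network)
        self.mechanism_manager.create_network_precommit(context, result)

    # Interactions with devices/controllers happen only after the commit.
    try:
        self.mechanism_manager.create_network_postcommit(context, result)
    except Exception:
        # Postcommit failed: remove the partially-created network and
        # surface the error to the API client.
        self.delete_network(context, result['id'])
        raise
    return result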
>> >>
>> >>Splitting the work into two phases, pre/post, is a good approach.
>> >>But there still remains a race window.
>> >>Once the transaction is committed, the result is visible to the outside,
>> >>so concurrent requests to the same resource will be racy.
>> >>There is a window after pre_xxx_yyy() and before post_xxx_yyy() where
>> >>other requests can be handled.
>> >>
>> >>The state machine needs to be enhanced, I think (plugins need
>> >>modification). For example, adding more states like
>> >>pending_{create, delete, update}.
>> >>We would also like to consider serializing operations on ports and
>> >>subnets, or on subnets and networks, depending on performance
>> >>requirements.
>> >>(Or carefully audit complex status changes, i.e. changing a port
>> >>during subnet/network update/deletion.)
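
(Purely illustrative: one way extra pending_* states could be used to close
that window; the status values and the exception are made-up names for this
sketch, not existing Neutron code:)

def update_port(self, context, id, port):
    with context.session.begin(subtransactions=True):
        port_db = self._get_port(context, id)
        if port_db['status'] in ('PENDING_CREATE', 'PENDING_DELETE'):
            # Another request is still being pushed to the backend;
            # refuse (or queue) instead of racing with it.
            raise PortInPendingState(port_id=id)
        port_db['status'] = 'PENDING_UPDATE'
        port_db.update(port['port'])

    # The backend call then happens outside the transaction; on success the
    # status flips back to ACTIVE, on failure to ERROR.
    ...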
>> >>
>> >>I think it would be useful to establish a reference locking policy
>> >>for the ML2 plugin for SDN controllers.
>> >>Thoughts or comments? If this is considered useful and acceptable,
>> >>I'm willing to help.
>> >>
>> >>thanks,
>> >>Isaku Yamahata
>> >>
>> >>> -Bob
>> >>>
>> >>> > Basically, if concurrent API calls are sent to Neutron, all of them are
>> >>> > sent to the plug-in level where two actions have to be taken:
>> >>> >
>> >>> > 1. DB transaction - not just for data persistence but also to collect
>> >>> > the information needed for the next action
>> >>> > 2. Plug-in back-end implementation - in our case a call to the python
>> >>> > library that consequently calls the PLUMgrid REST GW (soon SAL)
>> >>> >
>> >>> > For instance:
>> >>> >
>> >>> > def create_port(self, context, port):
>> >>> >     with context.session.begin(subtransactions=True):
>> >>> >         # Plugin DB - Port Create and Return port
>> >>> >         port_db = super(NeutronPluginPLUMgridV2,
>> >>> >                         self).create_port(context, port)
>> >>> >         device_id = port_db["device_id"]
>> >>> >         if port_db["device_owner"] == "network:router_gateway":
>> >>> >             router_db = self._get_router(context, device_id)
>> >>> >         else:
>> >>> >             router_db = None
>> >>> >         try:
>> >>> >             LOG.debug(_("PLUMgrid Library: create_port() called"))
>> >>> >             # Back-end implementation
>> >>> >             self._plumlib.create_port(port_db, router_db)
>> >>> >         except Exception:
>> >>> >             ...
>> >>> >
>> >>> > The way we have implemented this at the plugin level in Havana (and even
>> >>> > in Grizzly) is that both actions are wrapped in the same "transaction",
>> >>> > which automatically rolls back any operation done, restoring the original
>> >>> > state and mostly protecting the DB from being left in an inconsistent
>> >>> > state or with leftover data if the back-end part fails.
>> >>> > The problem that we are experiencing is that when concurrent calls to the
>> >>> > same API are sent, the operations at the plug-in back-end take long enough
>> >>> > to make the next concurrent API call get stuck at the DB transaction
>> >>> > level, which creates a hung state for the Neutron Server to the point
>> >>> > that all concurrent API calls will fail.
>> >>> >
>> >>> > This can be fixed if we include some "locking" system such as calling:
>> >>> >
>> >>> > from neutron.common import utils
>> >>> > ...
>> >>> >
>> >>> > @utils.synchronized('any-name', external=True)
>> >>> > def create_port(self, context, port):
>> >>> >     ...
>> >>> >
>> >>> > Obviously, this will serialize all concurrent calls, which will end up
>> >>> > giving really bad performance. Does anyone have a better solution?
>> >>> >
>> >>> > Thanks,
>> >>> >
>> >>> > Edgar
>> >>> >
>> >>> >
>> >>> >
>> >>>
>> >>>
>> >>
>> >>--
>> >>Isaku Yamahata <isaku.yamahata at gmail.com>
>> >>
>> >
>> >
>>
>>
>>
>
>