[openstack-dev] [Neutron] Introducing task oriented workflows

Endre Karlson endre.karlson at gmail.com
Fri May 23 13:26:17 UTC 2014


I think Cinder has some of the same sauce?

https://review.openstack.org/#/c/94742/
https://review.openstack.org/#/c/95037/



2014-05-23 10:57 GMT+02:00 Jaume Devesa <devvesa at gmail.com>:

> Hello,
>
> I think the Mistral project [1] aims at the same goal, doesn't it?
>
> Regards,
> jaume
>
> [1]: https://wiki.openstack.org/wiki/Mistral
>
>
> On 23 May 2014 09:28, Salvatore Orlando <sorlando at nicira.com> wrote:
>
>> Nachi,
>>
>> I would be glad if the solution was as easy as sticking a task_state
>> attribute to a resource! I'm afraid however that would be only the tip of
>> the iceberg, or the icing on the cake, if you want.
>> However, I agree with you that consistency across OpenStack APIs is very
>> important; whether this is a cross project discussion is instead debatable;
>> my feeling here is that taskflow is the cross-project piece of the
>> architecture, and every project then might have a different strategy for
>> integrating it - as long as it does not result in inconsistent APIs exposed
>> to customers!
>>
>> It is something that obviously will be considered when designing how to
>> represent whether a DB resource is in sync with its actual configuration on
>> the backend.
>> I think this is something which might happen regardless of whether it
>> will be also agreed to let API consumers access task execution information
>> using the API.
>>
>> Salvatore
>>
>>
>>
>>
>> On 23 May 2014 01:16, Nachi Ueno <nachi at ntti3.com> wrote:
>>
>>> Hi Salvatore
>>>
>>> Thank you for posting this.
>>>
>>> IMO, this topic shouldn't be limited to Neutron only.
>>> Users want a consistent API across OpenStack projects, right?
>>>
>>> In Nova, a server has task_state, so Neutron should do it the same way.
>>>
>>>
>>>
>>> 2014-05-22 15:34 GMT-07:00 Salvatore Orlando <sorlando at nicira.com>:
>>> > As most of you probably know already, this is one of the topics
>>> discussed
>>> > during the Juno summit [1].
>>> > I would like to kick off the discussion in order to move towards a
>>> concrete
>>> > design.
>>> >
>>> > Preamble: Considering the meat that's already on the plate for Juno,
>>> I'm not
>>> > advocating that whatever comes out of this discussion should be put on
>>> the
>>> > Juno roadmap. However, preparation (or yak shaving) activities that
>>> should
>>> > be identified as pre-requisite might happen during the Juno time frame
>>> > assuming that they won't interfere with other critical or high priority
>>> > activities.
>>> > This is also a very long post; the TL;DR summary is that I would like
>>> to
>>> > explore task-oriented communication with the backend and how it should
>>> be
>>> > reflected in the API - gauging how the community feels about this, and
>>> > collecting feedback regarding design, constructs, and related
>>> > tools/techniques/technologies.
>>> >
>>> > At the summit a broad range of items were discussed during the
>>> session, and
>>> > most of them have been reported in the etherpad [1].
>>> >
>>> > First, I think it would be good to clarify whether we're advocating a
>>> > task-based API, workflow-oriented operation processing, or both.
>>> >
>>> > --> About a task-based API
>>> >
>>> > In a task-based API, most PUT/POST API operations would return tasks
>>> rather
>>> > than neutron resources, and users of the API will interact directly
>>> with
>>> > tasks.
>>> > I put an example in [2] to avoid cluttering this post with too much
>>> text.
>>> > As the API operation simply launches a task - the database state won't
>>> be
>>> > updated until the task is completed.
>>> >
>>> > Needless to say, this would be a radical change to Neutron's API; it
>>> should
>>> > be carefully evaluated and not considered for the v2 API.
>>> > Even if it is easily recognisable that this approach has a few
>>> benefits, I
>>> > don't think this will improve usability of the API at all. Indeed this
>>> will
>>> > limit the ability of operating on a resource while a task is in
>>> execution on
>>> > it, and will also require neutron API users to change the paradigm they
>>> use
>>> > to interact with the API; not to mention the fact that it would
>>> look
>>> > weird if neutron is the only API endpoint in OpenStack operating in
>>> this
>>> > way.
>>> > For the Neutron API, I think that its operations should still be
>>> > manipulating the database state, and possibly return immediately after
>>> that
>>> > (*) - a task, or rather a workflow, will then be started,
>>> executed
>>> > asynchronously, and update the resource status on completion.
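
The "update the database, return immediately, let a workflow finish the
job" behaviour described above can be sketched roughly as follows. This is
only an illustration using the Python standard library: the in-memory DB
dict, the create_port and configure_backend helpers, and the
"PENDING_CREATE" status value are all assumptions made for the example, not
Neutron code.

```python
import copy
from concurrent.futures import ThreadPoolExecutor

# In-memory stand-in for the Neutron database (hypothetical).
DB = {}
executor = ThreadPoolExecutor(max_workers=2)


def configure_backend(port_id):
    """Pretend to program the backend, then record the outcome."""
    # A real workflow would talk to an agent or a controller here.
    DB[port_id]["status"] = "ACTIVE"


def create_port(port_id, network_id):
    """Write the DB row, start the workflow, and return right away."""
    DB[port_id] = {"id": port_id, "network_id": network_id,
                   "status": "PENDING_CREATE"}  # hypothetical status value
    response = copy.deepcopy(DB[port_id])  # snapshot handed to the caller
    future = executor.submit(configure_backend, port_id)
    return response, future


response, future = create_port("port-1", "net-1")
print(response["status"])      # status as seen by the API caller
future.result()                # wait only to keep the demo deterministic
print(DB["port-1"]["status"])  # status once the workflow has completed
```

The caller gets a resource back, not a task, which is the difference from
the task-based API discussed earlier in the post.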
>>> >
>>> > --> On workflow-oriented operations
>>> >
>>> > The benefits of it when it comes to easily controlling operations and
>>> > ensuring consistency in case of failures are obvious. For what it's
>>> worth, I
>>> > have been experimenting with introducing this kind of capability in the NSX
>>> > plugin in the past few months. I've been using celery as a task queue,
>>> and
>>> > writing the task management code from scratch - only to realize that
>>> the
>>> > same features I was implementing are already supported by taskflow.
>>> >
>>> > I think that all parts of Neutron API can greatly benefit from
>>> introducing a
>>> > flow-based approach.
>>> > Some examples:
>>> > - pre/post commit operations in the ML2 plugin can be orchestrated a
>>> lot
>>> > better as a workflow, articulating operations on the various drivers
>>> in a
>>> > graph
>>> > - operations spanning multiple plugins (eg: add router interface) could
>>> be
>>> > simplified using clearly defined tasks for the L2 and L3 parts
>>> > - it would be finally possible to properly manage resources'
>>> "operational
>>> > status", as well as knowing whether the actual configuration of the
>>> backend
>>> > matches the database configuration
>>> > - synchronous plugins might be converted into asynchronous ones, thus
>>> improving
>>> > their API throughput
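
To make the "clearly defined tasks for the L2 and L3 parts" idea concrete,
here is a minimal sketch of a linear workflow in which each task has an
execute step and an optional revert step. It is written from scratch with
plain Python and is not taskflow's API (taskflow offers the same
execute/revert concept, with far more features); the Task class, run_flow,
and the task functions are hypothetical names.

```python
class Task:
    """One unit of work with an optional compensating action."""

    def __init__(self, name, execute, revert=None):
        self.name = name
        self.execute = execute
        self.revert = revert


def run_flow(tasks, context):
    """Run tasks in order; on failure, revert completed ones in reverse."""
    completed = []
    for task in tasks:
        try:
            task.execute(context)
        except Exception as exc:
            context["log"].append(f"{task.name} failed: {exc}")
            for done in reversed(completed):
                if done.revert is not None:
                    done.revert(context)
            return False
        completed.append(task)
    return True


def create_l2_port(ctx):
    ctx["log"].append("L2: port created")


def delete_l2_port(ctx):
    ctx["log"].append("L2: port deleted")


def attach_l3_interface(ctx):
    # Simulated backend failure for the L3 part of the operation.
    raise RuntimeError("backend unreachable")


flow = [Task("l2_port", create_l2_port, delete_l2_port),
        Task("l3_interface", attach_l3_interface)]
context = {"log": []}
ok = run_flow(flow, context)
print(ok, context["log"])
```

In this sketch only the tasks that completed are reverted; the failing task
is assumed to clean up after itself.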
>>> >
>>> > Now, the caveats:
>>> > - during the sessions it was correctly pointed out that special care is
>>> > required with multiple producers (ie: api servers) as workflows should
>>> always
>>> > be executed in the correct order
>>> > - it is probably advisable to serialize workflows operating on the
>>> same
>>> > resource; this might lead to unexpected situations (potentially to
>>> > deadlocks) with workflows operating on multiple resources
>>> > - if the API is asynchronous, and multiple workflows might be queued
>>> or in
>>> > execution at a given time, rolling back the DB operation on failures is
>>> > probably not advisable (it would not be advisable anyway in any
>>> asynchronous
>>> > framework). If the API instead stays synchronous the revert action for
>>> a
>>> > failed task might also restore the db state for a resource; but I
>>> think that
>>> > keeping the API synchronous misses the point of this whole work a bit
>>> - feel
>>> > free to show your disagreement here!
>>> > - some neutron workflows are actually initiated by agents; this is the
>>> case,
>>> > for instance, of the workflow for doing initial L2 and security group
>>> > configuration for a port.
>>> > - it's going to be a lot of work, and we need to devise a strategy to
>>> either
>>> > roll these changes into the existing plugins or just decide that future v3
>>> > plugins will use it.
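
On the serialization caveat: one common way to avoid deadlocks when a
workflow touches several resources is to take the per-resource locks in a
fixed (for example sorted) order. The sketch below shows that technique with
in-process locks only; with multiple API servers a distributed lock would be
needed instead, and all names here (run_serialized, the resource ids) are
assumptions made for the example.

```python
import threading
from collections import defaultdict
from contextlib import ExitStack

_registry_guard = threading.Lock()
_resource_locks = defaultdict(threading.Lock)


def _lock_for(resource_id):
    """Return the lock for a resource, creating it on first use."""
    with _registry_guard:
        return _resource_locks[resource_id]


def run_serialized(resource_ids, workflow):
    """Run workflow while holding the lock of every resource it touches."""
    with ExitStack() as stack:
        # Sorting gives every workflow the same acquisition order, so two
        # workflows touching the same resources cannot wait in a cycle.
        for rid in sorted(set(resource_ids)):
            stack.enter_context(_lock_for(rid))
        return workflow()


counter = {"value": 0}


def bump():
    # Read-modify-write that is only safe because callers are serialized.
    current = counter["value"]
    counter["value"] = current + 1


threads = [threading.Thread(target=run_serialized,
                            args=(["router-1", "port-1"], bump))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter["value"])
```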
>>> >
>>> > From the implementation side, I've done a bit of research and task
>>> queues
>>> > like celery only implement half of what is needed; conversely I have
>>> not
>>> > been able to find a workflow manager, at least in the python world, as
>>> > complete and suitable as taskflow.
>>> > So my preference would obviously be to use it, and contribute to it
>>> should we
>>> > realize Neutron needs some changes to suit its needs. Growing something
>>> > neutron-specific in tree is something I'd rule out.
>>> >
>>> > (*) This is a bit different from what many plugins do, as they execute
>>> > requests synchronously and return only once the backend request is
>>> > completed.
>>> >
>>> > --> Tasks and the API
>>> >
>>> > The etherpad [1] contains a lot of interesting notes on this topic.
>>> > One important item is to understand how tasks affect the resource's
>>> status
>>> > to indicate their completion or failure. So far Neutron resource status
>>> > pretty much expresses its "fabric" status. For instance a port is "UP"
>>> if
>>> > it's been wired by the OVS agent; it often does not tell us whether the
>>> > actual resource configuration is exactly the desired one in the
>>> database.
>>> > For instance, if the ovs agent fails to apply security groups to a
>>> port, the
>>> > port stays "ACTIVE" and the user might never know there was an error
>>> and the
>>> > actual state diverged from the desired one.
>>> >
>>> > It is therefore important to allow users to know whether the backend
>>> state
>>> > is in sync with the db; tools like taskflow will be really helpful to
>>> this
>>> > aim.
>>> > However, how should this be represented? The main options are to
>>> either have
>>> > a new attribute describing the resource sync state, or to extend the
>>> > semantics of the current status attribute to include also resource sync
>>> > state. I've put some ramblings on the subject in the etherpad [3].
>>> > Still, it has been correctly pointed out that it might not be enough
>>> to know
>>> > that a resource is out of sync, but it is good to know which operation
>>> > exactly failed; this is where somehow exposing tasks through the API
>>> might
>>> > come in handy.
>>> >
>>> > For instance one could do something like:
>>> >
>>> > GET /tasks?resource_id=<res_id>&task_state=FAILED
>>> >
>>> > to get failure details for a given resource.
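
As a rough illustration of the query above, the snippet below builds the
request URL and applies the same filter to a list of task records. The
/tasks endpoint is only a proposal in this thread, so the base URL, the
record fields, and both helper functions are hypothetical.

```python
from urllib.parse import urlencode


def failed_tasks_url(base_url, resource_id):
    """Build the proposed query for failed tasks on one resource."""
    query = urlencode({"resource_id": resource_id, "task_state": "FAILED"})
    return f"{base_url}/tasks?{query}"


def filter_tasks(tasks, resource_id, task_state):
    """Server-side view of the same filter over in-memory task records."""
    return [t for t in tasks
            if t["resource_id"] == resource_id
            and t["task_state"] == task_state]


# Hypothetical task records; the field names are assumptions.
tasks = [
    {"id": "t1", "resource_id": "port-1", "task_state": "SUCCESS"},
    {"id": "t2", "resource_id": "port-1", "task_state": "FAILED"},
    {"id": "t3", "resource_id": "port-2", "task_state": "FAILED"},
]

url = failed_tasks_url("http://neutron.example:9696/v2.0", "port-1")
failed = filter_tasks(tasks, "port-1", "FAILED")
print(url)
print([t["id"] for t in failed])
```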
>>> >
>>> > --> How to proceed
>>> >
>>> > This is where I really don't know... and I will therefore be brief.
>>> > We'll probably need some more brainstorming to flesh out all the
>>> details.
>>> > Once that is done, it might be the case of evaluating what needs to be
>>> done and
>>> > whether it is better to target this work onto existing plugins, or
>>> moving it
>>> > out to v3 plugins (and hence do the actual work once the "core
>>> refactoring"
>>> > activities are complete).
>>> >
>>> > Regards,
>>> > Salvatore
>>> >
>>> >
>>> > [1] https://etherpad.openstack.org/p/integrating-task-into-neutron
>>> > [2] http://paste.openstack.org/show/81184/
>>> > [3] https://etherpad.openstack.org/p/sillythings
>>> >
>>> >
>>> >
>>> >
>>> > _______________________________________________
>>> > OpenStack-dev mailing list
>>> > OpenStack-dev at lists.openstack.org
>>> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>> >
>>>
>>>
>>
>>
>>
>>
>
>
> --
> Jaume Devesa
> Software Engineer at Midokura
>
>
>