<div dir="ltr">Nachi,<div><br></div><div>I will be glad if the solution was as easy as sticking a task_state attribute to a resource! I'm afraid however that would be only the tip of the iceberg, or the icing of the cake, if you want.</div>
<div>However, I agree with you that consistency across OpenStack APIs is very important; whether this is a cross-project discussion is instead debatable. My feeling here is that taskflow is the cross-project piece of the architecture, and every project might then have a different strategy for integrating it - as long as that does not result in inconsistent APIs being exposed to customers!</div>
<div><br></div><div>It is something that will obviously be considered when designing how to represent whether a DB resource is in sync with its actual configuration on the backend.</div><div>I think this might happen regardless of whether we also agree to let API consumers access task execution information through the API.</div>
<div><br></div><div>Salvatore</div><div><br></div><div><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On 23 May 2014 01:16, Nachi Ueno <span dir="ltr"><<a href="mailto:nachi@ntti3.com" target="_blank">nachi@ntti3.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Salvatore<br>
<br>
Thank you for your posting this.<br>
<br>
IMO, this topic shouldn't be limited to Neutron only.<br>
Users want consistent APIs across OpenStack projects, right?<br>
<br>
In Nova, a server has a task_state attribute, so Neutron should do the same.<br>
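<br>
For illustration, Nova's extended status extension exposes it roughly like this (OS-EXT-STS:task_state is the real field name; the values here are made up):<br>
<br>
GET /v2/{tenant_id}/servers/{server_id}<br>
{"server": {"status": "BUILD", "OS-EXT-STS:task_state": "spawning", ...}}<br>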
<br>
<br>
<br>
2014-05-22 15:34 GMT-07:00 Salvatore Orlando <<a href="mailto:sorlando@nicira.com">sorlando@nicira.com</a>>:<br>
<div><div class="h5">> As most of you probably know already, this is one of the topics discussed<br>
> during the Juno summit [1].<br>
> I would like to kick off the discussion in order to move towards a concrete<br>
> design.<br>
><br>
> Preamble: Considering the meat that's already on the plate for Juno, I'm not<br>
> advocating that whatever comes out of this discussion should be put on the<br>
> Juno roadmap. However, preparation (or yak shaving) activities identified<br>
> as prerequisites might happen during the Juno time frame,<br>
> assuming that they won't interfere with other critical or high priority<br>
> activities.<br>
> This is also a very long post; the TL;DR summary is that I would like to<br>
> explore task-oriented communication with the backend and how it should be<br>
> reflected in the API - gauging how the community feels about this, and<br>
> collecting feedback regarding design, constructs, and related<br>
> tools/techniques/technologies.<br>
><br>
> At the summit a broad range of items were discussed during the session, and<br>
> most of them have been reported in the etherpad [1].<br>
><br>
> First, I think it would be good to clarify whether we're advocating a<br>
> task-based API, workflow-oriented operation processing, or both.<br>
><br>
> --> About a task-based API<br>
><br>
> In a task-based API, most PUT/POST API operations would return tasks rather<br>
> than Neutron resources, and users of the API would interact directly with<br>
> tasks.<br>
> I put an example in [2] to avoid cluttering this post with too much text.<br>
> As the API operation simply launches a task, the database state won't be<br>
> updated until the task completes.<br>
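><br>
> Just to give a flavour of it, a hypothetical exchange (purely illustrative,<br>
> not the exact format from [2]) might be:<br>
><br>
> POST /v2.0/networks<br>
> HTTP/1.1 202 Accepted<br>
> {"task": {"id": "<task-id>", "state": "PENDING"}}<br>
><br>
> GET /v2.0/tasks/<task-id><br>
> {"task": {"id": "<task-id>", "state": "COMPLETED", "resource_id": "<net-id>"}}<br>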
><br>
> Needless to say, this would be a radical change to Neutron's API; it should<br>
> be carefully evaluated and not considered for the v2 API.<br>
> Even if it is easy to recognise that this approach has a few benefits, I<br>
> don't think it will improve the usability of the API at all. Indeed it will<br>
> limit the ability to operate on a resource while a task is executing on<br>
> it, and will also require Neutron API users to change the paradigm they use<br>
> to interact with the API; not to mention the fact that it would look<br>
> weird if Neutron were the only API endpoint in OpenStack operating in this<br>
> way.<br>
> For the Neutron API, I think that its operations should still<br>
> manipulate the database state, and possibly return immediately after that<br>
> (*) - a task, or better a workflow, will then be started, executed<br>
> asynchronously, and will update the resource status on completion.<br>
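><br>
> A minimal sketch of this pattern in Python (db_create_network and<br>
> run_create_flow are hypothetical helpers, not actual Neutron code):<br>
><br>
> from concurrent.futures import ThreadPoolExecutor<br>
><br>
> _executor = ThreadPoolExecutor(max_workers=4)<br>
><br>
> def create_network(context, attrs):<br>
>     # commit the desired state to the DB first<br>
>     record = db_create_network(context, attrs)<br>
>     # then kick off the workflow asynchronously<br>
>     _executor.submit(run_create_flow, record)<br>
>     # return immediately; the flow updates the resource status on completion<br>
>     return record<br>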
><br>
> --> On workflow-oriented operations<br>
><br>
> Its benefits when it comes to easily controlling operations and<br>
> ensuring consistency in case of failure are obvious. For what it's worth, I<br>
> have been experimenting with introducing this kind of capability in the NSX<br>
> plugin in the past few months. I've been using celery as a task queue, and<br>
> writing the task management code from scratch - only to realize that the<br>
> same features I was implementing are already supported by taskflow.<br>
><br>
> I think that all parts of the Neutron API can greatly benefit from introducing a<br>
> flow-based approach (a minimal sketch follows the examples below).<br>
> Some examples:<br>
> - pre/post commit operations in the ML2 plugin can be orchestrated a lot<br>
> better as a workflow, articulating operations on the various drivers in a<br>
> graph<br>
> - operations spanning multiple plugins (e.g. add router interface) could be<br>
> simplified using clearly defined tasks for the L2 and L3 parts<br>
> - it would finally be possible to properly manage resources' "operational<br>
> status", as well as to know whether the actual configuration of the backend<br>
> matches the database configuration<br>
> - synchronous plugins might be converted into asynchronous ones, thus improving<br>
> their API throughput<br>
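><br>
> As a minimal taskflow sketch of such a flow (task names and bodies are made<br>
> up for illustration; the Task/Flow API itself is taskflow's):<br>
><br>
> import taskflow.engines<br>
> from taskflow import task<br>
> from taskflow.patterns import linear_flow<br>
><br>
> class CommitToDb(task.Task):<br>
>     def execute(self):<br>
>         pass  # persist the desired state in the Neutron DB<br>
>     def revert(self, **kwargs):<br>
>         pass  # undo the DB change if a later task fails<br>
><br>
> class ConfigureBackend(task.Task):<br>
>     def execute(self):<br>
>         pass  # push the configuration to the backend/driver<br>
><br>
> flow = linear_flow.Flow("create-port").add(CommitToDb(), ConfigureBackend())<br>
> taskflow.engines.load(flow).run()<br>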
><br>
> Now, the caveats:<br>
> - during the sessions it was correctly pointed out that special care is<br>
> required with multiple producers (i.e. API servers), as workflows should<br>
> always be executed in the correct order<br>
> - it is probably advisable to serialize workflows operating on the same<br>
> resource; this might lead to unexpected situations (potentially to<br>
> deadlocks) with workflows operating on multiple resources<br>
> - if the API is asynchronous, and multiple workflows might be queued or in<br>
> execution at a given time, rolling back the DB operation on failure is<br>
> probably not advisable (it would not be advisable anyway in any asynchronous<br>
> framework). If the API instead stays synchronous, the revert action for a<br>
> failed task might also restore the DB state for a resource; but I think that<br>
> keeping the API synchronous misses the point of this whole work a bit - feel<br>
> free to show your disagreement here!<br>
> - some Neutron workflows are actually initiated by agents; this is the case,<br>
> for instance, with the workflow that performs the initial L2 and security group<br>
> configuration for a port.<br>
> - it's going to be a lot of work, and we need to devise a strategy to either<br>
> roll these changes into the existing plugins or just decide that future v3<br>
> plugins will use them.<br>
><br>
> From the implementation side, I've done a bit of research, and task queues<br>
> like celery only implement half of what is needed; conversely, I have not<br>
> been able to find a workflow manager, at least in the Python world, as<br>
> complete and suitable as taskflow.<br>
> So my preference is obviously to use it, and to contribute to it should we<br>
> realize it needs changes to suit Neutron's needs. Growing something<br>
> Neutron-specific in tree is something I'd rule out.<br>
><br>
> (*) This is a bit different from what many plugins do, as they execute<br>
> requests synchronously and return only once the backend request is<br>
> completed.<br>
><br>
> --> Tasks and the API<br>
><br>
> The etherpad [1] contains a lot of interesting notes on this topic.<br>
> One important item is to understand how tasks affect a resource's status<br>
> to indicate their completion or failure. So far a Neutron resource's status<br>
> pretty much expresses its "fabric" status. For instance a port is "UP" if<br>
> it's been wired by the OVS agent; it often does not tell us whether the<br>
> actual resource configuration is exactly the desired one in the database.<br>
> For instance, if the ovs agent fails to apply security groups to a port, the<br>
> port stays "ACTIVE" and the user might never know there was an error and the<br>
> actual state diverged from the desired one.<br>
><br>
> It is therefore important to allow users to know whether the backend state<br>
> is in sync with the DB; tools like taskflow will be really helpful to this<br>
> end.<br>
> However, how should this be represented? The main options are either to add<br>
> a new attribute describing the resource sync state, or to extend the<br>
> semantics of the current status attribute to also include resource sync<br>
> state. I've put some ramblings on the subject in the etherpad [3].<br>
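><br>
> Purely as an illustration, the two options could look like this (attribute<br>
> and value names hypothetical):<br>
><br>
> # option 1: a separate sync-state attribute<br>
> {"status": "ACTIVE", "sync_status": "OUT_OF_SYNC"}<br>
> # option 2: overloading the existing status attribute<br>
> {"status": "ACTIVE_OUT_OF_SYNC"}<br>
><br>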
> Still, it has been correctly pointed out that it might not be enough to know<br>
> that a resource is out of sync; it is also good to know exactly which<br>
> operation failed. This is where somehow exposing tasks through the API might<br>
> come in handy.<br>
><br>
> For instance one could do something like:<br>
><br>
> GET /tasks?resource_id=<res_id>&task_state=FAILED<br>
><br>
> to get failure details for a given resource.<br>
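><br>
> The response might then look something like this (hypothetical schema):<br>
><br>
> {"tasks": [{"id": "<task-id>",<br>
>             "resource_id": "<res_id>",<br>
>             "task_state": "FAILED",<br>
>             "failure": "applying security groups timed out"}]}<br>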
><br>
> --> How to proceed<br>
><br>
> This is where I really don't know... and I will therefore be brief.<br>
> We'll probably need some more brainstorming to flesh out all the details.<br>
> Once that is done, it might be worth evaluating what needs to be done and<br>
> whether it is better to target this work at the existing plugins, or to move it<br>
> out to v3 plugins (and hence do the actual work once the "core refactoring"<br>
> activities are complete).<br>
><br>
> Regards,<br>
> Salvatore<br>
><br>
><br>
> [1] <a href="https://etherpad.openstack.org/p/integrating-task-into-neutron" target="_blank">https://etherpad.openstack.org/p/integrating-task-into-neutron</a><br>
> [2] <a href="http://paste.openstack.org/show/81184/" target="_blank">http://paste.openstack.org/show/81184/</a><br>
> [3] <a href="https://etherpad.openstack.org/p/sillythings" target="_blank">https://etherpad.openstack.org/p/sillythings</a><br>
><br>
><br>
><br>
><br>
</div></div>
</blockquote></div><br></div>