<div dir="ltr">While I appreciate the many ideas being discussed here (some of which we've explored previously and agreed to continue exploring), there is a fundamental difference between them and what I propose in that spec. I believe that what I'm proposing will be achievable without any significant visible changes in the API -- no new API endpoints or resources, and the client interaction will be nearly the same. A few status codes may be different in certain circumstances -- but it will not require a new major version of the REST API. And it solves a scalability and stability problem that folks are encountering today. (It seems my spec didn't describe those problems well enough -- I'm updating it now.)<div>


<div><br></div><div>Cheers,</div><div>Devananda</div><div><br></div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, May 28, 2014 at 10:14 AM, Maksym Lobur <span dir="ltr"><<a href="mailto:mlobur@mirantis.com" target="_blank">mlobur@mirantis.com</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">BTW, a very similar discussion is going on in the Neutron community right now; please see the thread with the subject <b>[openstack-dev] [Neutron] Introducing task oriented workflows</b>.</div>


<div class="gmail_extra"><div class="">

<br clear="all"><div><div dir="ltr"><font color="#444444">Best regards,<br>Max Lobur,<br>Python Developer, Mirantis, Inc.<br></font><div><b><font color="#444444"><br></font></b></div><font color="#444444">Mobile: <a href="tel:%2B38%20%28093%29%20665%2014%2028" value="+380936651428" target="_blank">+38 (093) 665 14 28</a><br>


Skype: max_lobur<br></font><div><font color="#444444"><br></font></div><font color="#444444">38, Lenina ave. Kharkov, Ukraine</font><br><a href="http://www.mirantis.com" target="_blank">www.mirantis.com</a><br><a href="http://www.mirantis.ru" target="_blank">www.mirantis.ru</a></div>


</div>

<br><br></div><div><div class="h5"><div class="gmail_quote">On Wed, May 28, 2014 at 6:56 PM, Maksym Lobur <span dir="ltr"><<a href="mailto:mlobur@mirantis.com" target="_blank">mlobur@mirantis.com</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div dir="ltr"><div>Hi All,<div><br></div><div>You've raised a good discussion; something similar was already started back in February. Could someone please find the long etherpad with the discussion between Deva and Lifeless? As I recall, most of the points mentioned above have good comments there.</div>


</div><div><br></div><div>So far I have only one idea for how to address these problems elegantly: a tasks concept and probably a scheduler service, which does not necessarily need to be separate from the API at the moment (after all, we already have a hash ring on the API side, which is a kind of scheduler, right?). It was proposed earlier, but I would like to try to fit all these issues into this concept.</div>


<div>

<div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">1. "Executability"<br>


</span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">We need to make sure that request can be theoretically executed,<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">which includes:<br>


</span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">a) Validating request body<br></span></blockquote><div> </div></div><div>We cannot validate everything on the API side; relying on the DB state being up to date is not a good idea, especially under heavy load.</div>


<div><br></div><div>In the tasks concept we could assume that all requests are executable and not perform any validation in the API thread at all. Instead, the API will just create a task and return its ID to the user. The task scheduler may, for convenience, perform some minor validations before the task is queued or started, but they should be duplicated inside the task body because an arbitrary amount of time passes between queuing and start ((c) lifeless). I assume the scheduler will have its own thread or even process. The user will need to poll the received ID to learn the current state of their submission.</div>
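<div>A minimal sketch of the flow described above, not actual Ironic code: the API thread only records a task and returns its ID, validation runs later inside the task body, and the user polls the ID. <code>Task</code>, <code>TaskStore</code>, the state names and the in-memory dict are all assumptions made for illustration; a real implementation would need persistent storage.</div>

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Task:
    action: str
    node_id: str
    state: str = "queued"
    error: str = ""
    task_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class TaskStore:
    """In-memory stand-in for persistent task storage (hypothetical)."""

    def __init__(self):
        self._tasks = {}

    def submit(self, action, node_id):
        # API thread: no validation at all, just record the task
        # and hand its ID back to the user.
        task = Task(action=action, node_id=node_id)
        self._tasks[task.task_id] = task
        return task.task_id

    def get_state(self, task_id):
        # This is what the user polls.
        return self._tasks[task_id].state

    def run(self, task_id, known_nodes):
        # Worker side: validation happens here, when the task starts,
        # because the world may have changed since it was queued.
        task = self._tasks[task_id]
        task.state = "running"
        if task.node_id not in known_nodes:
            task.state = "failed"
            task.error = "node not found"
        else:
            task.state = "done"
        return task.state
```

<div>With this shape, a request for a missing node is still accepted by <code>submit</code>; the user only learns about the failure by polling <code>get_state</code> after the task has run.</div>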


<div>

<div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span style="font-family:arial,sans-serif;font-size:12.727272033691406px"></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">b) For each of entities (e.g. nodes) touched, check that they are<br>


</span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">available<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">   at the moment (at least exist).<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">   This is arguable, as checking for entity existence requires going to<br>


</span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">DB.</span></blockquote><div> </div></div><div>Same here: a DB round trip is a potential blocking point, so this will be done inside the task (after it's queued and started) and will not affect the API. The user will just observe the task state by polling the API (or, optionally, via a callback).</div>


<div>

<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">2. Appropriate state<br>


</span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">For each entity in question, ensure that it's either in a proper state<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">or<br>


</span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">moving to a proper state.<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">It would help avoid users e.g. setting deploy twice on the same node<br>


</span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">It will still require some kind of NodeInAWrongStateError, but we won't<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">necessary need a client retry on this one.</span><br style="font-family:arial,sans-serif;font-size:12.727272033691406px">


<span style="font-family:arial,sans-serif;font-size:12.727272033691406px">Allowing the entity to be _moving_ to appropriate state gives us a<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">problem:<br>


</span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">Imagine OP1 was running and OP2 got scheduled, hoping that OP1 will come<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">to desired state. What if OP1 fails? What if conductor, doing OP1<br>


</span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">crashes?</span></blockquote><div> </div></div><div>Let's say OP1 and OP2 are two separate tasks. Each one has the initial state validation inside its body. Once OP2 gets its turn, it will perform the validation and fail, which looks reasonable to me.</div>
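<div>To illustrate, here is a rough sketch of state validation living inside the task body. The node dict, the state names and <code>run_task</code> are hypothetical; only the <code>NodeInAWrongStateError</code> name comes from the quoted proposal.</div>

```python
class NodeInAWrongStateError(Exception):
    """Raised when a task starts and finds the node in an unexpected state."""


def run_task(node, required_state, target_state, fail=False):
    # Validation happens at start time, inside the task body, so it sees
    # whatever state earlier tasks actually left behind.
    if node["state"] != required_state:
        raise NodeInAWrongStateError(
            "expected %r, found %r" % (required_state, node["state"]))
    # 'fail' simulates the operation itself going wrong.
    node["state"] = "error" if fail else target_state
    return node["state"]


node = {"state": "off"}
run_task(node, "off", "deployed", fail=True)    # OP1 runs and fails
try:
    run_task(node, "deployed", "rebooted")      # OP2 gets its turn
    op2_result = "done"
except NodeInAWrongStateError:
    op2_result = "failed"
```

<div>OP2 was queued in the hope that OP1 would succeed; because the check is repeated when OP2 starts, it fails cleanly instead of acting on a node in the wrong state.</div>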


<div>

<div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">Similar problem with checking node state.<br>


</span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">Imagine we schedule OP2 while we had OP1 - regular checking node state.<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">OP1 discovers that node is actually absent and puts it to maintenance<br>


</span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">state.<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">What to do with OP2?</span></blockquote><div> </div>


</div><div>The task will fail once it gets its turn.</div><div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">


<span style="font-family:arial,sans-serif;font-size:12.727272033691406px">b) Can we make client wait for the results of periodic check?<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">   That is, wait for OP1 _before scheduling_ OP2?</span></blockquote>


<div><br></div></div><div>We will just schedule the task and the user will observe its progress; once OP1 has finished and OP2 has started, they will see a failure.</div><div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">


<span style="font-family:arial,sans-serif;font-size:12.727272033691406px">3. Status feedback<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">People would like to know, how things are going with their task.<br>


</span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">What they know is that their request was scheduled. Options:<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">a) Poll: return some REQUEST_ID and expect users to poll some endpoint.<br>


</span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">   Pros:<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">   - Should be easy to implement<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">   Cons:<br>


</span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">   - Requires persistent storage for tasks. Does AMQP allow to do this<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">kinds<br>


</span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">     of queries? If not, we'll need to duplicate tasks in DB.<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">   - Increased load on API instances and DB<br>


</span></blockquote><div> </div></div><div>This exactly describes the tasks concept :)</div><div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">


<span style="font-family:arial,sans-serif;font-size:12.727272033691406px"></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">b) Callback: take endpoint, call it once task is done/fails.<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">   Pros:<br>


</span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">   - Less load on both client and server<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">   - Answer exactly when it's ready<br>


</span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">   Cons:<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">   - Will not work for cli and similar<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">   - If conductor crashes, there will be no callback.</span></blockquote>


<div><br></div></div><div>Add to Cons:</div><div>- A callback is not reliable since it may get lost.</div><div>We should have the ability to poll anyway, though I see a great benefit in implementing callbacks: decreased API load.</div>
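<div>A sketch of how a callback could be layered on top of polling, assuming the callback is a plain HTTP POST with a JSON body. The payload shape, <code>finish_task</code> and the <code>tasks</code> dict are assumptions for illustration. The final state is always stored first, so a lost callback never leaves the user without a way to find out what happened.</div>

```python
import json
import urllib.request


def http_post(url, payload):
    # Plain standard-library POST; network failures surface as OSError
    # subclasses (urllib.error.URLError, timeouts, and so on).
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"},
        method="POST")
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status


def finish_task(tasks, task_id, state, callback_url=None, post=http_post):
    """Record the final state, then try the optional callback.

    Returns True only if a callback was requested and delivered.
    """
    # Always record the state first so that polling keeps working.
    tasks[task_id] = state
    if callback_url is None:
        return False
    try:
        post(callback_url, {"task_id": task_id, "state": state})
        return True
    except OSError:
        # Callback lost: the user can still poll tasks[task_id].
        return False
```

<div>The <code>post</code> parameter is injectable only so the delivery step can be swapped out or tested without a network.</div>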


<div>

<div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">4. Debugging consideration<br>


</span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">a) This is an open question: how to debug, if we have a lot of requests<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">   and something went wrong?<br>


</span></blockquote><div><br></div></div><div>We will be able to see the queue state (BTW, what about security here: should a user be able to see all tasks, only their own, or all tasks but with the details of other users' tasks hidden?).</div>


<div><div>

 </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span style="font-family:arial,sans-serif;font-size:12.727272033691406px"></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">b) One more thing to consider: how to make command like `node-show`<br>


</span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">aware of<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">   scheduled transitioning, so that people don't try operations that are<br>


</span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">   doomed to failure.</span></blockquote><div><br></div></div><div>node-show will always show the current state of the node, though we may check whether any queued or running tasks will change that state and, if so, add a notification to the response.</div>


<div>

<div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">5. Performance considerations<br>


</span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">a) With async approach, users will be able to schedule nearly unlimited<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">   number of tasks, thus essentially blocking work of Ironic, without<br>


</span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">any<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">   signs of the problem (at least for some time).<br>


</span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">   I think there are 2 common answers to this problem:<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">   - Request throttling: disallow user to make too many requests in some<br>


</span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">     amount of time. Send them 503 with Retry-After header set.</span></blockquote><div><br></div></div><div>Can this be achieved with web server settings? It looks like a typical problem.</div>


<div>

<div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span style="font-family:arial,sans-serif;font-size:12.727272033691406px"></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">   - Queue management: watch queue length, deny new requests if it's too<br>


</span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">large.<br></span></blockquote><div><br></div></div><div>Yes, I really like the limited queue size idea. Please see my comments in the spec.</div>
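<div>The limited-queue idea can be sketched with the standard library's bounded <code>queue.Queue</code>. The limit of 2, the 202 status for accepted requests and the <code>enqueue</code> helper are assumptions for illustration; the 503 with <code>Retry-After</code> follows the quoted proposal.</div>

```python
import queue

# Hypothetical limit; a real deployment would make this configurable.
task_queue = queue.Queue(maxsize=2)


def enqueue(q, task, retry_after=30):
    """Return (status, headers): 202 if accepted, 503 if the queue is full."""
    try:
        # put_nowait raises queue.Full instead of blocking the API thread.
        q.put_nowait(task)
    except queue.Full:
        return 503, {"Retry-After": str(retry_after)}
    return 202, {}
```

<div>Because <code>put_nowait</code> never blocks, the API thread answers immediately in both cases, and the client is told when a retry is reasonable.</div>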


<div>

Also, if we have tasks and a queue, we could merge similar tasks.<br></div><div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">


<span style="font-family:arial,sans-serif;font-size:12.727272033691406px">b) State framework from (2), if invented, can become a bottleneck as<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">well.<br>


</span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">   Especially with polling approach.</span></blockquote><div><br></div></div><div>True.</div><div>If we have tasks, all node actions will be done through them. We can synchronise the node state with the DB only during a task and remove the periodic syncs. Of course, someone may go and turn off the node; in that case Ironic will report a stale node state until some task is executed on this node, which may be acceptable behaviour. Otherwise, infrequent periodic syncs may work as well.</div>


<div>

<div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">6. Usability considerations<br>


</span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">a) People will be unaware, when and whether their request is going to be<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">   finished. As there will be tempted to retry, we may get flooded by<br>


</span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">   duplicates. I would suggest at least make it possible to request<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">canceling<br>


</span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">   any task (which will be possible only if it is not started yet,<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">obviously).<br>


</span></blockquote><div><br></div></div><div>Since we will have a limited number of kinds of tasks, we could calculate some estimates based on previous similar tasks. This looks like an improvement for the distant future. In the end, I wouldn't want Ironic to produce estimates like the Windows copy-paste dialog :)</div>


<div><br></div><div>Tasks may be easily cancelled while they are in the queue. But if a task has already started, there's a separate discussion: <a href="https://blueprints.launchpad.net/ironic/+spec/make-tasks-interruptible" target="_blank">https://blueprints.launchpad.net/ironic/+spec/make-tasks-interruptible</a> (I'm going to port this bp to the specs repo at some point)</div>


<div>

<div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span style="font-family:arial,sans-serif;font-size:12.727272033691406px"></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">b) We should try to avoid scheduling contradictive requests.<br>


</span></blockquote><div><br></div></div><div>This is the task scheduler's responsibility: it is basically a state check before the task is scheduled, and it should be done once more when the task starts, as mentioned above.</div>


<div><div>

 </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span style="font-family:arial,sans-serif;font-size:12.727272033691406px"></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">c) Can we somehow detect duplicated requests and ignore them?<br>


</span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">   E.g. we won't want user to make 2-3-4 reboots in a row just because<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">the user<br>


</span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">   was not patient enough.</span></blockquote><div><br></div></div><div>Queue similar tasks together. All users will be pointed to the same task resource, or maybe to different resources that are tied to the same conductor action.</div>
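<div>One possible way to point impatient users at the same task, sketched under the assumption that two requests count as "similar" when they target the same node with the same action. <code>DedupQueue</code> and its methods are hypothetical names.</div>

```python
import uuid


class DedupQueue:
    """Hands out one task ID per pending (node, action) pair."""

    def __init__(self):
        self._pending = {}   # (node_id, action) maps to task_id

    def submit(self, node_id, action):
        # A repeated request while the first is still queued gets the
        # existing task ID instead of creating a duplicate task.
        key = (node_id, action)
        if key not in self._pending:
            self._pending[key] = str(uuid.uuid4())
        return self._pending[key]

    def start(self, node_id, action):
        # Once the task starts it leaves the pending set, so a later
        # identical request becomes a new task.
        return self._pending.pop((node_id, action), None)
```

<div>With this, three reboot requests in a row for one node resolve to a single queued task, while a reboot requested after that task has started is treated as a new one.</div>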


<div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><font color="#444444">Best regards,<br></font><font color="#444444">Max Lobur,<br>


</font><font color="#444444">Python Developer, Mirantis, Inc.</font><b><font color="#444444"><br></font></b><font color="#444444">Mobile: <a href="tel:%2B38%20%28093%29%20665%2014%2028" value="+380936651428" target="_blank">+38 (093) 665 14 28</a><br>


</font><font color="#444444">Skype: max_lobur</font><font color="#444444"><br>

</font><font color="#444444">38, Lenina ave. Kharkov, Ukraine<br></font><a href="http://www.mirantis.com" target="_blank">www.mirantis.com<br></a><a href="http://www.mirantis.ru" target="_blank">www.mirantis.ru</a><div>

<br><br>

On Wed, May 28, 2014 at 5:10 PM, Lucas Alvares Gomes <span dir="ltr"><<a href="mailto:lucasagomes@gmail.com" target="_blank">lucasagomes@gmail.com</a>></span> wrote:<br>On Wed, May 28, 2014 at 2:02 PM, Dmitry Tantsur <<a href="mailto:dtantsur@redhat.com" target="_blank">dtantsur@redhat.com</a>> wrote:<br>


> Hi Ironic folks, hi Devananda!<br>

><br>

> I'd like to share with you my thoughts on asynchronous API, which is<br>

> spec <a href="https://review.openstack.org/#/c/94923" target="_blank">https://review.openstack.org/#/c/94923<br></a>

> First I was planned this as comments to the review, but it proved to be<br>

> much larger, so I post it for discussion on ML.<br>

><br>

> Here is list of different consideration, I'd like to take into account<br>

> when prototyping async support, some are reflected in spec already, some<br>

> are from my and other's comments:<br>

><br>

> 1. "Executability"<br>

> We need to make sure that request can be theoretically executed,<br>

> which includes:<br>

> a) Validating request body<br>

> b) For each of entities (e.g. nodes) touched, check that they are<br>

> available<br>

>    at the moment (at least exist).<br>

>    This is arguable, as checking for entity existence requires going to<br>

> DB.<br><br>

><br></div><div><div>

> 2. Appropriate state<br>

> For each entity in question, ensure that it's either in a proper state<br>

> or<br>

> moving to a proper state.<br>

> It would help avoid users e.g. setting deploy twice on the same node<br>

> It will still require some kind of NodeInAWrongStateError, but we won't<br>

> necessary need a client retry on this one.<br>

><br>

> Allowing the entity to be _moving_ to appropriate state gives us a<br>

> problem:<br>

> Imagine OP1 was running and OP2 got scheduled, hoping that OP1 will come<br>

> to desired state. What if OP1 fails? What if conductor, doing OP1<br>

> crashes?<br>

> That's why we may want to approve only operations on entities that do<br>

> not<br>

> undergo state changes. What do you think?<br>

><br>

> Similar problem with checking node state.<br>

> Imagine we schedule OP2 while we had OP1 - regular checking node state.<br>

> OP1 discovers that node is actually absent and puts it to maintenance<br>

> state.<br>

> What to do with OP2?<br>

> a) Obvious answer is to fail it<br>

> b) Can we make client wait for the results of periodic check?<br>

>    That is, wait for OP1 _before scheduling_ OP2?<br>

><br>

> Anyway, this point requires some state framework, that knows about<br>

> states,<br>

> transitions, actions and their compatibility with each other.<br>For {power, provision} state changes, should we queue the requests? We<br>

may want to accept only one state-change request at a time; if a<br>

second request comes while another state change is mid-operation,<br>

we may just return 409 (Conflict) to indicate that a state change is<br>

already in progress. This is similar to what we have today, but instead<br>

of checking the node lock and states on the conductor side, the API<br>

service could do it, since that information is in the DB.<br>
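<div>A rough sketch of that check, assuming the node record carries a target state that is empty when no change is in progress. The field name, the 202 status and <code>request_state_change</code> are illustrative assumptions; a real API service would need an atomic compare-and-set against the DB rather than this in-memory check.</div>

```python
def request_state_change(node, target):
    """Return 202 if the change was accepted, 409 if one is in progress."""
    # A non-empty target_state means another change is mid-operation.
    if node.get("target_state") is not None:
        return 409
    node["target_state"] = target
    return 202


def complete_state_change(node):
    # Called when the operation finishes; clears the in-progress marker.
    node["state"] = node["target_state"]
    node["target_state"] = None
```

<br>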

><br>

> 3. Status feedback<br>

> People would like to know, how things are going with their task.<br>

> What they know is that their request was scheduled. Options:<br>

> a) Poll: return some REQUEST_ID and expect users to poll some endpoint.<br>

>    Pros:<br>

>    - Should be easy to implement<br>

>    Cons:<br>

>    - Requires persistent storage for tasks. Does AMQP allow to do this<br>

> kinds<br>

>      of queries? If not, we'll need to duplicate tasks in DB.<br>

>    - Increased load on API instances and DB<br>

> b) Callback: take endpoint, call it once task is done/fails.<br>

>    Pros:<br>

>    - Less load on both client and server<br>

>    - Answer exactly when it's ready<br>

>    Cons:<br>

>    - Will not work for cli and similar<br>

>    - If conductor crashes, there will be no callback.<br>

><br>

> Seems like we'd want both (a) and (b) to comply with current needs.<br>+1, we could allow polling by default (like checking<br>

nodes/<uuid>/states to know the current and target state of the node)<br>

but we may also want to include a callback parameter that users could<br>

use to input a URL that the conductor will call out as soon as the<br>

operation is finished. So if the callback URL exists, the conductor<br>

will submit a POST request to that URL with some data structure<br>

identifying the operation and the current state.<br>

><br>

> If we have a state framework from (2), we can also add notifications to<br>

> it.<br>

><br>

> 4. Debugging consideration<br>

> a) This is an open question: how to debug, if we have a lot of requests<br>

>    and something went wrong?<br>

> b) One more thing to consider: how to make command like `node-show`<br>

> aware of<br>

>    scheduled transitioning, so that people don't try operations that are<br>

>    doomed to failure.<br>

><br>

> 5. Performance considerations<br>

> a) With async approach, users will be able to schedule nearly unlimited<br>

>    number of tasks, thus essentially blocking work of Ironic, without<br>

> any<br>

>    signs of the problem (at least for some time).<br>

>    I think there are 2 common answers to this problem:<br>

>    - Request throttling: disallow user to make too many requests in some<br>

>      amount of time. Send them 503 with Retry-After header set.<br>

>    - Queue management: watch queue length, deny new requests if it's too<br>

> large.<br>

>    This means actually getting back error 503 and will require retrying<br>

> again!<br>

>    At least it will be exceptional case, and won't affect Tempest run...<br>

> b) State framework from (2), if invented, can become a bottleneck as<br>

> well.<br>

>    Especially with polling approach.<br>

><br>

> 6. Usability considerations<br>

> a) People will be unaware, when and whether their request is going to be<br>

>    finished. As there will be tempted to retry, we may get flooded by<br>

>    duplicates. I would suggest at least make it possible to request<br>

> canceling<br>

>    any task (which will be possible only if it is not started yet,<br>

> obviously).<br>

> b) We should try to avoid scheduling contradictive requests.<br>

> c) Can we somehow detect duplicated requests and ignore them?<br>

>    E.g. we won't want user to make 2-3-4 reboots in a row just because<br>

> the user<br>

>    was not patient enough.<br>

><br>

> ------<br>

><br>

> Possible takeaways from this letter:<br>

> - We'll need at least throttling to avoid DoS<br>

> - We'll still need handling of 503 error, though it should not happen<br>

> under<br>

>   normal conditions<br>

> - Think about state framework that unifies all this complex logic with<br>

> features:<br>

>   * Track entities, their states and actions on entities<br>

>   * Check whether new action is compatible with states of entities it<br>

> touches<br>

>     and with other ongoing and scheduled actions on these entities.<br>

>   * Handle notifications for finished and failed actions by providing<br>

> both<br>

>     pull and push approaches.<br>

>   * Track whether started action is still executed, perform error<br>

> notification,<br>

>     if not.<br>

>   * HA and high performance<br>

> - Think about policies for corner cases<br>

> - Think, how we can make a user aware of what is going on with both<br>

> request<br>

>   and entity that some requests may touch. Also consider canceling<br>

> requests.<br>

><br>

> Please let me know, what you think.<br>

><br>

> Dmitry.<br>

><br>

><br>

> _______________________________________________<br>

> OpenStack-dev mailing list<br>

> <a href="mailto:OpenStack-dev@lists.openstack.org" target="_blank">OpenStack-dev@lists.openstack.org</a><br>

> <a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>

</div>


</div></blockquote>

<div class="gmail_extra">

<div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">+ 1<br>

</blockquote></div><br></div></div>

</blockquote></div><br></div></div></div>


<br></blockquote></div><br></div>