[openstack-dev] [Ironic] Random thoughts on asynchronous API spec

Devananda van der Veen devananda.vdv at gmail.com
Wed May 28 18:08:38 UTC 2014


While I appreciate the many ideas being discussed here (some of which we've
explored previously and agreed to continue exploring), there is a
fundamental difference vs. what I propose in that spec. I believe that what
I'm proposing will be achievable without any significant visible changes in
the API -- no new API end points or resources, and the client interaction
will be nearly the same. A few status codes may be different in certain
circumstances -- but it will not require a new major version of the REST
API. And it solves a scalability and stability problem that folks are
encountering today. (It seems my spec didn't describe those problems well
enough -- I'm updating it now.)

Cheers,
Devananda



On Wed, May 28, 2014 at 10:14 AM, Maksym Lobur <mlobur at mirantis.com> wrote:

> BTW a very similar discussion is going in Neutron community right now,
> please find a thread under the *[openstack-dev] [Neutron] Introducing
> task oriented workflows* label.
>
> Best regards,
> Max Lobur,
> Python Developer, Mirantis, Inc.
>
> Mobile: +38 (093) 665 14 28
> Skype: max_lobur
>
> 38, Lenina ave. Kharkov, Ukraine
> www.mirantis.com
> www.mirantis.ru
>
>
> On Wed, May 28, 2014 at 6:56 PM, Maksym Lobur <mlobur at mirantis.com> wrote:
>
>> Hi All,
>>
>> You've raised a good discussion, something similar already was started
>> back in february. Could someone please find the long etherpad with
>> discussion between Deva and Lifeless, as I recall most of the points
>> mentioned above have a good comments there.
>>
>> Up to this point I have the only idea how to elegantly address these
>> problems. This is a tasks concept and probably a scheduler service, which
>> not necessarily should be separate from the API at the moment (Finally we
>> already have a hash ring on the api side which is a kind of scheduler
>> right?) It was already proposed earlier, but I would like to try to fit all
>> these issues into this concept.
>>
>>
>>> 1. "Executability"
>>> We need to make sure that request can be theoretically executed,
>>> which includes:
>>> a) Validating request body
>>>
>>
>> We cannot validate everything on the API side, relying on the fact that
>> DB state is actual is not a good idea, especially under heavy load.
>>
>> In tasks concept we could assume that all the requests are executable,
>> and do not perform any validation in the API thread at all. Instead of this
>> the API will just create a task and return it's ID to the user. Task
>> scheduler may perform some minor validations before the task is queued or
>> started for convenience, but they should be duplicated inside task body
>> because there is an arbitrary time between queuing up and start
>> ((c)lifeless). I assume the scheduler will have it's own thread or even
>> process. The user will need to poke received ID to know the current state
>> of his submission.
>>
>>
>>> b) For each of entities (e.g. nodes) touched, check that they are
>>> available
>>>    at the moment (at least exist).
>>>    This is arguable, as checking for entity existence requires going to
>>> DB.
>>
>>
>> Same here, DB round trip is a potential block, therefore this will be
>> done inside task (after it's queued and started) and will not affect the
>> API. The user will just observe the task state by poking the API (or using
>> callback as an option).
>>
>> 2. Appropriate state
>>> For each entity in question, ensure that it's either in a proper state
>>> or
>>> moving to a proper state.
>>> It would help avoid users e.g. setting deploy twice on the same node
>>> It will still require some kind of NodeInAWrongStateError, but we won't
>>> necessary need a client retry on this one.
>>> Allowing the entity to be _moving_ to appropriate state gives us a
>>> problem:
>>> Imagine OP1 was running and OP2 got scheduled, hoping that OP1 will come
>>> to desired state. What if OP1 fails? What if conductor, doing OP1
>>> crashes?
>>
>>
>> Let's say OP1 and OP2 are two separate tasks. Each one have the initial
>> state validation inside it's body. Once OP2 gets its turn it will perform
>> validation and fail, which looks reasonable to me.
>>
>>
>>> Similar problem with checking node state.
>>> Imagine we schedule OP2 while we had OP1 - regular checking node state.
>>> OP1 discovers that node is actually absent and puts it to maintenance
>>> state.
>>> What to do with OP2?
>>
>>
>> The task will fail once it get it's turn.
>>
>>
>>> b) Can we make client wait for the results of periodic check?
>>>    That is, wait for OP1 _before scheduling_ OP2?
>>
>>
>> We will just schedule the task and the user will observe its progress,
>> once OP1 is finished and OP2 started - he will see a fail.
>>
>>
>>> 3. Status feedback
>>> People would like to know, how things are going with their task.
>>> What they know is that their request was scheduled. Options:
>>> a) Poll: return some REQUEST_ID and expect users to poll some endpoint.
>>>    Pros:
>>>    - Should be easy to implement
>>>    Cons:
>>>    - Requires persistent storage for tasks. Does AMQP allow to do this
>>> kinds
>>>      of queries? If not, we'll need to duplicate tasks in DB.
>>>    - Increased load on API instances and DB
>>>
>>
>> Exactly described the tasks concept :)
>>
>>
>>> b) Callback: take endpoint, call it once task is done/fails.
>>>    Pros:
>>>    - Less load on both client and server
>>>    - Answer exactly when it's ready
>>>    Cons:
>>>    - Will not work for cli and similar
>>>    - If conductor crashes, there will be no callback.
>>
>>
>> Add to Cons:
>> - Callback is not reliable since it may get lost.
>> We should have an ability to poke anyway, though I see a great benefit
>> from implementing a callbacks - to decrease API load.
>>
>>
>>> 4. Debugging consideration
>>> a) This is an open question: how to debug, if we have a lot of requests
>>>    and something went wrong?
>>>
>>
>> We will be able to see the queue state (btw what about security here,
>> should the user be able to see all the tasks, or just his ones, or all but
>> others with hidden details).
>>
>>
>>> b) One more thing to consider: how to make command like `node-show`
>>> aware of
>>>    scheduled transitioning, so that people don't try operations that are
>>>    doomed to failure.
>>
>>
>> node-show will always show current state of the node, though we may check
>> if there are any tasks queued or going, which will change the state. If any
>> - add a notification to the response.
>>
>>
>>> 5. Performance considerations
>>> a) With async approach, users will be able to schedule nearly unlimited
>>>    number of tasks, thus essentially blocking work of Ironic, without
>>> any
>>>    signs of the problem (at least for some time).
>>>    I think there are 2 common answers to this problem:
>>>    - Request throttling: disallow user to make too many requests in some
>>>      amount of time. Send them 503 with Retry-After header set.
>>
>>
>> Can this be achieved by some web-server settings? Looks like a typical
>> problem.
>>
>>
>>>    - Queue management: watch queue length, deny new requests if it's too
>>> large.
>>>
>>
>> Yes, I really like the limited queue size idea. Please see my comments in
>> the spec.
>>  Also, if we have a tasks and the queue, we could merge similar tasks
>>
>>
>>> b) State framework from (2), if invented, can become a bottleneck as
>>> well.
>>>    Especially with polling approach.
>>
>>
>> True.
>> If we have tasks, all the node actions will be done through them. We can
>> synchronise node state with DB only during the task, and remove periodic
>> syncs. Off-course someone may go and turn off the node, in this case the
>> Ironic will lie about the node state until some task is executed on this
>> node, which may be suitable behaviour. Otherwise rare periodic syncs may
>> work as well.
>>
>>
>>> 6. Usability considerations
>>> a) People will be unaware, when and whether their request is going to be
>>>    finished. As there will be tempted to retry, we may get flooded by
>>>    duplicates. I would suggest at least make it possible to request
>>> canceling
>>>    any task (which will be possible only if it is not started yet,
>>> obviously).
>>>
>>
>> Since we will have a limited number of kinds of tasks, we could calculate
>> some estimates basing on previous similar tasks. Looks like an improvement
>> for a distant future. In the end I wouldn't want Ironic to perform
>> estimates like windows's copy-paste dialog :)
>>
>> Tasks may be easily interrupted while they are in a queue. But if it's
>> already started - there's a separate dicsussion
>> https://blueprints.launchpad.net/ironic/+spec/make-tasks-interruptible(I'm going to port this bp to the specs repo in some time)
>>
>>
>>> b) We should try to avoid scheduling contradictive requests.
>>>
>>
>> A task scheduler responsibility: this is basically a state check before
>> task is scheduled, and it should be done one more time once the task is
>> started, as mentioned above.
>>
>>
>>> c) Can we somehow detect duplicated requests and ignore them?
>>>    E.g. we won't want user to make 2-3-4 reboots in a row just because
>>> the user
>>>    was not patient enough.
>>
>>
>> Queue similar tasks. All the users will be pointed to the similar task
>> resource, or maybe to a different resources which tied to the same
>> conductor action.
>>
>>
>>> Best regards,
>>> Max Lobur,
>>> Python Developer, Mirantis, Inc.
>>> Mobile: +38 (093) 665 14 28
>>> Skype: max_lobur
>>> 38, Lenina ave. Kharkov, Ukraine
>>> www.mirantis.com
>>> www.mirantis.ru
>>>
>>>
>>> On Wed, May 28, 2014 at 5:10 PM, Lucas Alvares Gomes <
>>> lucasagomes at gmail.com> wrote:
>>> On Wed, May 28, 2014 at 2:02 PM, Dmitry Tantsur <dtantsur at redhat.com>
>>> wrote:
>>> > Hi Ironic folks, hi Devananda!
>>> >
>>> > I'd like to share with you my thoughts on asynchronous API, which is
>>> > spec https://review.openstack.org/#/c/94923
>>> > First I was planned this as comments to the review, but it proved to be
>>> > much larger, so I post it for discussion on ML.
>>> >
>>> > Here is list of different consideration, I'd like to take into account
>>> > when prototyping async support, some are reflected in spec already,
>>> some
>>> > are from my and other's comments:
>>> >
>>> > 1. "Executability"
>>> > We need to make sure that request can be theoretically executed,
>>> > which includes:
>>> > a) Validating request body
>>> > b) For each of entities (e.g. nodes) touched, check that they are
>>> > available
>>> >    at the moment (at least exist).
>>> >    This is arguable, as checking for entity existence requires going to
>>> > DB.
>>>
>>> >
>>> > 2. Appropriate state
>>> > For each entity in question, ensure that it's either in a proper state
>>> > or
>>> > moving to a proper state.
>>> > It would help avoid users e.g. setting deploy twice on the same node
>>> > It will still require some kind of NodeInAWrongStateError, but we won't
>>> > necessary need a client retry on this one.
>>> >
>>> > Allowing the entity to be _moving_ to appropriate state gives us a
>>> > problem:
>>> > Imagine OP1 was running and OP2 got scheduled, hoping that OP1 will
>>> come
>>> > to desired state. What if OP1 fails? What if conductor, doing OP1
>>> > crashes?
>>> > That's why we may want to approve only operations on entities that do
>>> > not
>>> > undergo state changes. What do you think?
>>> >
>>> > Similar problem with checking node state.
>>> > Imagine we schedule OP2 while we had OP1 - regular checking node state.
>>> > OP1 discovers that node is actually absent and puts it to maintenance
>>> > state.
>>> > What to do with OP2?
>>> > a) Obvious answer is to fail it
>>> > b) Can we make client wait for the results of periodic check?
>>> >    That is, wait for OP1 _before scheduling_ OP2?
>>> >
>>> > Anyway, this point requires some state framework, that knows about
>>> > states,
>>> > transitions, actions and their compatibility with each other.
>>> For {power, provision} state changes should we queue the requests? We
>>> may want to only accept 1 request to change the state per time, if a
>>> second request comes when there's another state change mid-operation
>>> we may just return 409 (Conflict) to indicate that a state change is
>>> already in progress. This is similar of what we have today but instead
>>> of checking the node lock and states on the conductor side the API
>>> service could do it, since it's on the DB.
>>> >
>>> > 3. Status feedback
>>> > People would like to know, how things are going with their task.
>>> > What they know is that their request was scheduled. Options:
>>> > a) Poll: return some REQUEST_ID and expect users to poll some endpoint.
>>> >    Pros:
>>> >    - Should be easy to implement
>>> >    Cons:
>>> >    - Requires persistent storage for tasks. Does AMQP allow to do this
>>> > kinds
>>> >      of queries? If not, we'll need to duplicate tasks in DB.
>>> >    - Increased load on API instances and DB
>>> > b) Callback: take endpoint, call it once task is done/fails.
>>> >    Pros:
>>> >    - Less load on both client and server
>>> >    - Answer exactly when it's ready
>>> >    Cons:
>>> >    - Will not work for cli and similar
>>> >    - If conductor crashes, there will be no callback.
>>> >
>>> > Seems like we'd want both (a) and (b) to comply with current needs.
>>> +1, we could allow pooling by default (like checking
>>> nodes/<uuid>/states to know the current and target state of the node)
>>> but we may also want to include a callback parameter that users could
>>> use to input a URL that the conductor will call out as soon as the
>>> operation is finished. So if the callback URl exists, the conductor
>>> will submit a POST request to that URL with some data structure
>>> identifying the operation and the current state.
>>> >
>>> > If we have a state framework from (2), we can also add notifications to
>>> > it.
>>> >
>>> > 4. Debugging consideration
>>> > a) This is an open question: how to debug, if we have a lot of requests
>>> >    and something went wrong?
>>> > b) One more thing to consider: how to make command like `node-show`
>>> > aware of
>>> >    scheduled transitioning, so that people don't try operations that
>>> are
>>> >    doomed to failure.
>>> >
>>> > 5. Performance considerations
>>> > a) With async approach, users will be able to schedule nearly unlimited
>>> >    number of tasks, thus essentially blocking work of Ironic, without
>>> > any
>>> >    signs of the problem (at least for some time).
>>> >    I think there are 2 common answers to this problem:
>>> >    - Request throttling: disallow user to make too many requests in
>>> some
>>> >      amount of time. Send them 503 with Retry-After header set.
>>> >    - Queue management: watch queue length, deny new requests if it's
>>> too
>>> > large.
>>> >    This means actually getting back error 503 and will require retrying
>>> > again!
>>> >    At least it will be exceptional case, and won't affect Tempest
>>> run...
>>> > b) State framework from (2), if invented, can become a bottleneck as
>>> > well.
>>> >    Especially with polling approach.
>>> >
>>> > 6. Usability considerations
>>> > a) People will be unaware, when and whether their request is going to
>>> be
>>> >    finished. As there will be tempted to retry, we may get flooded by
>>> >    duplicates. I would suggest at least make it possible to request
>>> > canceling
>>> >    any task (which will be possible only if it is not started yet,
>>> > obviously).
>>> > b) We should try to avoid scheduling contradictive requests.
>>> > c) Can we somehow detect duplicated requests and ignore them?
>>> >    E.g. we won't want user to make 2-3-4 reboots in a row just because
>>> > the user
>>> >    was not patient enough.
>>> >
>>> > ------
>>> >
>>> > Possible takeaways from this letter:
>>> > - We'll need at least throttling to avoid DoS
>>> > - We'll still need handling of 503 error, though it should not happen
>>> > under
>>> >   normal conditions
>>> > - Think about state framework that unifies all this complex logic with
>>> > features:
>>> >   * Track entities, their states and actions on entities
>>> >   * Check whether new action is compatible with states of entities it
>>> > touches
>>> >     and with other ongoing and scheduled actions on these entities.
>>> >   * Handle notifications for finished and failed actions by providing
>>> > both
>>> >     pull and push approaches.
>>> >   * Track whether started action is still executed, perform error
>>> > notification,
>>> >     if not.
>>> >   * HA and high performance
>>> > - Think about policies for corner cases
>>> > - Think, how we can make a user aware of what is going on with both
>>> > request
>>> >   and entity that some requests may touch. Also consider canceling
>>> > requests.
>>> >
>>> > Please let me know, what you think.
>>> >
>>> > Dmitry.
>>> >
>>> >
>>> > _______________________________________________
>>> > OpenStack-dev mailing list
>>> > OpenStack-dev at lists.openstack.org
>>> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>> _______________________________________________
>>> OpenStack-dev mailing list
>>> OpenStack-dev at lists.openstack.org
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>
>>  + 1
>>>
>>
>>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20140528/f7f9afea/attachment.html>


More information about the OpenStack-dev mailing list