[openstack-dev] [Ironic] Random thoughts on asynchronous API spec

Maksym Lobur mlobur at mirantis.com
Wed May 28 17:14:46 UTC 2014


BTW, a very similar discussion is going on in the Neutron community right
now; see the thread titled *[openstack-dev] [Neutron] Introducing task
oriented workflows*.

Best regards,
Max Lobur,
Python Developer, Mirantis, Inc.

Mobile: +38 (093) 665 14 28
Skype: max_lobur

38, Lenina ave. Kharkov, Ukraine
www.mirantis.com
www.mirantis.ru


On Wed, May 28, 2014 at 6:56 PM, Maksym Lobur <mlobur at mirantis.com> wrote:

> Hi All,
>
> You've raised a good discussion; something similar was already started
> back in February. Could someone please find the long etherpad with the
> discussion between Deva and Lifeless? As I recall, most of the points
> mentioned above have good comments there.
>
> So far I have only one idea for addressing these problems elegantly:
> a tasks concept, and probably a scheduler service, which does not
> necessarily need to be separate from the API at the moment (after all,
> we already have a hash ring on the API side, which is a kind of
> scheduler, right?). This was already proposed earlier, but I would like
> to try to fit all of these issues into that concept.
>
>
>> 1. "Executability"
>> We need to make sure that a request can theoretically be executed,
>> which includes:
>> a) Validating request body
>>
>
> We cannot validate everything on the API side; relying on the DB state
> being up to date is not a good idea, especially under heavy load.
>
> With the tasks concept we could assume that all requests are executable
> and not perform any validation in the API thread at all. Instead, the
> API would just create a task and return its ID to the user. The task
> scheduler may perform some minor validations for convenience before the
> task is queued or started, but they should be duplicated inside the
> task body, because there is an arbitrary amount of time between queuing
> and start ((c) lifeless). I assume the scheduler will have its own
> thread or even process. The user will need to poll the received ID to
> learn the current state of their submission.
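>
> A rough sketch of what I mean on the API side (all names here are
> illustrative, not actual Ironic code):
>
>     import uuid
>     from collections import deque
>
>     task_store = {}       # hypothetical persistent task storage
>     task_queue = deque()  # hypothetical queue consumed by the scheduler
>
>     def create_task(kind, payload):
>         # No validation in the API thread: just record the intent
>         # and hand back an ID the user can poll.
>         task_id = str(uuid.uuid4())
>         task_store[task_id] = {'id': task_id, 'kind': kind,
>                                'payload': payload, 'state': 'QUEUED'}
>         task_queue.append(task_id)
>         return task_id
>
>     def set_provision_state_api(node_id, target):
>         task_id = create_task('provision',
>                               {'node': node_id, 'target': target})
>         return {'task_id': task_id}, 202   # 202 Accepted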
>
>
>> b) For each of the entities (e.g. nodes) touched, check that they
>>    are available at the moment (or at least exist).
>>    This is arguable, as checking for entity existence requires going
>>    to the DB.
>
>
> Same here: a DB round trip is a potential blocker, so this will be done
> inside the task (after it's queued and started) and will not affect the
> API. The user will just observe the task state by polling the API (or
> via a callback, as an option).
>
> 2. Appropriate state
>> For each entity in question, ensure that it's either in a proper state
>> or
>> moving to a proper state.
>> It would help avoid users e.g. setting deploy twice on the same node.
>> It will still require some kind of NodeInAWrongStateError, but we
>> won't necessarily need a client retry on this one.
>> Allowing the entity to be _moving_ to appropriate state gives us a
>> problem:
>> Imagine OP1 was running and OP2 got scheduled, hoping that OP1 will
>> come to the desired state. What if OP1 fails? What if the conductor
>> doing OP1 crashes?
>
>
> Let's say OP1 and OP2 are two separate tasks. Each one has the initial
> state validation inside its body. Once OP2 gets its turn it will
> perform the validation and fail, which looks reasonable to me.
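>
> Roughly, each task body re-checks its preconditions when it actually
> runs, regardless of any earlier check at scheduling time (sketch;
> db_get_node and do_deploy are made-up helpers):
>
>     class NodeInAWrongStateError(Exception):
>         pass
>
>     def run_deploy_task(node_id):
>         # Re-validate at execution time: an arbitrary amount of time
>         # may have passed since queuing, so earlier checks are stale.
>         node = db_get_node(node_id)
>         if node['provision_state'] != 'available':
>             raise NodeInAWrongStateError(
>                 '%s is in state %s' % (node_id, node['provision_state']))
>         do_deploy(node)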
>
>
>> A similar problem arises when checking node state.
>> Imagine we schedule OP2 while OP1, a regular node state check, is
>> running. OP1 discovers that the node is actually absent and puts it
>> into the maintenance state.
>> What to do with OP2?
>
>
> The task will fail once it gets its turn.
>
>
>> b) Can we make the client wait for the results of the periodic check?
>>    That is, wait for OP1 _before scheduling_ OP2?
>
>
> We will just schedule the task and the user will observe its progress;
> once OP1 is finished and OP2 has started, they will see the failure.
>
>
>> 3. Status feedback
>> People would like to know how things are going with their task.
>> What they know is that their request was scheduled. Options:
>> a) Poll: return some REQUEST_ID and expect users to poll some endpoint.
>>    Pros:
>>    - Should be easy to implement
>>    Cons:
>>    - Requires persistent storage for tasks. Does AMQP allow this kind
>>      of query? If not, we'll need to duplicate tasks in the DB.
>>    - Increased load on API instances and the DB
>>
>
> This exactly describes the tasks concept :)
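>
> The polling side could be a trivial read-only lookup of the stored
> task (sketch, reusing the hypothetical task_store from above):
>
>     def get_task_api(task_id):
>         # Users poll this endpoint with the ID returned at submission.
>         task = task_store.get(task_id)
>         if task is None:
>             return {'error': 'no such task'}, 404
>         return {'id': task['id'], 'state': task['state']}, 200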
>
>
>> b) Callback: take an endpoint, call it once the task is done/fails.
>>    Pros:
>>    - Less load on both client and server
>>    - An answer exactly when it's ready
>>    Cons:
>>    - Will not work for the CLI and similar
>>    - If the conductor crashes, there will be no callback.
>
>
> Add to the cons:
> - A callback is not reliable, since it may get lost.
> We should retain the ability to poll anyway, though I see a great
> benefit in implementing callbacks: decreased API load.
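>
> E.g. the conductor could fire a best-effort callback while polling
> stays available as the fallback (sketch; the callback_url parameter is
> an assumption, not an agreed-upon API):
>
>     import json
>     import urllib2
>
>     def notify_callback(task):
>         url = task['payload'].get('callback_url')
>         if not url:
>             return
>         body = json.dumps({'id': task['id'], 'state': task['state']})
>         req = urllib2.Request(url, body,
>                               {'Content-Type': 'application/json'})
>         try:
>             urllib2.urlopen(req, timeout=5)
>         except Exception:
>             # Swallow delivery failures: polling remains the source
>             # of truth, so no delivery guarantee is attempted.
>             pass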
>
>
>> 4. Debugging consideration
>> a) This is an open question: how to debug, if we have a lot of requests
>>    and something went wrong?
>>
>
> We will be able to see the queue state (by the way, what about security
> here: should a user be able to see all tasks, only their own, or all of
> them but with others' details hidden?).
>
>
>> b) One more thing to consider: how to make a command like `node-show`
>>    aware of scheduled transitions, so that people don't try operations
>>    that are doomed to failure.
>
>
> node-show will always show the current state of the node, though we may
> check whether any queued or running tasks are going to change that
> state; if so, add a notification to the response.
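>
> Something like this, assuming tasks can be filtered by node (sketch
> with the same made-up helpers as above):
>
>     def node_show_api(node_id):
>         node = dict(db_get_node(node_id))
>         pending = [t['id'] for t in task_store.values()
>                    if t['payload'].get('node') == node_id
>                    and t['state'] in ('QUEUED', 'RUNNING')]
>         if pending:
>             # Warn the caller that the state shown is about to change.
>             node['pending_tasks'] = pending
>         return node, 200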
>
>
>> 5. Performance considerations
>> a) With the async approach, users will be able to schedule a nearly
>>    unlimited number of tasks, thus essentially blocking Ironic's work,
>>    without any sign of the problem (at least for some time).
>>    I think there are 2 common answers to this problem:
>>    - Request throttling: disallow a user from making too many requests
>>      in some amount of time. Send them 503 with the Retry-After header
>>      set.
>
>
> Can this be achieved with some web-server settings? It looks like a
> typical problem.
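>
> A front-end server can do it (e.g. nginx's limit_req module), but an
> in-process version is also small; a per-user fixed-window sketch:
>
>     import time
>
>     _requests = {}   # user -> timestamps of recent requests
>
>     def throttle(user, limit=10, window=60.0):
>         # Return (status, headers): 503 + Retry-After when the user
>         # made more than `limit` requests in the last `window` seconds.
>         now = time.time()
>         recent = [t for t in _requests.get(user, []) if now - t < window]
>         if len(recent) >= limit:
>             retry_after = int(window - (now - recent[0])) + 1
>             return 503, {'Retry-After': str(retry_after)}
>         recent.append(now)
>         _requests[user] = recent
>         return 200, {}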
>
>
>>    - Queue management: watch queue length, deny new requests if it's too
>> large.
>>
>
> Yes, I really like the limited queue size idea; please see my comments
> in the spec. Also, if we have tasks and the queue, we could merge
> similar tasks.
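>
> E.g. (the cap is arbitrary; reuses the task_queue sketched earlier):
>
>     MAX_QUEUE = 1000   # arbitrary cap for illustration
>
>     def enqueue(task_id):
>         if len(task_queue) >= MAX_QUEUE:
>             # Deny new work instead of letting the backlog grow
>             # without bound; the client retries later.
>             return 503, {'Retry-After': '30'}
>         task_queue.append(task_id)
>         return 202, {}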
>
>
>> b) The state framework from (2), if invented, can become a bottleneck
>>    as well, especially with the polling approach.
>
>
> True.
> If we have tasks, all the node actions will be done through them. We can
> synchronise node state with DB only during the task, and remove periodic
> syncs. Off-course someone may go and turn off the node, in this case the
> Ironic will lie about the node state until some task is executed on this
> node, which may be suitable behaviour. Otherwise rare periodic syncs may
> work as well.
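>
> I.e. something like this at task start (driver_get_power_state and the
> DB helpers are made up):
>
>     def run_task_with_sync(task):
>         node = db_get_node(task['payload']['node'])
>         # Refresh the DB copy of the power state at task start,
>         # instead of relying on a frequent periodic sync job.
>         actual = driver_get_power_state(node)
>         if actual != node['power_state']:
>             db_update_power_state(node['id'], actual)
>         execute_task_body(task)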
>
>
>> 6. Usability considerations
>> a) People will be unaware of when, and whether, their request is going
>>    to finish. As they will be tempted to retry, we may get flooded
>>    with duplicates. I would suggest at least making it possible to
>>    request cancellation of any task (which will be possible only if it
>>    has not started yet, obviously).
>>
>
> Since we will have a limited number of kinds of tasks, we could
> calculate estimates based on previous similar tasks. That looks like an
> improvement for the distant future; in the end I wouldn't want Ironic
> to produce estimates like Windows's copy-paste dialog :)
>
> Tasks may easily be interrupted while they are in the queue. But if a
> task has already started, there's a separate discussion:
> https://blueprints.launchpad.net/ironic/+spec/make-tasks-interruptible
> (I'm going to port this bp to the specs repo at some point.)
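>
> Queue-time cancellation is then just a state check (sketch, same
> hypothetical store/queue as above):
>
>     def cancel_task_api(task_id):
>         task = task_store.get(task_id)
>         if task is None:
>             return {'error': 'no such task'}, 404
>         if task['state'] != 'QUEUED':
>             # Interrupting a running task is out of scope here; see
>             # the make-tasks-interruptible blueprint linked above.
>             return {'error': 'task already started'}, 409
>         task['state'] = 'CANCELED'
>         task_queue.remove(task_id)
>         return {'id': task_id, 'state': 'CANCELED'}, 200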
>
>
>> b) We should try to avoid scheduling contradictory requests.
>>
>
> This is a task scheduler responsibility: basically a state check before
> the task is scheduled, repeated one more time once the task is started,
> as mentioned above.
>
>
>> c) Can we somehow detect duplicated requests and ignore them?
>>    E.g. we don't want a user to trigger 2-3-4 reboots in a row just
>>    because they were not patient enough.
>
>
> Merge similar tasks in the queue: all users would be pointed to the
> same task resource, or maybe to different resources tied to the same
> conductor action.
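>
> Deduplication at submission time could key on the task kind and the
> target node (the key choice is an assumption; create_task is the
> earlier sketch):
>
>     def create_task_deduplicated(kind, payload):
>         key = (kind, payload.get('node'))
>         # If an identical task is already queued, hand back its ID
>         # instead of queuing a duplicate (e.g. repeated reboots from
>         # an impatient user).
>         for task in task_store.values():
>             if (task['state'] == 'QUEUED' and
>                     (task['kind'], task['payload'].get('node')) == key):
>                 return task['id']
>         return create_task(kind, payload)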
>
>
>> Best regards,
>> Max Lobur,
>> Python Developer, Mirantis, Inc.
>> Mobile: +38 (093) 665 14 28
>> Skype: max_lobur
>> 38, Lenina ave. Kharkov, Ukraine
>> www.mirantis.com
>> www.mirantis.ru
>>
>>
>> On Wed, May 28, 2014 at 5:10 PM, Lucas Alvares Gomes <
>> lucasagomes at gmail.com> wrote:
>> On Wed, May 28, 2014 at 2:02 PM, Dmitry Tantsur <dtantsur at redhat.com>
>> wrote:
>> > Hi Ironic folks, hi Devananda!
>> >
>> > I'd like to share with you my thoughts on asynchronous API, which is
>> > spec https://review.openstack.org/#/c/94923
>> > At first I planned this as comments on the review, but it proved to
>> > be much larger, so I'm posting it for discussion on the ML.
>> >
>> > Here is a list of different considerations I'd like to take into
>> > account when prototyping async support; some are reflected in the
>> > spec already, some are from my and others' comments:
>> >
>> > 1. "Executability"
>> > We need to make sure that a request can theoretically be executed,
>> > which includes:
>> > a) Validating request body
>> > b) For each of the entities (e.g. nodes) touched, check that they
>> >    are available at the moment (or at least exist).
>> >    This is arguable, as checking for entity existence requires going
>> >    to the DB.
>>
>> >
>> > 2. Appropriate state
>> > For each entity in question, ensure that it's either in a proper state
>> > or
>> > moving to a proper state.
>> > It would help avoid users e.g. setting deploy twice on the same node.
>> > It will still require some kind of NodeInAWrongStateError, but we
>> > won't necessarily need a client retry on this one.
>> >
>> > Allowing the entity to be _moving_ to appropriate state gives us a
>> > problem:
>> > Imagine OP1 was running and OP2 got scheduled, hoping that OP1 will
>> > come to the desired state. What if OP1 fails? What if the conductor
>> > doing OP1 crashes?
>> > That's why we may want to approve only operations on entities that
>> > are not undergoing state changes. What do you think?
>> >
>> > A similar problem arises when checking node state.
>> > Imagine we schedule OP2 while OP1, a regular node state check, is
>> > running. OP1 discovers that the node is actually absent and puts it
>> > into the maintenance state.
>> > What to do with OP2?
>> > a) The obvious answer is to fail it
>> > b) Can we make the client wait for the results of the periodic check?
>> >    That is, wait for OP1 _before scheduling_ OP2?
>> >
>> > Anyway, this point requires some state framework that knows about
>> > states, transitions, actions, and their compatibility with each
>> > other.
>> For {power, provision} state changes, should we queue the requests? We
>> may want to accept only one state-change request at a time; if a second
>> request comes in while another state change is mid-operation, we may
>> just return 409 (Conflict) to indicate that a state change is already
>> in progress. This is similar to what we have today, but instead of
>> checking the node lock and states on the conductor side, the API
>> service could do it, since they are in the DB.
>> >
>> > 3. Status feedback
>> > People would like to know how things are going with their task.
>> > What they know is that their request was scheduled. Options:
>> > a) Poll: return some REQUEST_ID and expect users to poll some
>> >    endpoint.
>> >    Pros:
>> >    - Should be easy to implement
>> >    Cons:
>> >    - Requires persistent storage for tasks. Does AMQP allow this kind
>> >      of query? If not, we'll need to duplicate tasks in the DB.
>> >    - Increased load on API instances and the DB
>> > b) Callback: take an endpoint, call it once the task is done/fails.
>> >    Pros:
>> >    - Less load on both client and server
>> >    - An answer exactly when it's ready
>> >    Cons:
>> >    - Will not work for the CLI and similar
>> >    - If the conductor crashes, there will be no callback.
>> >
>> > Seems like we'd want both (a) and (b) to comply with current needs.
>> +1, we could allow polling by default (like checking
>> nodes/<uuid>/states to know the current and target state of the node),
>> but we may also want to include a callback parameter that users could
>> use to pass in a URL that the conductor will call as soon as the
>> operation is finished. So if the callback URL exists, the conductor
>> will submit a POST request to that URL with some data structure
>> identifying the operation and the current state.
>> >
>> > If we have a state framework from (2), we can also add notifications to
>> > it.
>> >
>> > 4. Debugging consideration
>> > a) This is an open question: how to debug, if we have a lot of requests
>> >    and something went wrong?
>> > b) One more thing to consider: how to make a command like
>> >    `node-show` aware of scheduled transitions, so that people don't
>> >    try operations that are doomed to failure.
>> >
>> > 5. Performance considerations
>> > a) With the async approach, users will be able to schedule a nearly
>> >    unlimited number of tasks, thus essentially blocking Ironic's
>> >    work, without any sign of the problem (at least for some time).
>> >    I think there are 2 common answers to this problem:
>> >    - Request throttling: disallow a user from making too many
>> >      requests in some amount of time. Send them 503 with the
>> >      Retry-After header set.
>> >    - Queue management: watch the queue length, deny new requests if
>> >      it's too large.
>> >    This means actually getting back a 503 error, and will require
>> >    retrying! At least it will be an exceptional case, and won't
>> >    affect Tempest runs...
>> > b) The state framework from (2), if invented, can become a
>> >    bottleneck as well, especially with the polling approach.
>> >
>> > 6. Usability considerations
>> > a) People will be unaware of when, and whether, their request is
>> >    going to finish. As they will be tempted to retry, we may get
>> >    flooded with duplicates. I would suggest at least making it
>> >    possible to request cancellation of any task (which will be
>> >    possible only if it has not started yet, obviously).
>> > b) We should try to avoid scheduling contradictory requests.
>> > c) Can we somehow detect duplicated requests and ignore them?
>> >    E.g. we don't want a user to trigger 2-3-4 reboots in a row just
>> >    because they were not patient enough.
>> >
>> > ------
>> >
>> > Possible takeaways from this letter:
>> > - We'll need at least throttling to avoid DoS
>> > - We'll still need handling of the 503 error, though it should not
>> >   happen under normal conditions
>> > - Think about a state framework that unifies all this complex logic,
>> >   with these features:
>> >   * Track entities, their states, and actions on entities
>> >   * Check whether a new action is compatible with the states of the
>> >     entities it touches and with other ongoing and scheduled actions
>> >     on those entities.
>> >   * Handle notifications for finished and failed actions by providing
>> >     both pull and push approaches.
>> >   * Track whether a started action is still executing, and send an
>> >     error notification if it is not.
>> >   * HA and high performance
>> > - Think about policies for corner cases
>> > - Think about how we can make a user aware of what is going on with
>> >   both the request and the entities that requests may touch. Also
>> >   consider canceling requests.
>> >
>> > Please let me know what you think.
>> >
>> > Dmitry.
>> >
>> >
> +1