[openstack-dev] [magnum] Magnum conductor async container operations
SURO
suro.patz at gmail.com
Thu Dec 17 19:41:42 UTC 2015
Josh,
Thanks for bringing up this discussion. Modulo hashing introduces the
possibility of a 'window of inconsistency', and consistent hashing is
the better fit where membership changes dynamically.
BUT, for the problem at hand I think modulo hashing is good enough,
because the number of conductor worker instances in OpenStack is
managed through config, and a change there requires a restart of the
conductor. If the conductor is restarted, the 'window of
inconsistency' does not arise for the situation we are discussing.
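
For illustration, a minimal sketch of the assignment I have in mind
(names are hypothetical; N comes from config, so it can only change
across restarts):

    import hashlib

    # N is read from the conductor config at startup; changing it
    # requires a conductor restart, so the mapping below is stable
    # for the lifetime of the process and no keys move while
    # requests are in flight.
    def conductor_index(bay_id, container_id, num_conductors):
        # Use a digest rather than the built-in hash() so the result
        # is stable across processes and hosts.
        key = ('%s:%s' % (bay_id, container_id)).encode('utf-8')
        return int(hashlib.md5(key).hexdigest(), 16) % num_conductors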
Regards,
SURO
irc//freenode: suro-patz
On 12/16/15 11:39 PM, Joshua Harlow wrote:
> SURO wrote:
>> Please find the reply inline.
>>
>> Regards,
>> SURO
>> irc//freenode: suro-patz
>>
>> On 12/16/15 7:19 PM, Adrian Otto wrote:
>>>> On Dec 16, 2015, at 6:24 PM, Joshua Harlow <harlowja at fastmail.com>
>>>> wrote:
>>>>
>>>> SURO wrote:
>>>>> Hi all,
>>>>> Please review and provide feedback on the following design
>>>>> proposal for
>>>>> implementing the blueprint[1] on async-container-operations -
>>>>>
>>>>> 1. Magnum-conductor would have a pool of threads for executing the
>>>>> container operations, viz. executor_threadpool. The size of the
>>>>> executor_threadpool will be configurable. [Phase0]
>>>>> 2. Every time Magnum-conductor (Mcon) receives a
>>>>> container-operation request from Magnum-API (Mapi), it will do the
>>>>> initial validation and housekeeping and then pick a thread from the
>>>>> executor_threadpool to execute the rest of the operation. Thus Mcon
>>>>> will return from the RPC request context much faster, without
>>>>> blocking Mapi. If the executor_threadpool has no free thread, Mcon
>>>>> will execute the operation as it does today, i.e. synchronously -
>>>>> this will be the rate-limiting mechanism, relaying feedback about
>>>>> the exhaustion. [Phase0]
>>>>> How often we hit this scenario may indicate to the operator that
>>>>> more Mcon workers are needed.
>>>>> 3. Blocking class of operations - there will be a class of
>>>>> operations that cannot be made async, as they are supposed to
>>>>> return results/content inline, e.g. 'container-logs'. [Phase0]
>>>>> 4. Out-of-order considerations for the NonBlocking class of
>>>>> operations - there is a possible race condition for a create
>>>>> followed by a start/delete of a container, as things would happen
>>>>> in parallel. To solve this, we will maintain a map from container
>>>>> to executing thread for the current execution. If we find a
>>>>> request for an operation on a container-in-execution, we will
>>>>> block until that thread completes the execution (see the sketch
>>>>> below). [Phase0]
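>>>>>
>>>>> A rough sketch of items 2 and 4 combined (a simplification with
>>>>> hypothetical names, ignoring shutdown and some corner-case races,
>>>>> just to show the shape):
>>>>>
>>>>>     import threading
>>>>>     from concurrent import futures
>>>>>
>>>>>     class ContainerOpExecutor(object):
>>>>>         def __init__(self, pool_size):
>>>>>             self._pool = futures.ThreadPoolExecutor(pool_size)
>>>>>             self._free = threading.Semaphore(pool_size)
>>>>>             self._lock = threading.Lock()
>>>>>             self._in_flight = {}  # container-id -> running future
>>>>>
>>>>>         def submit(self, container_id, op, *args):
>>>>>             with self._lock:
>>>>>                 prior = self._in_flight.get(container_id)
>>>>>             if prior is not None:
>>>>>                 # Item 4: block until the earlier operation on
>>>>>                 # this container completes.
>>>>>                 prior.result()
>>>>>             if not self._free.acquire(False):
>>>>>                 # Item 2: pool exhausted -> execute synchronously,
>>>>>                 # which rate-limits the caller.
>>>>>                 return op(*args)
>>>>>             fut = self._pool.submit(self._run, container_id, op, *args)
>>>>>             with self._lock:
>>>>>                 self._in_flight[container_id] = fut
>>>>>             return fut
>>>>>
>>>>>         def _run(self, container_id, op, *args):
>>>>>             try:
>>>>>                 return op(*args)
>>>>>             finally:
>>>>>                 with self._lock:
>>>>>                     self._in_flight.pop(container_id, None)
>>>>>                 self._free.release()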
>>>> Does whatever performs these operations (mcon?) run in more than
>>>> one process?
>>> Yes, there may be multiple copies of magnum-conductor running on
>>> separate hosts.
>>>
>>>> Can it be requested to create in one process and then delete in
>>>> another? If so, is that map some distributed/cross-machine/
>>>> cross-process map that will be inspected to see what else is
>>>> manipulating a given container (so that the thread can block until
>>>> that is not the case... basically the map is acting like an
>>>> operation-lock?)
>> Suro> @Josh, just after this, I had mentioned
>>
>> "The approach above puts a prerequisite that operations for a given
>> container on a given Bay would go to the same Magnum-conductor
>> instance."
>>
>> Which suggested multiple instances of magnum-conductor. Also, my idea
>> for implementing this was as follows - each magnum-conductor has an
>> 'id' associated with it, which carries the notion of being the
>> [0 - (N-1)]th instance of magnum-conductor. Given a request for a
>> container operation, we would always have the bay-id and
>> container-id. I was planning to use 'hash(bay-id, container-id)
>> modulo N' as the logic to ensure that the right instance picks up
>> the intended request. Let me know if I am missing any nuance of AMQP
>> here.
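>>
>> For concreteness, a sketch of the filter each conductor instance
>> would apply (CONF.conductor_id and CONF.num_conductors are
>> hypothetical config options, and md5 stands in for whatever stable
>> hash we pick):
>>
>>     import hashlib
>>
>>     def is_mine(bay_id, container_id):
>>         # Each instance knows its own id in [0, N-1] and N from
>>         # config; it only acts on requests that hash to its slot.
>>         key = ('%s:%s' % (bay_id, container_id)).encode('utf-8')
>>         slot = int(hashlib.md5(key).hexdigest(), 16) % CONF.num_conductors
>>         return slot == CONF.conductor_id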
>
> Unsure about the nuances of AMQP (I guess that's an implementation
> detail here); but what this sounds like is similar to the hash rings
> other projects have built (ironic uses one [1]; ceilometer is
> slightly different afaik, see
> http://www.slideshare.net/EoghanGlynn/hash-based-central-agent-workload-partitioning-37760440
> and
> https://github.com/openstack/ceilometer/blob/master/ceilometer/coordination.py#L48).
>
> The typical issue with modulo hashing is changes in N (whether adding
> new conductors or deleting them): what does that change in N do to
> ongoing requests, how do you change N in an online manner, and so on;
> typically with modulo hashing a large number of keys get shuffled
> around [2]. So just a thought, but a (consistent) hashing
> routine/ring might be worthwhile to look into, and/or talk with
> those other projects to see what they have been up to.
>
> My 2 cents,
>
> [1]
> https://github.com/openstack/ironic/blob/master/ironic/common/hash_ring.py
>
> [2] https://en.wikipedia.org/wiki/Consistent_hashing
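>
> To make the contrast concrete, here's a toy consistent-hash ring (not
> ironic's actual implementation, just the idea behind [1] and [2]):
>
>     import bisect
>     import hashlib
>
>     class ToyHashRing(object):
>         def __init__(self, conductors, replicas=100):
>             # Each conductor gets `replicas` points on the ring, so
>             # adding/removing one conductor remaps roughly 1/N of the
>             # keys, instead of shuffling nearly all of them the way
>             # modulo hashing does when N changes.
>             self._ring = sorted(
>                 (self._hash('%s-%d' % (c, i)), c)
>                 for c in conductors for i in range(replicas))
>             self._hashes = [h for h, _ in self._ring]
>
>         @staticmethod
>         def _hash(key):
>             return int(hashlib.md5(key.encode('utf-8')).hexdigest(), 16)
>
>         def get_conductor(self, key):
>             # First ring point clockwise from the key's hash.
>             idx = bisect.bisect(self._hashes, self._hash(key))
>             return self._ring[idx % len(self._ring)][1]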
>
>>> That’s how I interpreted it as well. This is a race-prevention
>>> technique so that we don’t attempt to act on a resource until it is
>>> ready. Another way to deal with this is to check the state of the
>>> resource and return a “not ready” error if it’s not ready yet. If
>>> this happens in a part of the system that is unattended by a user,
>>> we can re-queue the call so that it retries after a minimum delay,
>>> proceeding only once the resource reaches the ready state, and is
>>> terminated after a maximum number of attempts or if the resource
>>> enters an error state. This would allow other work to proceed while
>>> the retry waits in the queue.
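>>>
>>> A sketch of that retry shape (the queue/call objects and the delay
>>> values here are made up, not an existing Magnum API):
>>>
>>>     def requeue_until_ready(queue, call, min_delay=5, max_attempts=10):
>>>         resource = call.get_resource()
>>>         if resource.state == 'ERROR' or call.attempts >= max_attempts:
>>>             call.terminate()
>>>         elif resource.state != 'READY':
>>>             call.attempts += 1
>>>             # Re-queue with a minimum delay; other work proceeds
>>>             # while this call waits its turn again.
>>>             queue.push(call, delay=min_delay)
>>>         else:
>>>             call.execute()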
>> Suro> @Adrian, I think the async model is to let the user issue a
>> sequence of operations, which might be causally ordered. I suggest we
>> honor the causal ordering rather than implement an implicit retry
>> model. As per my proposal above, if we can arbitrate operations for a
>> given bay and a given container, we should be able to achieve this
>> ordering.
>>
>>
>>
>>>
>>>> If it's just local to one process, then I have a library for you
>>>> that can solve the problem of correctly ordering parallel
>>>> operations ;)
>>> What we are aiming for is a bit more distributed.
>> Suro> +1
>>>
>>> Adrian
>>>
>>>>> This mechanism can be further refined to achieve more asynchronous
>>>>> behavior. [Phase2]
>>>>> The approach above puts a prerequisite that operations for a given
>>>>> container on a given Bay would go to the same Magnum-conductor
>>>>> instance. [Phase0]
>>>>> 5. The hand-off between Mcon and a thread from the
>>>>> executor_threadpool can be reflected through new states on the
>>>>> 'container' object (sketched below). These states can be helpful
>>>>> for recovery/audit in case of an Mcon restart. [Phase1]
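>>>>>
>>>>> A sketch of what those states might look like (names are
>>>>> hypothetical):
>>>>>
>>>>>     # Additional container states making the hand-off visible and
>>>>>     # recoverable: anything left in these states after an Mcon
>>>>>     # restart can be audited and retried or reported.
>>>>>     OPERATION_QUEUED = 'OPERATION_QUEUED'          # accepted by Mcon
>>>>>     OPERATION_IN_PROGRESS = 'OPERATION_IN_PROGRESS'  # worker running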
>>>>>
>>>>> Other considerations -
>>>>> 1. Using eventlet.greenthread instead of real threads => this
>>>>> approach would require further refactoring of the execution code
>>>>> to embed yield logic; otherwise a single greenthread would block
>>>>> the others from progressing. Given that we will extend the
>>>>> mechanism to multiple COEs, and to keep the approach
>>>>> straightforward to begin with, we will use 'threading.Thread'
>>>>> instead of 'eventlet.greenthread'.
>>>>>
>>>>>
>>>>> Refs:-
>>>>> [1] -
>>>>> https://blueprints.launchpad.net/magnum/+spec/async-container-operations