[openstack-dev] [magnum] Magnum conductor async container operations
suro.patz at gmail.com
Thu Dec 17 07:18:12 UTC 2015
Please find the reply inline.
On 12/16/15 7:19 PM, Adrian Otto wrote:
>> On Dec 16, 2015, at 6:24 PM, Joshua Harlow <harlowja at fastmail.com> wrote:
>> SURO wrote:
>>> Hi all,
>>> Please review and provide feedback on the following design proposal for
>>> implementing the blueprint on async-container-operations -
>>> 1. Magnum-conductor would have a pool of threads for executing the
>>> container operations, viz. executor_threadpool. The size of the
>>> executor_threadpool will be configurable. [Phase0]
>>> 2. Every time Magnum-conductor (Mcon) receives a
>>> container-operation request from Magnum-API (Mapi), it will do the
>>> initial validation and housekeeping, then pick a thread from the
>>> executor_threadpool to execute the rest of the operation. Thus Mcon
>>> will return from the RPC request context much faster, without blocking
>>> Mapi. If no thread in the executor_threadpool is free, Mcon will execute
>>> the operation synchronously, as it does today - this will be the
>>> rate-limiting mechanism, relaying feedback that the pool is exhausted.
>>> How often this scenario is hit may indicate to the operator that
>>> more Mcon workers are needed.
>>> 3. Blocking class of operations - There will be a class of operations
>>> that cannot be made async, as they are supposed to return
>>> result/content inline, e.g. 'container-logs'. [Phase0]
>>> 4. Out-of-order considerations for the NonBlocking class of operations -
>>> there is a possible race condition for a create followed by a
>>> start/delete of the same container, as things would happen in parallel. To
>>> solve this, we will maintain a map from each container to the thread
>>> currently executing an operation on it. If we find a request for an
>>> operation on a container-in-execution, we will block until that thread
>>> completes the execution. [Phase0]
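Items 1-4 can be sketched with the Python standard library, assuming a single conductor process. The names (`AsyncContainerRunner`, `BLOCKING_OPS`, `container_logs`, `handle`) are hypothetical, not Magnum code. Because `ThreadPoolExecutor` queues extra work instead of reporting exhaustion, a `BoundedSemaphore` sized like the pool stands in for "is a thread free?", and a plain `threading.Lock` per container plays the role of the container-to-thread map.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical: operations that must return their result inline (item 3).
BLOCKING_OPS = {"container_logs"}


class AsyncContainerRunner:
    """Sketch of the executor_threadpool hand-off described in items 1-4."""

    def __init__(self, executor_threadpool_size=4):
        self._pool = ThreadPoolExecutor(max_workers=executor_threadpool_size)
        # One permit per worker: a failed non-blocking acquire means "pool exhausted".
        self._free = threading.BoundedSemaphore(executor_threadpool_size)
        self._guard = threading.Lock()
        self._container_locks = {}  # container id -> Lock held while an op runs
        self.inline_fallbacks = 0   # how often the pool was exhausted (operator signal)

    def _lock_for(self, container_id):
        with self._guard:
            return self._container_locks.setdefault(container_id, threading.Lock())

    def handle(self, op_name, container_id, work):
        """Run `work` for a container; returns its result inline, or a Future."""
        lock = self._lock_for(container_id)
        lock.acquire()  # block while another op on this container is still executing
        inline = op_name in BLOCKING_OPS
        if not inline and not self._free.acquire(blocking=False):
            inline = True  # no idle worker: fall back to today's synchronous path
            with self._guard:
                self.inline_fallbacks += 1
        if inline:
            try:
                return work()
            finally:
                lock.release()

        def run():
            try:
                return work()
            finally:
                self._free.release()  # worker is about to become idle again
                lock.release()        # let the next op on this container proceed

        return self._pool.submit(run)  # returns at once, so Mapi is not kept waiting
```

A per-container Lock only makes a later request wait for the running one; it does not promise FIFO order when several requests for the same container are waiting, and the lock map grows with every container seen.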
>> Does whatever performs these operations (mcon?) run in more than one process?
> Yes, there may be multiple copies of magnum-conductor running on separate hosts.
>> Can it be requested to create in one process and then delete in another? If so, is that map some distributed/cross-machine/cross-process map that will be inspected to see what else is manipulating a given container (so that the thread can block until that is not the case... basically the map is acting like an operation lock)?
Suro> @Josh, just after this, I had mentioned
"The approach above puts a prerequisite that operations for a given
container on a given Bay would go to the same Magnum-conductor instance."
That already assumes multiple instances of magnum-conductor. Also, my idea
for implementing this was as follows - each magnum-conductor has an 'id'
associated with it, which identifies it as the [0 - (N-1)]th instance of
magnum-conductor. Given a request for a container operation, we would
always have the bay-id and container-id. I was planning to use
'hash(bay-id, container-id) modulo N' as the logic that ensures the right
instance picks up the intended request. Let me know if I am missing any
nuance of AMQP here.
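A minimal sketch of the routing rule, assuming conductor ids 0..N-1 and a stable digest of the bay id and container id. Python's built-in `hash()` is randomized per process for strings (see `PYTHONHASHSEED`), so two conductor processes would not agree on it; `conductor_index` is a hypothetical helper name.

```python
import hashlib


def conductor_index(bay_id, container_id, num_conductors):
    """Pick which conductor instance (0..N-1) owns operations on this container."""
    # sha256 gives the same answer in every process, unlike the salted built-in hash().
    key = f"{bay_id}/{container_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_conductors
```

With plain modulo, changing N (adding or removing a conductor) remaps most (bay, container) pairs to a different instance, so ordering would need care during a resize; consistent hashing would reduce how many keys move.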
> That’s how I interpreted it as well. This is a race prevention technique so that we don’t attempt to act on a resource until it is ready. Another way to deal with this is to check the state of the resource, and return a “not ready” error if it’s not ready yet. If this happens in a part of the system that is unattended by a user, we can re-queue the call to retry after a minimum delay, so that it proceeds only once the resource reaches the ready state, and is terminated after a maximum number of attempts or if the resource enters an error state. This would allow other work to proceed while the retry waits in the queue.
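The retry alternative can be sketched as a delay queue, so a "not ready" call is parked instead of occupying a worker. `RetryQueue`, the state strings and the `get_state` callback are hypothetical; the clock is injectable so the behavior can be checked without sleeping, and `run_due` would be called periodically by some loop.

```python
import heapq
import itertools
import time

# Hypothetical resource states; a real system would read them from its DB.
READY, PENDING, ERROR = "ready", "pending", "error"


class RetryQueue:
    """Re-queues calls on resources that are not ready yet, with a minimum delay."""

    def __init__(self, get_state, min_delay=1.0, max_attempts=5, clock=time.monotonic):
        self._get_state = get_state
        self._min_delay = min_delay
        self._max_attempts = max_attempts
        self._clock = clock
        self._heap = []                # (due_time, seq, resource_id, action, attempt)
        self._seq = itertools.count()  # tie-breaker so actions are never compared
        self.results = []              # (resource_id, outcome, detail)

    def submit(self, resource_id, action, attempt=1):
        delay = self._min_delay if attempt > 1 else 0.0
        heapq.heappush(self._heap, (self._clock() + delay, next(self._seq),
                                    resource_id, action, attempt))

    def run_due(self):
        """Handle every entry whose delay has elapsed; later entries stay parked."""
        now = self._clock()
        while self._heap and self._heap[0][0] <= now:
            _, _, rid, action, attempt = heapq.heappop(self._heap)
            state = self._get_state(rid)
            if state == READY:
                self.results.append((rid, "done", action(rid)))
            elif state == ERROR:
                self.results.append((rid, "failed", "resource in error state"))
            elif attempt >= self._max_attempts:
                self.results.append((rid, "failed", f"gave up after {attempt} attempts"))
            else:
                self.submit(rid, action, attempt + 1)  # "not ready": retry later
```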
Suro> @Adrian, I think the async model is meant to let the user issue a sequence of
operations, which might be causally ordered. I suggest we honor
the causal ordering rather than implementing the implicit retry model. As per
my proposal above, if we can arbitrate operations per bay and per
container, we should be able to achieve this ordering.
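Honoring causal order per (bay, container) can be sketched as one FIFO queue per key, drained by at most one pool thread at a time, while different keys still run in parallel. This is a swapped-in technique (a keyed serial executor), not part of the proposal text, and `KeyedSerialExecutor` is a hypothetical name. Unlike a bare per-container lock, it keeps submission order when several requests for the same container are waiting.

```python
import threading
from collections import deque
from concurrent.futures import Future, ThreadPoolExecutor


class KeyedSerialExecutor:
    """Hypothetical helper: tasks sharing a key run one at a time, in submission order."""

    def __init__(self, max_workers=4):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._guard = threading.Lock()
        self._queues = {}  # key -> deque of (fn, future); present while a drain is active

    def submit(self, key, fn):
        """Queue fn under key, e.g. key = (bay_id, container_id); returns a Future."""
        fut = Future()
        with self._guard:
            if key in self._queues:
                self._queues[key].append((fn, fut))  # earlier op still running: wait in line
                return fut
            self._queues[key] = deque()
        self._pool.submit(self._drain, key, fn, fut)
        return fut

    def _drain(self, key, fn, fut):
        # Runs on one pool thread until this key's queue is empty.
        while True:
            if fut.set_running_or_notify_cancel():  # skip tasks cancelled while queued
                try:
                    fut.set_result(fn())
                except Exception as exc:
                    fut.set_exception(exc)
            with self._guard:
                queue = self._queues[key]
                if not queue:
                    del self._queues[key]  # next submit for this key starts a new drain
                    return
                fn, fut = queue.popleft()
```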
>> If it's just local in one process, then I have a library for you that can solve the problem of correctly ordering parallel operations ;)
> What we are aiming for is a bit more distributed.
>>> This mechanism can be further refined to achieve more asynchronous
>>> behavior. [Phase2]
>>> The approach above puts a prerequisite that operations for a given
>>> container on a given Bay would go to the same Magnum-conductor instance.
>>> 5. The hand-off between Mcon and a thread from executor_threadpool can
>>> be reflected through new states on the 'container' object. These states
>>> can be helpful to recover/audit, in case of Mcon restart. [Phase1]
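The hand-off states of item 5 could look like the following sketch. The state names, the allowed transitions and the plain dict standing in for the persistent store are all hypothetical, not Magnum's actual container status values; the point is that a restarted conductor can find operations left in an in-flight state.

```python
import enum


class HandoffState(enum.Enum):
    """Hypothetical hand-off states; not Magnum's actual container status values."""
    ACCEPTED = "accepted"    # Mcon validated the request, no worker yet
    EXECUTING = "executing"  # a thread from executor_threadpool picked it up
    COMPLETED = "completed"
    FAILED = "failed"


ALLOWED = {
    HandoffState.ACCEPTED: {HandoffState.EXECUTING, HandoffState.FAILED},
    HandoffState.EXECUTING: {HandoffState.COMPLETED, HandoffState.FAILED},
    HandoffState.COMPLETED: set(),
    HandoffState.FAILED: set(),
}
IN_FLIGHT = {HandoffState.ACCEPTED, HandoffState.EXECUTING}


def transition(records, container_id, new_state):
    """Move a container to new_state, rejecting jumps an audit would flag."""
    old_state = records[container_id]
    if new_state not in ALLOWED[old_state]:
        raise ValueError(f"{container_id}: {old_state.value} -> {new_state.value} not allowed")
    records[container_id] = new_state


def find_interrupted(records):
    """After a conductor restart, list containers whose hand-off never finished."""
    return sorted(cid for cid, state in records.items() if state in IN_FLIGHT)
```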
>>> Other considerations -
>>> 1. Using eventlet.greenthread instead of real threads => This approach
>>> would require further refactoring of the execution code to embed yield
>>> logic; otherwise a single greenthread would block others from progressing.
>>> Given that we will extend the mechanism to multiple COEs, and to keep the
>>> approach straightforward to begin with, we will use 'threading.Thread'
>>> instead of 'eventlet.greenthread'.
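The cooperative-scheduling concern can be illustrated without eventlet itself. In the toy round-robin scheduler below, plain Python generators stand in for greenthreads and `yield` stands in for a point where a greenthread gives up control (as a monkey-patched I/O call or an explicit `eventlet.sleep(0)` would). `run_cooperative`, `polite` and `greedy` are hypothetical names.

```python
from collections import deque


def run_cooperative(tasks):
    """Round-robin over generator 'greenthreads'; a task only gives up control at yield."""
    ready = deque(tasks)
    trace = []
    while ready:
        task = ready.popleft()
        try:
            trace.append(next(task))
        except StopIteration:
            continue  # task finished; drop it
        ready.append(task)
    return trace


def polite(name, steps):
    # Yields after every step, like code with explicit yield points.
    for i in range(steps):
        yield f"{name}{i}"


def greedy(name, steps):
    # Does all its work before its only yield, like an unpatched blocking call:
    # no other task runs in the meantime.
    done = [f"{name}{i}" for i in range(steps)]
    yield "+".join(done)
```

For example, `run_cooperative([greedy("g", 3), polite("p", 2)])` returns `["g0+g1+g2", "p0", "p1"]`: all of `g`'s work happens before `p` runs at all. With real `threading.Thread` workers the OS preempts threads instead; note, though, that if the process applies eventlet's monkey patching with thread patching enabled, `threading.Thread` itself becomes green.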
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe