[openstack-dev] Nova workflow management update
John Garbutt
john at johngarbutt.com
Thu May 2 11:41:40 UTC 2013
That is the big problem. I think we agree a single conductor is a bad
idea, but it is the simplest fix.
I was hoping to use the DB and keep it simple (ish).
Start with compare and swap on the DB (rough sketch below):
- in a db transaction, check the existing server state, and atomically
move to the new one
- we kinda do the above already for some cases
- API rejects requests in the wrong state (assuming we only support
one task at a time per server, for now)
- edge case: if a request gets past the API, but someone else beats it
before it executes, just record an instance action saying that the API
request failed
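
What I have in mind for the compare and swap, as a sketch only (the
table and column names below are illustrative, not the real nova
schema):

    # Sketch only: a conditional UPDATE acts as the compare and swap.
    from sqlalchemy import create_engine, text

    engine = create_engine("mysql+pymysql://nova:nova@localhost/nova")

    def start_task(instance_uuid, expected_vm_state, new_task_state):
        with engine.begin() as conn:  # one db transaction
            result = conn.execute(
                text("UPDATE instances SET task_state = :new_state "
                     "WHERE uuid = :uuid AND vm_state = :expected "
                     "AND task_state IS NULL"),
                {"new_state": new_task_state,
                 "uuid": instance_uuid,
                 "expected": expected_vm_state},
            )
            # rowcount == 0 means the instance was in the wrong state or
            # someone else won the race; the caller records the failed
            # instance action and rejects the request.
            return result.rowcount == 1
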
That should guard the start of the task. Then, as you say, the
conductor could die; how do we restart?
Maybe we have a "tasks" db for the conductor, so:
- in the same transaction as the above, the conductor would...
- assign the task to itself (by conductor queue name, i.e. host name)
- the task can have checkpoints to help restart halfway through
So if the conductor restarts, it can resume all the operations it had
not yet completed using the info in the db. Not possible yet, but this
was one of the main things we wanted to do anyway. If it fails on task
resume, then it is responsible for recording that failure.
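
To make the restart concrete, a minimal sketch of the resume loop I am
imagining (the tasks table, columns and step names are all made up for
illustration):

    # Sketch only: resume the tasks assigned to this conductor after a
    # restart, skipping any steps already checkpointed.
    from sqlalchemy import create_engine, text

    engine = create_engine("mysql+pymysql://nova:nova@localhost/nova")

    # Ordered step names per action (illustrative).
    STEPS = {"resize": ["reserve_resources", "migrate_disk", "finish_resize"]}

    def resume_tasks(my_host, run_step):
        with engine.begin() as conn:
            tasks = conn.execute(
                text("SELECT id, action, checkpoint FROM conductor_tasks "
                     "WHERE conductor_host = :host"),
                {"host": my_host},
            ).fetchall()
        for task in tasks:
            steps = STEPS[task.action]
            start = (steps.index(task.checkpoint) + 1
                     if task.checkpoint in steps else 0)
            for step in steps[start:]:
                run_step(task.id, step)        # do the real work for this step
                with engine.begin() as conn:   # then record the checkpoint
                    conn.execute(
                        text("UPDATE conductor_tasks SET checkpoint = :step "
                             "WHERE id = :id"),
                        {"step": step, "id": task.id},
                    )

If a step raises, the checkpoint is not advanced, so the failure can be
recorded against the task before giving up.
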
As you say, there is a liveness issue. If we lose a conductor, when
can we choose to move its tasks to a new conductor? The first idea I
have is to assume that a conductor, if it dies, is brought back to
life by the administrator. Monitor it all in the same way nova-compute
is monitored. Maybe have an admin operation that is "dangerous" but
lets you disable a conductor and move all its tasks to a new
conductor? Just in case the admin is unable to resume the old
conductor (or a new conductor with the same name).
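
The "dangerous" operation could be little more than a bulk
re-assignment in that tasks table; another sketch, with made up names:

    # Sketch only: hand every task from a (really dead) conductor to a
    # live one; the new conductor picks them up with the resume logic.
    from sqlalchemy import create_engine, text

    engine = create_engine("mysql+pymysql://nova:nova@localhost/nova")

    def evacuate_conductor(dead_host, new_host):
        with engine.begin() as conn:
            result = conn.execute(
                text("UPDATE conductor_tasks SET conductor_host = :new "
                     "WHERE conductor_host = :dead"),
                {"new": new_host, "dead": dead_host},
            )
        return result.rowcount  # number of tasks moved
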
There is the case where two conductors have the same name, and share
the same DB queue. Or the case where a conductor forgets a task
halfway through and no progress is made. I am thinking we should leave
this to the administrator to monitor; it's crazy complex to fix.
I think I am making reasonable requests of the cloud admins.
There must be a flaw in this... just can't see it yet.
Ideas?
John
On 1 May 2013 19:43, Joshua Harlow <harlowja at yahoo-inc.com> wrote:
> I've started
> https://wiki.openstack.org/wiki/TheBetterPathToLiveMigrationResizing and
> will try to continue there.
>
> The other aspect that makes me wonder is, after we have the conductor doing
> this work, how do we ensure that the locking around what it is doing is done
> correctly.
>
> Say you have the following:
>
> API call #1 -> resize instance X (let's call this action A)
> API call #2 -> resize instance X (let's call this action B)
>
>
> Now both of those happen in the same millisecond, so what happens? (Thought
> experiment time!)
>
> It would seem they both attempt to mark something in the DB saying 'working on
> X' by altering instance X's 'task/vm_state'. Ok, so you can put a transaction
> around said write to the 'task/vm_state' of instance X to avoid both of
> those API calls attempting to continue doing the work. So that's good. So
> then let's say API #1 sends a message to some conductor Z asking it to do
> the work via the MQ; that's great, then conductor Z starts doing work
> on instance X and such.
>
> So now the big iffy question that I have is what happens if conductor Z is
> 'killed' (say via error, exception, power failure, kill -9). What happens
> to action A? How can another conductor be assigned the work to do action
> A? Will there be a new periodic task to scan the DB for 'dead' actions, and
> how do we determine if an action is dead or just taking a very long time?
> This 'liveness' issue is a big one that I think needs to be considered, and
> if the conductor and zookeeper get connected, then I think it can be done.
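
For what it's worth, the liveness detection described above maps quite
naturally onto ZooKeeper ephemeral nodes; a minimal sketch with the
kazoo client (the paths and hosts are made up):

    # Sketch only: each conductor registers an ephemeral znode; if its
    # session dies (crash, kill -9, partition) the node disappears and a
    # watcher can spot orphaned tasks in the tasks table.
    import socket
    from kazoo.client import KazooClient

    zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
    zk.start()

    my_host = socket.gethostname()
    zk.ensure_path("/nova/conductors")
    zk.create("/nova/conductors/%s" % my_host, ephemeral=True)

    @zk.ChildrenWatch("/nova/conductors")
    def on_conductor_change(live_conductors):
        # Any task assigned to a host not in live_conductors is a
        # candidate for re-assignment to a surviving conductor.
        print("live conductors: %s" % live_conductors)
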
>
> Then the other big iffy issue is how do we stop a third API call from
> invoking a third action on a resource associated with instance X (say a
> deletion of a volume) while the first API action is still being conducted.
> Just associating an instance-level lock via 'task/vm_state' is not the
> correct way to lock resources associated with instance X. This is
> where zookeeper can come into play again (since its core design was built
> for distributed locking) and it can be used to not only lock the instance
> X 'task/vm_state' but all other resources associated with instance X (in a
> reliable manner).
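
On the locking piece specifically, a per-instance ZooKeeper lock could
look something like this (sketch only, kazoo again, the lock path is
made up):

    # Sketch only: a per-instance lock so, e.g., a resize and a volume
    # delete touching the same instance cannot interleave.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
    zk.start()

    def with_instance_lock(instance_uuid, holder, work):
        lock = zk.Lock("/nova/locks/instance/%s" % instance_uuid, holder)
        with lock:   # blocks until free; released even if work() raises
            return work()

The same path could also hold child locks for volumes, networks and so
on tied to the instance, if finer granularity is needed.
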
>
> Thoughts?
>
> On 5/1/13 10:56 AM, "John Garbutt" <john at johngarbutt.com> wrote:
>
>>Hey,
>>
>>I think some lightweight sequence diagrams could make sense.
>>
>>On 29 April 2013 21:55, Joshua Harlow <harlowja at yahoo-inc.com> wrote:
>>> Any thoughts on how the current conductor db-activity works with this?
>>> I can see two entry points to conductor:
>>>
>>>   DB data calls ---------------------------\
>>>                                             +--> Conductor --> RPC/DB calls
>>>   Workflow on behalf of something calls ---/        to do this stuff
>>>
>>> Maybe it's not a concern for 'H' but it seems one of those doesn't belong
>>> there (cough cough DB stuff).
>>
>>Maybe for the next release. It should become obvious I guess. I hope
>>those db calls will disappear once we pull the workflows properly into
>>conductor and the other servers become more stateless (in terms of
>>nova db state).
>>
>>Key question: Should the conductor be allowed to make DB calls? I think
>>yes?
>>
>>> My writeup @ https://wiki.openstack.org/wiki/StructuredStateManagement is
>>> a big part of the overall goal I think. The small iterations are part of
>>> this goal, yet likely both the small and big goals will be happening at
>>> once, so it would be useful to keep talking about the bigger goal and make
>>> sure the smaller iteration goal will eventually arrive at the bigger goal
>>> (or can be adjusted to be that way). Since some rackspace folks will also
>>> be helping out building the underlying foundation (convection library) for
>>> the end-goal, it would be great to work together and make sure all small
>>> iterations also align with that foundational library work.
>>
>>Take a look at spawn in XenAPI, it is heading in this direction:
>>https://github.com/openstack/nova/blob/master/nova/virt/xenapi/vmops.py#L335
>>
>>I think we should just make a very small bit of the operation do
>>rollback and state management, more as an exercise, and then start to
>>pull more of the code into line as time progresses. Probably best done
>>on something that has already been pulled into a conductor-style job?
>>
>>> I'd be interested in what you think about moving the scheduler code
>>> around, since this also connects into some work the cisco folks want to do
>>> for better scheduling, so that is yet another coordination of work that
>>> needs to happen (to end up at the same end-goal there as well).
>>
>>Yes, I think it's very related. I see this kind of thing:
>>
>>API --cast--> Conductor --call--> scheduler
>>                        --call--> compute
>>                        --call--> ...
>>                        --db--> finally, state update shows completion
>>                                of the task
>>
>>Eventually the whole workflow, its persistence and its rollback will be
>>controlled by the new framework. In the first case we may just make
>>sure the resource assignment gets rolled back if the call after the
>>scheduler fails, and we correctly try to call the scheduler again? The
>>current live-migration scheduling code sort of does this kind of thing
>>already.
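
A rough sketch of that first case, with all the collaborator names
below being hypothetical stand-ins rather than real nova APIs:

    # Sketch only: retry the scheduler if the call after it fails,
    # rolling back the resource claim each time.
    class NoValidHost(Exception):
        pass

    def build_instance(context, instance, scheduler, compute, resources,
                       db, max_attempts=3):
        for _ in range(max_attempts):
            host = scheduler.select_host(context, instance)     # --call-->
            try:
                compute.build_on_host(context, instance, host)  # --call-->
            except Exception:
                resources.drop_claim(context, instance, host)   # roll back
                continue                                        # reschedule
            db.instance_update(context, instance["uuid"],
                               {"task_state": None})  # shows completion
            return host
        raise NoValidHost()
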
>>
>>> I was thinking that documenting the current situation, possibly @
>>> https://wiki.openstack.org/wiki/TheBetterPathToLiveMigration would help.
>>> Something like https://wiki.openstack.org/wiki/File:Run_workflow.png
>>>might
>>> help to easily visualize the current and fixed 'flow'/thread of
>>>execution.
>>
>>Seems valuable. I will do one for live-migration before starting on
>>that. I kinda started on this (in text form) when I was doing the
>>XenAPI live-migration work:
>>https://wiki.openstack.org/wiki/XenServer/LiveMigration#Live_Migration_RPC_Calls
>>
>>We should probably do one for resize too.
>>
>>John
>