[openstack-dev] Nova workflow management update
Changbin Liu
changbin.liu at gmail.com
Thu May 2 14:32:05 UTC 2013
Hi Joshua,
Just to share some thoughts:
On Wed, May 1, 2013 at 2:43 PM, Joshua Harlow <harlowja at yahoo-inc.com> wrote:
> I've started
> https://wiki.openstack.org/wiki/TheBetterPathToLiveMigrationResizing and
> will try to continue there.
>
> The other aspect that makes me wonder is after we have conductor doing
> stuff is how do we ensure that locking of what it is doing is done
> correctly.
>
> Say you have the following:
>
> API call #1 -> resize instance X (let's call this action A)
> API call #2 -> resize instance X (let's call this action B)
>
>
> Now say both of those happen in the same millisecond; what happens (thought
> game time!)?
>
> It would seem they each attempt to mark something in the DB saying 'working on
> X' by altering instance X's 'task/vm_state'. OK, so you can put a transaction
> around that write to instance X's 'task/vm_state' so that only one of the two
> API calls continues doing the work. So that's good. Then let's say API call #1
> sends a message over the MQ asking some conductor Z to do the work; great, so
> conductor Z starts doing work on instance X.
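One way to make that "mark it in the DB" step race-free is a compare-and-swap
style UPDATE that only succeeds when task_state is still unset, so exactly one
of the two API calls wins. A minimal sketch, assuming a SQLAlchemy table
roughly shaped like Nova's instances table (the column names here are
illustrative, not Nova's actual schema):

import sqlalchemy as sa

metadata = sa.MetaData()
instances = sa.Table(
    "instances", metadata,
    sa.Column("uuid", sa.String(36), primary_key=True),
    sa.Column("task_state", sa.String(255), nullable=True),
)

def claim_instance(engine, instance_uuid, new_task_state):
    """Return True if this caller won the race to set task_state."""
    with engine.begin() as conn:
        result = conn.execute(
            instances.update()
            .where(instances.c.uuid == instance_uuid)
            .where(instances.c.task_state.is_(None))  # only claim if idle
            .values(task_state=new_task_state)
        )
        # rowcount == 0 means the other API call got there first.
        return result.rowcount == 1

Whichever call sees rowcount == 0 can then return an "instance is busy" error
instead of queueing a second resize.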
>
> So now the big iffy question I have is: what happens if conductor Z is
> 'killed' (say via error, exception, power failure, kill -9)? What happens
> to action A? How can another conductor be assigned the work to do action
> A? Will there be a new periodic task to scan the DB for 'dead' actions, and
> how do we determine whether an action is dead or just taking a very long time?
> This 'liveness' issue is a big one that I think needs to be considered, and
> if the conductor and ZooKeeper get connected, I think it can be done.
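On the liveness question, ZooKeeper's ephemeral nodes give you exactly the
"dead vs. just slow" signal: the node exists only while the conductor's session
is alive. A rough sketch, assuming the kazoo client (the paths and the
requeueing step are made up for illustration):

from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

CONDUCTOR_ID = "conductor-z"
zk.ensure_path("/conductors")
# Ephemeral node: ZooKeeper removes it automatically when this conductor's
# session dies, whether by clean exit, exception, kill -9 or power failure.
zk.create("/conductors/" + CONDUCTOR_ID, ephemeral=True)

_alive = set()

@zk.ChildrenWatch("/conductors")
def on_conductor_change(children):
    global _alive
    for conductor in _alive - set(children):
        # A monitor would scan the DB here for actions owned by the dead
        # conductor and hand them to a live one (omitted in this sketch).
        print("conductor %s died; requeue its in-flight actions" % conductor)
    _alive = set(children)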
>
> Then the other big iffy question is how we stop a third API call from
> invoking a third action on a resource associated with instance X (say a
> deletion of a volume) while the first API action is still being conducted;
> just associating an instance-level lock via 'task/vm_state' is not the
> correct way to lock resources associated with instance X. This is
> where ZooKeeper can come into play again (since its core design was built
> for distributed locking): it can be used to lock not only instance X's
> 'task/vm_state' but all the other resources associated with instance X (in a
> reliable manner).
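A minimal sketch of what that multi-resource locking could look like with
kazoo's lock recipe (the lock paths and identifiers here are invented for
illustration); a concurrent volume-delete call would block on, or bail out of,
the same locks:

from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

instance_lock = zk.Lock("/locks/instance/X", identifier="api-call-1")
volume_lock = zk.Lock("/locks/volume/vol-for-X", identifier="api-call-1")

# Always acquire in the same order (instance, then volume) so two
# competing workflows cannot deadlock each other.
with instance_lock:
    with volume_lock:
        pass  # do the resize work; both resources are protected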
>
>
In our prior work, we ran multiple cloud controller instances (or
"conductors" here), one as the leader and the others as followers,
coordinated by ZooKeeper leader election. All the transaction-related data
are also stored in ZooKeeper for HA. The lock manager (which handles
concurrency) is implemented in memory, but it is discard-safe since it can
always be rebuilt from the ZooKeeper data store after a leader controller failure.
If a transaction hangs for a long time, you have two options: either send a
"signal" to roll back the transaction, or simply kill it (which means rolling
back only the logical data and returning immediately, without performing
any further physical actions). The latter may leave behind logical/physical
inconsistencies, which can be resolved by periodic repair
(resource reconciliation) procedures.
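For reference, the leader/follower pattern above is roughly what kazoo's
election recipe provides; the node paths and the recovery step below are just a
sketch of the idea, not our actual code:

from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()
zk.ensure_path("/transactions")

def run_as_leader():
    # Rebuild the in-memory lock manager from the transaction records
    # persisted in ZooKeeper before accepting any new work.
    recovered = {txn: zk.get("/transactions/" + txn)[0]
                 for txn in zk.get_children("/transactions")}
    print("became leader, recovered %d transactions" % len(recovered))
    # ... serve requests until this controller dies; a follower then
    # wins the election and repeats this recovery step ...

election = zk.Election("/controller-election", identifier="controller-1")
election.run(run_as_leader)  # blocks as a follower until elected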
> Thoughts?
>
> On 5/1/13 10:56 AM, "John Garbutt" <john at johngarbutt.com> wrote:
>
> >Hey,
> >
> >I think some lightweight sequence diagrams could make sense.
> >
> >On 29 April 2013 21:55, Joshua Harlow <harlowja at yahoo-inc.com> wrote:
> >> Any thoughts on how the current conductor db-activity works with this?
> >> I can see two entry points to conductor:
> >> DB data calls ----------------------------+
> >>                                            |
> >>                                            +--> Conductor --> RPC/DB calls to do this stuff
> >>                                            |
> >> Workflow on behalf of something calls -----+
> >>
> >> Maybe it's not a concern for 'H', but it seems one of those doesn't belong
> >> there (cough cough DB stuff).
> >
> >Maybe for the next release. It should become obvious I guess. I hope
> >those db calls will disappear once we pull the workflows properly into
> >conductor and the other servers become more stateless (in terms of
> >nova db state).
> >
> >Key question: Should the conductor be allowed to make DB calls? I think
> >yes?
> >
> >> My writeup at https://wiki.openstack.org/wiki/StructuredStateManagement is
> >> a big part of the overall goal, I think. The small iterations are part of
> >> that goal, but both the small and big goals will likely be happening at
> >> once, so it would be useful to keep the bigger goal in view and make sure
> >> the smaller iterations eventually arrive at it (or can be adjusted to).
> >> Since some Rackspace folks will also be helping build the underlying
> >> foundation (the convection library) for the end goal, it would be great to
> >> work together and make sure all the small iterations also align with that
> >> foundational library work.
> >
> >Take a look at spawn in XenAPI; it is heading in this direction:
> >
> >https://github.com/openstack/nova/blob/master/nova/virt/xenapi/vmops.py#L335
> >
> >I think we should make just a very small bit of the operation do
> >rollback and state management, mostly as an exercise, and
> >then start to pull more of the code into line as time progresses.
> >Probably best done on something that has already been pulled into a
> >conductor-style job?
> >
> >> I'd be interested in what you think about moving the scheduler code around,
> >> since this also connects to some work the Cisco folks want to do for
> >> better scheduling, so that is yet another piece of coordination that needs
> >> to happen (to end up at the same end-goal there as well).
> >
> >Yes, I think it's very related. I see this kind of thing:
> >
> >API --cast--> Conductor --call--> scheduler
> >                        --call--> compute
> >                        --call--> .....
> >                        --db-->   finally, a state update shows completion of the task
> >
> >Eventually the whole workflow, its persistence, and its rollback will be
> >controlled by the new framework. In the first case we may just make
> >sure the resource assignment gets rolled back if the call after the
> >scheduler fails, and that we correctly try to call the scheduler again. The
> >current live-migration scheduling code sort of does this kind of thing
> >already.
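That "roll back the claim if a later step fails" pattern is the core of what
the workflow framework would provide. An illustrative sketch only (not the
actual convection/taskflow API), showing a step with an explicit revert that
undoes the resource claim when the following compute call fails:

class ClaimResources(object):
    def execute(self, ctx):
        ctx["claimed"] = True          # pretend we reserved capacity on a host
        return "host-1"

    def revert(self, ctx, result):
        ctx["claimed"] = False         # release the claim on failure

class CallCompute(object):
    def execute(self, ctx):
        raise RuntimeError("compute call failed")  # simulate the failure case

    def revert(self, ctx, result):
        pass

def run_flow(steps, ctx):
    done = []
    try:
        for step in steps:
            done.append((step, step.execute(ctx)))
    except Exception:
        for step, result in reversed(done):  # undo completed steps in reverse
            step.revert(ctx, result)
        raise

ctx = {}
try:
    run_flow([ClaimResources(), CallCompute()], ctx)
except RuntimeError:
    print("rolled back, claimed =", ctx["claimed"])  # -> False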
> >
> >> I was thinking that documenting the current situation, possibly @
> >> https://wiki.openstack.org/wiki/TheBetterPathToLiveMigration, would help.
> >> Something like https://wiki.openstack.org/wiki/File:Run_workflow.png might
> >> help to easily visualize the current and fixed 'flow'/thread of execution.
> >
> >Seems valuable. I will do one for live-migration before starting on that.
> >I kinda started on this (in text form) when I was doing the XenAPI
> >live-migration:
> >
> >https://wiki.openstack.org/wiki/XenServer/LiveMigration#Live_Migration_RPC_Calls
> >
> >We should probably do one for resize too.
> >
> >John
>