<div dir="ltr">Hi Joshua, <div><br></div><div style>Just to share some thoughts:</div><div><br></div><div class="gmail_extra"><div class="gmail_quote">On Wed, May 1, 2013 at 2:43 PM, Joshua Harlow <span dir="ltr"><<a href="mailto:harlowja@yahoo-inc.com" target="_blank">harlowja@yahoo-inc.com</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">I've started<br>

<a href="https://wiki.openstack.org/wiki/TheBetterPathToLiveMigrationResizing" target="_blank">https://wiki.openstack.org/wiki/TheBetterPathToLiveMigrationResizing</a> and<br>

will try to continue there.<br>

<br>

The other aspect that makes me wonder is after we have conductor doing<br>

stuff is how do we ensure that locking of what it is doing is done<br>

correctly.<br>

<br>

Say u have the following:<br>

<br>

API call #1 -> resize instance X (lets call this action A)<br>

API call #2 -> resize instance X (lets call this action B)<br>

<br>

<br>

Now both of those happen in the same millisecond, so what happens (thought<br>

game time!).<br>

<br>

It would seem they attempt to mark something in the DB saying 'working on<br>

X' by altering instance X's 'task/vm_state'. Ok so u can put a transaction<br>

around said write to the 'task/vm_state' of instance X to avoid both of<br>

those api calls attempting to continue doing the work. So that’s good. So<br>

then lets say api #1 sends a message to some conductor Z asking it to do<br>

the work via the MQ, that’s great, then the conductor Z starts doing work<br>

on instance X and such.<br>

<br>

So now the big iffy question that I have is what happens if conductor Z is<br>

'killed' (say via error, exception, power failure, kill -9). What happens<br>

to action A? How can another conductor be assigned the work to do action<br>

A? Will there be a new periodic task to scan the DB for 'dead' actions,<br>

how do we determine if an action is dead or just taking a very long time?<br>

This 'liveness' issue is a big one that I think needs to be considered and<br>

if conductor and zookeeper get connected, then I think it can be done. <br></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">


Then the other big iffy stuff is how do we stop a third API call from<br>

invoking a third action on a resource associated with instance X (say a<br>

deletion of a volume) while the first api action is still being conducted,<br>

just associating an instance level lock via 'task/vm_state' is not the<br>

correct way to lock resources associated with instance X. This is<br>

where zookeeper can come into play again (since its core design was built<br>

for distributed locking) and it can be used to not only lock the instance<br>

X 'task/vm_state' but all other resources associated with instance X (in a<br>

reliable manner).<br>

<br></blockquote><div><br></div><div>In our prior work, we ran multiple cloud controller instances (the "conductors" here), one as leader and the others as followers, coordinated by ZooKeeper leader election. All transaction-related data is also stored in ZooKeeper for HA. The lock manager (which handles concurrency) is kept in memory, but it is safe to discard because it can always be rebuilt from the ZooKeeper data store when the leader controller fails. </div>
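<div><br></div><div>To make the "safe to discard" point concrete, here is a minimal Python sketch, not our actual code. DurableStore is a hypothetical in-memory stand-in for the ZooKeeper data store (a real deployment would talk to ZooKeeper through a client library), and LockManager, start_txn and the resource names are made up for illustration.</div>
<pre>
```python
from dataclasses import dataclass, field


@dataclass
class DurableStore:
    """Hypothetical stand-in for the ZooKeeper data store.

    Maps each active transaction id to the resources it holds.
    """
    active_txns: dict = field(default_factory=dict)

    def begin(self, txn_id, resources):
        self.active_txns[txn_id] = tuple(resources)

    def finish(self, txn_id):
        self.active_txns.pop(txn_id, None)


class LockManager:
    """In-memory lock table; safe to discard because it can be rebuilt."""

    def __init__(self):
        self.owners = {}  # resource name mapped to the txn id holding it

    def acquire(self, txn_id, resources):
        # All-or-nothing: refuse if another txn already holds any resource.
        if any(self.owners.get(r, txn_id) != txn_id for r in resources):
            return False
        for r in resources:
            self.owners[r] = txn_id
        return True

    def release(self, txn_id):
        self.owners = {r: t for r, t in self.owners.items() if t != txn_id}

    @classmethod
    def rebuild(cls, store):
        """Recreate the lock table on a newly elected leader."""
        manager = cls()
        for txn_id, resources in store.active_txns.items():
            manager.acquire(txn_id, resources)
        return manager


def start_txn(store, locks, txn_id, resources):
    """Take the in-memory locks, then record the transaction durably."""
    if not locks.acquire(txn_id, resources):
        return False
    store.begin(txn_id, resources)
    return True
```
</pre>
<div>With this shape, a second resize of instance X is refused while the first one holds the lock, and a new leader that calls LockManager.rebuild(store) after a failover ends up with the same lock table as the old one, so a later volume deletion on X is still blocked.</div>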


<div><br></div><div>If a transaction hangs for a long time, you have two options: either send a "signal" to roll back the transaction, or simply kill it (which means rolling back only the logical data and returning immediately, without performing any further physical actions). The latter may leave behind logical/physical inconsistencies, which can be resolved by periodic repair (resource reconciliation) procedures. </div>
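<div><br></div><div>A rough Python sketch of those two options plus the repair pass, under simplifying assumptions: logical and physical state are plain dicts, a transaction is a dict carrying an undo log of (key, old value) pairs, and the repair pass treats the logical layer as the source of truth. All names here are hypothetical and only meant to illustrate the idea.</div>
<pre>
```python
def rollback(txn, logical, physical):
    """Option 1: undo each completed step in both layers, newest first."""
    for key, old_value in reversed(txn["undo_log"]):
        logical[key] = old_value
        physical[key] = old_value  # stands in for undoing the real action
    txn["state"] = "rolled_back"


def kill(txn, logical):
    """Option 2: restore the logical data only and return immediately."""
    for key, old_value in reversed(txn["undo_log"]):
        logical[key] = old_value
    txn["state"] = "killed"


def reconcile(logical, physical):
    """Periodic repair: find keys where the layers disagree and fix them.

    Assumption: the logical layer wins. Keys that exist only in the
    physical layer are ignored in this sketch.
    """
    drifted = sorted(k for k in logical if physical.get(k) != logical[k])
    for key in drifted:
        physical[key] = logical[key]
    return drifted
```
</pre>
<div>After kill() the physical dict can still hold the half-applied change; the next reconcile() run reports that key and brings the two layers back in line.</div>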


<div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

Thoughts?<br>

<div><div><br>

On 5/1/13 10:56 AM, "John Garbutt" <<a href="mailto:john@johngarbutt.com" target="_blank">john@johngarbutt.com</a>> wrote:<br>

<br>

>Hey,<br>

><br>

>I think some lightweight sequence diagrams could make sense.<br>

><br>

>On 29 April 2013 21:55, Joshua Harlow <<a href="mailto:harlowja@yahoo-inc.com" target="_blank">harlowja@yahoo-inc.com</a>> wrote:<br>

>> Any thoughts on how the current conductor db-activity works with this?<br>

>> I can see two entry points to conductor:<br>

>> DB data calls<br>

>>   |<br>

>>   ------------------------------------------Conductor-->RPC/DB calls to<br>

>>do<br>

>> this stuff<br>

>>                                                |<br>

>> Workflow on behalf of something calls          |<br>

>>   |                                            |<br>

>>   ---------------------------------------------|<br>

>><br>

>> Maybe its not a concern for 'H' but it seems one of those doesn't belong<br>

>> there (cough cough DB stuff).<br>

><br>

>Maybe for the next release. It should become obvious I guess. I hope<br>

>those db calls will disappear once we pull the workflows properly into<br>

>conductor and the other servers become more stateless (in terms of<br>

>nova db state).<br>

><br>

>Key question: Should the conductor be allowed to make DB calls? I think<br>

>yes?<br>

><br>

>> My writeup @ <a href="https://wiki.openstack.org/wiki/StructuredStateManagement" target="_blank">https://wiki.openstack.org/wiki/StructuredStateManagement</a><br>

>>is<br>

>> a big part of the overall goal I think, where I think the small<br>

>>iterations<br>

>> are part of this goal, yet likely both small and big goals will be<br>

>> happening at once, so it would be useful to ensure that we talk about<br>

>>the<br>

>> bigger goal and make sure the smaller iteration goal will eventually<br>

>> arrive at the bigger goal (or can be adjusted to be that way). Since<br>

>>some<br>

>> rackspace folks will also be helping out building the underlying<br>

>> foundation (convection library) for the end-goal it would be great to<br>

>>work<br>

>> together and make sure all small iterations also align with that<br>

>> foundational library work.<br>

><br>

>Take a look at spawn in XenAPI, it is heading down this direction:<br>

><a href="https://github.com/openstack/nova/blob/master/nova/virt/xenapi/vmops.py#L335" target="_blank">https://github.com/openstack/nova/blob/master/nova/virt/xenapi/vmops.py#L335</a><br>

><br>

>I think we should just make a very small bit of the operation do<br>

>rollback and state management, which is more just an exercise, and<br>

>then start to pull more of the code into line as time progresses.<br>

>Probably best done on something that has already been pulled into a<br>

>conductor style job?<br>

><br>

>> I'd be interested in what u think about moving the scheduler code<br>

>>around,<br>

>> since this also connects into some work the cisco folks want to do for<br>

>> better scheduling, so that is yet another coordination of work that<br>

>>needs<br>

>> to happen (to end up at the same end-goal there as well).<br>

><br>

>Yes, I think its very related. I see this kind of thing:<br>

><br>

>API --cast--> Conductor --call--> scheduler<br>

>                                    --call--> compute<br>

>                                    --call-->.....<br>

>                                    --db--> finally state update shows<br>

>completion of task<br>

><br>

>Eventually the whole workflow, its persistence and rollback will be<br>

>controlled by the new framework. In the first case we may just make<br>

>sure the resource assignment gets rolled back if the call after the<br>

>schedule fails, and we correctly try to call the scheduler again? The<br>

>current live-migration scheduling code sort of does this kind of thing<br>

>already.<br>

><br>

>> I was thinking that documenting the current situation, possibly @<br>

>> <a href="https://wiki.openstack.org/wiki/TheBetterPathToLiveMigration" target="_blank">https://wiki.openstack.org/wiki/TheBetterPathToLiveMigration</a> would help.<br>

>> Something like <a href="https://wiki.openstack.org/wiki/File:Run_workflow.png" target="_blank">https://wiki.openstack.org/wiki/File:Run_workflow.png</a><br>

>>might<br>

>> help to easily visualize the current and fixed 'flow'/thread of<br>

>>execution.<br>

><br>

>Seems valuable. I will do something for live-migration one before<br>

>starting on that. I kinda started on this (in text form) when I was<br>

>doing the XenAPI live-migration:<br>

><a href="https://wiki.openstack.org/wiki/XenServer/LiveMigration#Live_Migration_RPC_Calls" target="_blank">https://wiki.openstack.org/wiki/XenServer/LiveMigration#Live_Migration_RPC_Calls</a><br>

><br>

>We should probably do one for resize too.<br>

><br>

>John<br>

<br>


</div></div></blockquote></div><br></div></div>