<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">

</head>

<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; color: rgb(0, 0, 0); font-size: 14px; font-family: Calibri, sans-serif; ">

<div>I think it comes down to the question of how we tie a workflow to an individual conductor without tying it so strongly that we can't move that workflow to another conductor. As John stated (correct me if I misunderstood), we could just assume that when
 a workflow is created it gets sent to a selected conductor via the MQ. That conductor then associates the workflow with its hostname using a DB txn (if some other conductor has already written its hostname there, then this conductor just stops processing said workflow).
 Upon failure of said conductor, it could be expected that the conductor would be restarted (either automatically or some other way), and one of the first things it could do would be to scan a workflow table for any tasks it still needs to finish.
 Then there could be an admin API which moves the workflows from one conductor to another (basically altering the hostname association), which would allow movement of workflows. Conductors would also need some periodic task that
 scans this table for 'moved' workflow associations.</div>
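
<div><br>
</div>

<div>As a rough sketch of that DB flow (claim by hostname in a txn, rescan on restart, admin move), assuming a made-up workflows table with id, owner and state columns, and using sqlite3 only so that it runs standalone. None of these function names exist in Nova, they are just for illustration:</div>

<pre>
```python
import sqlite3


def make_db():
    # Hypothetical schema: owner holds the conductor hostname, NULL if unclaimed.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE workflows ("
        "id TEXT PRIMARY KEY, owner TEXT, state TEXT NOT NULL)"
    )
    return db


def add_workflow(db, workflow_id):
    with db:
        db.execute(
            "INSERT INTO workflows (id, owner, state) VALUES (?, NULL, 'PENDING')",
            (workflow_id,),
        )


def claim(db, workflow_id, hostname):
    """Write our hostname in one txn; fails if another conductor got there first."""
    with db:
        cur = db.execute(
            "UPDATE workflows SET owner = ? WHERE id = ? AND owner IS NULL",
            (hostname, workflow_id),
        )
    return cur.rowcount == 1


def unfinished_for(db, hostname):
    """What a restarted conductor (or its periodic task) would scan for."""
    rows = db.execute(
        "SELECT id FROM workflows WHERE owner = ? AND state != 'DONE' ORDER BY id",
        (hostname,),
    )
    return [row[0] for row in rows]


def move(db, from_host, to_host):
    """Admin operation: re-point unfinished workflows at another conductor."""
    with db:
        cur = db.execute(
            "UPDATE workflows SET owner = ? WHERE owner = ? AND state != 'DONE'",
            (to_host, from_host),
        )
    return cur.rowcount
```
</pre>

<div>The conditional UPDATE is what makes the claim safe: only one conductor can flip owner from NULL, and the loser sees a row count of zero and stops. The periodic scan would just call unfinished_for with the conductor's own hostname and pick up anything that was moved to it.</div>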

<div><br>

</div>

<div>The ZK approach, on the other hand, would not require tying individual workflows to conductors via hostnames. Instead it would tie workflows to conductors via a ZK lock on said workflow (or a lock on the reservation of said workflow). Instead of having
 to select a conductor to process said workflow (as in the DB case), let's say the nova-api service would just throw that workflow into a ZK directory and let all conductors fight over the lock for said workflow. The one that acquires the lock is the
 one that starts doing said workflow. If it fails while doing that workflow, the lock gets released (guaranteed by ZK), the other conductors get triggered (via the ZK watch concept), and they restart the 'fight' for the lock for said workflow
 (and so on until the workflow is done).</div>
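
<div><br>
</div>

<div>To make the 'fight' concrete, here is a toy in-memory simulation of it. This is not ZK and not any real ZK client API; FakeLockService and Conductor are made-up names, and session_died stands in for ZK dropping a dead session's lock and firing the watches:</div>

<pre>
```python
class FakeLockService:
    """In-memory stand-in for a ZK-style lock directory (not a real client)."""

    def __init__(self):
        self.holders = {}   # workflow id mapped to the session holding its lock
        self.watchers = {}  # workflow id mapped to callbacks fired on release

    def try_lock(self, workflow_id, session):
        if workflow_id in self.holders:
            return False
        self.holders[workflow_id] = session
        return True

    def watch(self, workflow_id, callback):
        self.watchers.setdefault(workflow_id, []).append(callback)

    def session_died(self, session):
        """Release every lock the dead session held, then wake the watchers."""
        released = [w for w, s in self.holders.items() if s == session]
        for workflow_id in released:
            del self.holders[workflow_id]
            # Watches are one-shot here: pop them, then let each one retry.
            for callback in self.watchers.pop(workflow_id, []):
                callback(workflow_id)


class Conductor:
    def __init__(self, name, locks):
        self.name = name
        self.locks = locks
        self.running = []

    def contend(self, workflow_id):
        """Try for the lock; on losing, wait for the next release and retry."""
        if self.locks.try_lock(workflow_id, self.name):
            self.running.append(workflow_id)
        else:
            self.locks.watch(workflow_id, self.contend)
```
</pre>

<div>With three conductors contending for one workflow, the first gets it. When its session dies, the next one to be woken wins and the remaining loser re-registers its watch, which is the 'and so on until the workflow is done' part.</div>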

<div><br>

</div>

<div>To me this is all more of a primitive around 'acquiring work', which could, say, have one impl for ZK and one for the DB (where something like what John describes happens). The ZK impl can acquire said work in a slightly different manner. I think both approaches
 can likely be done if we have a clear idea of the basic fundamental operations and a 'little' API for them, with one impl backed by a DB and one by ZK. Would that be acceptable?</div>
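
<div><br>
</div>

<div>A minimal sketch of what such a 'little' API might look like. All names here (WorkAcquirer, InMemoryAcquirer, run_if_acquired) are hypothetical, and the in-memory backend is only a placeholder for where a DB-backed or ZK-backed impl would plug in:</div>

<pre>
```python
import abc


class WorkAcquirer(abc.ABC):
    """Hypothetical 'little' API: the minimum a backend has to provide."""

    @abc.abstractmethod
    def acquire(self, workflow_id, owner):
        """Return True if owner now holds the workflow, False otherwise."""

    @abc.abstractmethod
    def release(self, workflow_id, owner):
        """Give the workflow up so another conductor can acquire it."""


class InMemoryAcquirer(WorkAcquirer):
    """Toy backend for illustration; a DB or ZK backend would replace it."""

    def __init__(self):
        self.owners = {}

    def acquire(self, workflow_id, owner):
        # setdefault only stores owner when the workflow is currently unowned.
        return self.owners.setdefault(workflow_id, owner) == owner

    def release(self, workflow_id, owner):
        if self.owners.get(workflow_id) == owner:
            del self.owners[workflow_id]


def run_if_acquired(acquirer, workflow_id, owner, work):
    """Run work() only when this conductor wins the workflow."""
    if not acquirer.acquire(workflow_id, owner):
        return None
    try:
        return work()
    finally:
        acquirer.release(workflow_id, owner)
```
</pre>

<div>The conductor code would only ever talk to the abstract class, so whether the acquire is a DB txn on a hostname column or a ZK lock stays a backend detail.</div>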

<div><br>

</div>

<div>-Josh</div>

<div><br>

</div>

<span id="OLK_SRC_BODY_SECTION">

<div style="font-family:Calibri; font-size:11pt; text-align:left; color:black; BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0in; PADDING-LEFT: 0in; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt">

<span style="font-weight:bold">From: </span>Mike Wilson <<a href="mailto:geekinutah@gmail.com">geekinutah@gmail.com</a>><br>

<span style="font-weight:bold">Date: </span>Thursday, May 2, 2013 11:38 AM<br>

<span style="font-weight:bold">To: </span>OpenStack Development Mailing List <<a href="mailto:openstack-dev@lists.openstack.org">openstack-dev@lists.openstack.org</a>><br>

<span style="font-weight:bold">Cc: </span>Joshua Harlow <<a href="mailto:harlowja@yahoo-inc.com">harlowja@yahoo-inc.com</a>><br>

<span style="font-weight:bold">Subject: </span>Re: [openstack-dev] Nova workflow management update<br>

</div>

<div><br>

</div>

<div>

<div>

<div dir="ltr">I agree, many people aren't going to want to run ZK just to run OpenStack. Also, I'm not real familiar with ZK, but besides distributed locking, what other features are vital to workflow management? I heard leader election mentioned earlier, but
 we don't necessarily have to run with a leader. I feel like a simpler design doesn't have a leader but rather just generic conductors that process workflows. Even with my large-scale deployment, I would prefer not to incur the cost of operating ZK

 when I have other technologies I am already familiar with (DB, memcache, etc). I'll research ZK more so that I'm not so much in the dark, but can anyone explain why a leaderless group of conductors is bad? I think what I envision is more along the lines of

 what John Garbutt was explaining. Can we poke holes in that before requiring ZK?

<div><br>

</div>

<div style="">-Mike Wilson</div>

</div>

<div class="gmail_extra"><br>

<br>

<div class="gmail_quote">On Thu, May 2, 2013 at 12:27 PM, Alex Glikson <span dir="ltr">

<<a href="mailto:GLIKSON@il.ibm.com" target="_blank">GLIKSON@il.ibm.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<font face="sans-serif">Right.. Maybe this was not a good suggestion. The thought was that in small-scale deployments, requiring ZK might be a significant management overhead, while for large-scale ones it would be more acceptable. So, the question is how to
 make this work reasonably well on small scale without ZK, and enable flexible scale-up/out by adding ZK.</font><br>

<font face="sans-serif">Maybe assume one conductor that would serialize (some of the) tasks and keep enough state in DB for failure recovery if there is no ZK, and do it in a more scalable &amp; resilient manner if ZK is present.</font><br>

<br>

<font face="sans-serif">Alex<br>

</font><br>

<br>

<tt><font>Joshua Harlow <<a href="mailto:harlowja@yahoo-inc.com" target="_blank">harlowja@yahoo-inc.com</a>> wrote on 02/05/2013 08:50:29 PM:<br>

<br>

> From: Joshua Harlow <<a href="mailto:harlowja@yahoo-inc.com" target="_blank">harlowja@yahoo-inc.com</a>></font></tt><br>

<tt><font>

<div class="im">> To: OpenStack Development Mailing List <openstack-<br>

</div>

> <a href="mailto:dev@lists.openstack.org" target="_blank">dev@lists.openstack.org</a>>, Alex Glikson/Haifa/IBM@IBMIL,

</font></tt><br>

<tt><font>> Date: 02/05/2013 08:50 PM</font></tt> <br>

<div class="im"><tt><font>> Subject: Re: [openstack-dev] Nova workflow management update</font></tt><br>

</div>

<tt><font>> <br>

<div>

<div class="h5">> So this brings up an interesting issue, the reason ZK exists or

<br>

> partially exists is that something like ZK api's over DB weren't <br>

> really possible. Or that’s what I thought :-P</div>

</div>

</font></tt><br>

<div class="HOEnZb">

<div class="h5"><tt><font>> <br>

> Or they weren't possible in an accurate and provably accurate manner.<br>

> So spending time creating apis that 'sorta' work with DB's but <br>

> really don't (since ZK wouldn't exist if it was possible) does seem <br>

> sorta awkward. I thought most cloud providers are already using <br>

> ZooKeeper, for these exact same reasons, and deploying it nowadays <br>

> is pretty simple… </font></tt><br>

<tt><font>> <br>

> I just worry about providing APIs that really don't work correctly <br>

> with DB's that will cause more bugs (since certain problems just <br>

> can't be done with a DB, or at least any of the DBs that I have <br>

> used, maybe db2 can, idk) that we will have to say 'oh ya we know <br>

> that doesn't work with a DB'. But maybe that is a compromise that we<br>

> have to make and is an evolutionary process where the amount of bugs <br>

> that will be caused by DB impls will eventually just cause people to<br>

> move to the more attractive ZK backend… It's also sorta concerning <br>

> that those types of DB-like bugs will be harder than heck to trace <br>

> down, but that might be a different issue that we can resolve.</font></tt> <br>

<tt><font>> <br>

> From: Alex Glikson <<a href="mailto:GLIKSON@il.ibm.com" target="_blank">GLIKSON@il.ibm.com</a>><br>

> Reply-To: OpenStack Development Mailing List <openstack-<br>

> <a href="mailto:dev@lists.openstack.org" target="_blank">dev@lists.openstack.org</a>><br>

> Date: Thursday, May 2, 2013 7:53 AM<br>

> To: OpenStack Development Mailing List <<a href="mailto:openstack-dev@lists.openstack.org" target="_blank">openstack-dev@lists.openstack.org</a>><br>

> Subject: Re: [openstack-dev] Nova workflow management update</font></tt> <br>

<tt><font>> <br>

> Changbin Liu <<a href="mailto:changbin.liu@gmail.com" target="_blank">changbin.liu@gmail.com</a>> wrote on 02/05/2013 05:32:05 PM:<br>

> <br>

> > Subject: Re: [openstack-dev] Nova workflow management update <br>

> > <br>

> > Hi Joshua,  <br>

> > <br>

> > Just to share some thoughts: <br>

> [...]<br>

> <br>

> Using ZK makes a lot of sense.. The problem is that making ZooKeeper<br>

> a mandatory component to support even basic workflow management <br>

> might be an issue. So, the approach which seems to make most sense <br>

> is to define abstract internal interfaces for the various <br>

> capabilities that ZK can provide (distributed locking, leader <br>

> election, etc), and then have one or more implementations (one of <br>

> which might be based on ZK). This is the approach that has been <br>

> taken for the membership service (service group monitoring APIs) -- <br>

> introducing the flexibility to use ZK backend, but keeping the <br>

> default to be DB-backed.<br>

> <br>

> Regards, <br>

> Alex <br>

> <br>

> P.S. thinking about this.. would it be possible to implement ZK APIs<br>

> over a DB? with some limitations, perhaps..</font></tt></div>

</div>

<br>


<br>

</blockquote>

</div>

<br>

</div>

</div>

</div>

</span>

</body>

</html>