Youcef, <div><br></div><div>Please see comments inline.<br><br><div class="gmail_quote">2012/11/15 Youcef Laribi <span dir="ltr"><<a href="mailto:Youcef.Laribi@eu.citrix.com" target="_blank">Youcef.Laribi@eu.citrix.com</a>></span><br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div lang="EN-US" link="blue" vlink="purple"><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Ilya,<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Both designs are valid (whether the LBaaS plugin implements a uniform asynchronous model, or whether to leave it to each vendor to decide whether their driver is synchronous or not), and I’m fine with choosing the one you described. I have a few questions on it.<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">In that setup, I’m assuming we will have 2 processes: The Quantum service process and the Agent process. The two communicate thru message queues. No vendor-specific code will be in the Quantum service process. The Agent process is the one that has all the vendors drivers loaded in.</span></p>
</div></blockquote><div>Yes. That's right.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div lang="EN-US" link="blue" vlink="purple"><p class="MsoNormal">
<span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Where are you planning for the agent process to run? On the same host as the Quantum service process? On a separate host? On the device? Imagine in the same agent, you have a case where one driver assumes it is on the same machine as the device (HA-Proxy?), while another driver uses a remote protocol to configure the device (the agent cannot run on the device as the device is a sealed appliance), where does the agent run in this case?</span></p>
</div></blockquote><div>Agent process may run on the same host as the Quantum service. In small setups there should be enough only one agent process per cloud. In large setups there may be several agents each processing messages. The agent is not needed on devices, since it controls them remotely via management interface.</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div lang="EN-US" link="blue" vlink="purple"><p class="MsoNormal"><span style="color:rgb(31,73,125);font-family:Calibri,sans-serif;font-size:11pt"> </span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">The other question I have is on the LBaaS plugin workflow that you described in this diagram: </span><a href="http://wiki.openstack.org/Quantum/LBaaS/Architecture/ConcurrentRequests" target="_blank">http://wiki.openstack.org/Quantum/LBaaS/Architecture/ConcurrentRequests</a>. <span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Here it seems that we are returning a response to the user even before we update the database. So, this could mean that if the Quantum service crashes before updating the database, we would have lost all details of the request, even though we have returned a resource ID, and a “PENDING_CREATE” status to the user. So, when the user queries the service (after it has restarted) to get the status of the creation, the resource wouldn’t even exist since it has never been saved to the DB. What do you think of updating the DB before returning the response to the user?</span></p>
</div></blockquote><div>Well... (maybe I'm not so good at drawing diagrams). For object creation, I see the following workflow:</div><div><ol><li>The request is accepted by the Plugin and validated. If it is not valid, an error is returned.</li>
<li>The record is added to the DB with status "PENDING_CREATE".</li><li>The message is pushed into the queue.</li><li>The Plugin responds to the user with an HTTP 202 reply. Steps 1-4 are done synchronously.</li><li>The message is processed by the Agent, driver and device. The result is pushed into the Plugin's queue.</li>
<li>The Plugin retrieves the result and updates the DB with either "ACTIVE" or "ERROR" status.</li></ol>If a crash occurs at step 1 or during step 2, the object is lost, but the user will know about it because the failure happens while the request is still being processed. If the crash happens during steps 3-6, the object remains in a "PENDING_" state and the user has to decide how long to keep such objects. The worst case is a failure at step 6, because the change has been applied to the device but is not reflected in the DB. In both of these cases the user will want to clean up, and the cleanup has to not only remove the objects in a PENDING_ state from the DB but also send a command to the device.</div>
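<div><br></div><div>To make the ordering concrete, here is a rough Python sketch of steps 1-6. It is only an illustration, not proposed code: the in-memory db dict, the two queue.Queue objects and all function names are assumptions, and a real Plugin would use the Quantum DB and a message broker instead.</div>
<pre>
```python
import queue
import uuid

# Hypothetical in-memory stand-ins for the DB and the two message queues.
db = {}
agent_queue = queue.Queue()    # Plugin to Agent
plugin_queue = queue.Queue()   # Agent to Plugin


def plugin_create(request):
    """Steps 1-4, done synchronously inside the request."""
    if not request.get("name"):                       # 1. validate
        return 400, {"error": "name is required"}
    obj_id = str(uuid.uuid4())
    db[obj_id] = dict(request, id=obj_id, status="PENDING_CREATE")  # 2. persist
    agent_queue.put({"op": "create", "id": obj_id})   # 3. enqueue
    return 202, dict(db[obj_id])                      # 4. reply with a copy


def agent_process_one(driver):
    """Step 5: take one message, call the driver, report the result."""
    msg = agent_queue.get()
    try:
        driver(msg)
        status = "ACTIVE"
    except Exception:
        status = "ERROR"
    plugin_queue.put({"id": msg["id"], "status": status})


def plugin_handle_result():
    """Step 6: record the final status in the DB."""
    msg = plugin_queue.get()
    db[msg["id"]]["status"] = msg["status"]


def pending_objects():
    """Objects that a crash during steps 3-6 would leave behind for clean-up."""
    return [o["id"] for o in db.values() if o["status"].startswith("PENDING_")]
```
</pre>
<div>Because the record is written in step 2, before the 202 reply, a later query by ID always finds the object; pending_objects() is the list a clean-up job would have to walk, sending a delete to the device for each entry as well as removing it from the DB.</div>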
<div><br></div><div>Thanks,</div><div>Ilya</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div lang="EN-US" link="blue" vlink="purple"><p class="MsoNormal">
<span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Thanks<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Youcef<u></u><u></u></span></p><p class="MsoNormal"><a name="13b00937907af4c5__MailEndCompose"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></a></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> Ilya Shakhat [mailto:<a href="mailto:ishakhat@mirantis.com" target="_blank">ishakhat@mirantis.com</a>] <br>
<b>Sent:</b> Tuesday, November 13, 2012 11:45 AM<br><b>To:</b> OpenStack Development Mailing List<br><b>Subject:</b> Re: [openstack-dev] [Quantum][LBaaS] Architecture: Agents, Drivers, async calls<u></u><u></u></span></p>
<div><div class="h5"><p class="MsoNormal"><u></u> <u></u></p><div><p class="MsoNormal">Some reasons why LBaaS core should be responsible for processing requests asynchronously: <u></u><u></u></p></div><ul type="disc"><li class="MsoNormal">
Driver code will be as simple as possible; in most cases it will just translate the LBaaS model into a device-specific one;<u></u><u></u></li><li class="MsoNormal">There will be no dependencies between drivers, and user requests will take approximately the same time for different drivers. This avoids the case where one driver takes too long to apply a config synchronously and blocks other requests.<u></u><u></u></li>
<li class="MsoNormal">REST API is already asynchronous. <u></u><u></u></li></ul><div><p class="MsoNormal">To summarize what Eugene proposed, LBaaS will consist of (see diagram <a href="http://wiki.openstack.org/Quantum/LBaaS/Architecture?action=AttachFile&do=view&target=lbaas_architecture_new.png" target="_blank">http://wiki.openstack.org/Quantum/LBaaS/Architecture?action=AttachFile&do=view&target=lbaas_architecture_new.png</a>): <u></u><u></u></p>
</div><ul type="disc"><li class="MsoNormal">Extension - the front-end of the service<u></u><u></u></li><li class="MsoNormal">Plugin - responsible for request processing, persistence and core functionality (scheduling). All its operations may be thought of as atomic and quick. They are done synchronously.<u></u><u></u></li>
<li class="MsoNormal">Agent - responsible for executing commands on specific devices with the help of drivers. It gets requests from the Plugin via MQ and processes them asynchronously<u></u><u></u></li><li class="MsoNormal">Driver - translates from the unified API to the vendor-specific one. Its work may be time-consuming.<u></u><u></u></li>
</ul><div><div><p class="MsoNormal">Among all operations, update looks the most complicated; the workflow for the case of 2 concurrent updates is shown at <a href="http://wiki.openstack.org/Quantum/LBaaS/Architecture/ConcurrentRequests" target="_blank">http://wiki.openstack.org/Quantum/LBaaS/Architecture/ConcurrentRequests</a>. <u></u><u></u></p>
</div><div><p class="MsoNormal"><u></u> <u></u></p></div><div><p class="MsoNormal">Thanks,<u></u><u></u></p></div><div><p class="MsoNormal">Ilya<u></u><u></u></p></div><p class="MsoNormal"><u></u> <u></u></p><div><p class="MsoNormal">
2012/11/13 Youcef Laribi <<a href="mailto:Youcef.Laribi@eu.citrix.com" target="_blank">Youcef.Laribi@eu.citrix.com</a>><u></u><u></u></p><div><div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Eugene,</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"> </span><u></u><u></u></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Another way to look at the workflow is to make sure that the LBaaS Plugin updates the database synchronously (and generates the resource ID before returning to the user), and then leave it to the driver implementations to decide whether they want to handle the call synchronously or asynchronously.</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"> </span><u></u><u></u></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">I like drawing pictures to avoid misunderstandings, so here are 2 pictures illustrating 2 vendors, one deciding to implement their driver in a synchronous way, and the other one in an asynchronous way. </span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"> </span><u></u><u></u></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Synchronous driver implementation: </span><a href="http://wiki.openstack.org/Quantum/LBaaS?action=AttachFile&do=view&target=LBaaS+synchronous+driver+implementation.png" target="_blank">http://wiki.openstack.org/Quantum/LBaaS?action=AttachFile&do=view&target=LBaaS+synchronous+driver+implementation.png</a><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Asynchronous driver implementation: </span><a href="http://wiki.openstack.org/Quantum/LBaaS?action=AttachFile&do=view&target=LBaaS+asynchronous+driver+implementation.png" target="_blank">http://wiki.openstack.org/Quantum/LBaaS?action=AttachFile&do=view&target=LBaaS+asynchronous+driver+implementation.png</a><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"> </span><u></u><u></u></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">As far as the plugin is concerned, the calls to drivers are always synchronous in the sense that the driver doesn’t have to deal with queues, etc. The plugin should expect the driver to return either a “COMPLETED” status (meaning the call has been executed on the device), or a “PENDING” status (meaning that the driver has started the operation but it is not complete). The plugin updates the database in both cases with the outcome of the call, and returns the result to the user. </span><u></u><u></u></p>
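<p class="MsoNormal">As a rough illustration of that contract, a Python sketch might look like the following. The DriverStatus enum, the two example drivers and the mapping of driver outcomes onto "ACTIVE" and "PENDING_CREATE" record statuses are all assumptions made for the example, not an agreed interface.</p>
<pre>
```python
from enum import Enum


class DriverStatus(Enum):
    COMPLETED = "COMPLETED"   # the call has been executed on the device
    PENDING = "PENDING"       # started, but not complete yet


def sync_driver_create(vip):
    # Hypothetical vendor A: configures the device before returning.
    return DriverStatus.COMPLETED


def async_driver_create(vip):
    # Hypothetical vendor B: hands the work to its own queue and returns.
    return DriverStatus.PENDING


def plugin_create_vip(db, driver_create, vip):
    # The plugin saves the record first, so the resource ID exists in the DB
    # before anything is returned to the user.
    record = dict(vip, status="PENDING_CREATE")
    db[vip["id"]] = record
    # The plugin makes the same plain call whichever strategy the driver uses.
    outcome = driver_create(vip)
    if outcome is DriverStatus.COMPLETED:
        record["status"] = "ACTIVE"
    return record
```
</pre>
<p class="MsoNormal">The plugin code is identical for both vendors; only the returned status differs, which is what would let a vendor switch from a synchronous to an asynchronous driver without touching the plugin.</p>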
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"> </span><u></u><u></u></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">This would allow a lot of freedom in driver implementations. A vendor can start with a synchronous implementation because it is quick to implement, and then later on move to an asynchronous implementation without impacting the LBaaS plugin. Or it can implement some calls synchronously and others (which might take a long time to complete) asynchronously. You can also have different vendors using different driver strategies with respect to synchronicity, or using different queuing mechanisms.</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"> </span><u></u><u></u></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Thanks</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Youcef</span><u></u><u></u></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"> </span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"> </span><u></u><u></u></p><p class="MsoNormal"><a name="13b00937907af4c5_13afa61ba5fb651b_13afa501394d9ae2_13af63"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"> </span></a><u></u><u></u></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> Eugene Nikanorov [mailto:<a href="mailto:enikanorov@mirantis.com" target="_blank">enikanorov@mirantis.com</a>] <br>
<b>Sent:</b> Monday, November 12, 2012 6:24 AM<br><b>To:</b> <a href="mailto:openstack-dev@lists.openstack.org" target="_blank">openstack-dev@lists.openstack.org</a><br><b>Subject:</b> [openstack-dev] [Quantum][LBaaS] Architecture: Agents, Drivers, async calls</span><u></u><u></u></p>
<div><div><p class="MsoNormal"> <u></u><u></u></p><p class="MsoNormal">Hi folks,<u></u><u></u></p><div><p class="MsoNormal"> <u></u><u></u></p></div><div><p class="MsoNormal">In the latest meeting we've mentioned several important architectural points including:<u></u><u></u></p>
</div><div><p class="MsoNormal">- agents vs direct driver call<u></u><u></u></p></div><div><p class="MsoNormal">- asynchronous execution<u></u><u></u></p></div><div><p class="MsoNormal">- dispatching a generic REST call to a proper driver.<u></u><u></u></p>
</div><div><p class="MsoNormal"> <u></u><u></u></p></div><div><p class="MsoNormal">I would like to present how Mirantis team sees this based on our previous experience with LBaaS and other openstack components.<u></u><u></u></p>
</div><div><p class="MsoNormal"> <u></u><u></u></p></div><div><p class="MsoNormal">We need asynchronous execution in the sense that client gets immediate response while actual device configuration happens later.<u></u><u></u></p>
</div><div><p class="MsoNormal">Workflow of such operation could look like following:<u></u><u></u></p></div><div><p class="MsoNormal">1) client makes REST call; receives an object it has created/modified with PENDING status <u></u><u></u></p>
</div><div><p class="MsoNormal">2) call is dispatched to a plugin, plugin creates/modifies/etc an object in the database<u></u><u></u></p></div><div><p class="MsoNormal">3) plugin calls driver to apply new configuration to specific device<u></u><u></u></p>
</div><div><p class="MsoNormal">4) driver finishes applying configuration, plugin updates DB object<u></u><u></u></p></div><div><p class="MsoNormal">5) client polls objectID and gets final status of operation.<u></u><u></u></p>
</div><div><p class="MsoNormal"> <u></u><u></u></p></div><div><p class="MsoNormal">Now depending on approach we take, (3) could expand into different sequence of operations.<u></u><u></u></p></div><div><p class="MsoNormal">
One of the good options to choose could be using agent between plugin and drivers. In this case (3) expands to:<u></u><u></u></p></div><div><p class="MsoNormal">3.1 plugin posts message to mq<u></u><u></u></p></div><div><p class="MsoNormal">
3.2. message is consumed by one of the running service agents<u></u><u></u></p></div><div><p class="MsoNormal">3.3. agent calls corresponding driver directly in synchronous way.<u></u><u></u></p></div><div><p class="MsoNormal">
3.4. agent posts message upon completion.<u></u><u></u></p></div><div><p class="MsoNormal">3.5. plugin consumes the message and updates DB object with final status<u></u><u></u></p></div><div><p class="MsoNormal"> <u></u><u></u></p>
</div><div><p class="MsoNormal">Such approach solves at least two potential problems:<u></u><u></u></p></div><div><p class="MsoNormal">1. plugin may be simplified since it is not required to implement call/work item queuing <u></u><u></u></p>
</div><div><p class="MsoNormal">2. Applying device configuration is a time-consuming task which could take seconds. <u></u><u></u></p></div><div><p class="MsoNormal">Both plugin and agent have a thread limit for concurrent operations. <u></u><u></u></p>
</div><div><p class="MsoNormal">Handling a heavy workload in large deployments will be simple with several agents consuming messages from mq.<u></u><u></u></p></div><div><p class="MsoNormal"> <u></u><u></u></p></div><div><p class="MsoNormal">
This also allows drivers to be synchronous, since asynchrony will be handled by mq + agent.<u></u><u></u></p></div><div><p class="MsoNormal"> <u></u><u></u></p></div><div><p class="MsoNormal">Another option could be to call drivers directly without any asynchrony at all while preserving the above workflow (1-5). <u></u><u></u></p>
</div><div><p class="MsoNormal">That could work as a quick temporary solution that can later be split into the "plugin + agent" approach relatively easily.<u></u><u></u></p></div><div><p class="MsoNormal"> <u></u><u></u></p>
</div><div><p class="MsoNormal">Regarding dispatching REST calls to the proper driver:<u></u><u></u></p></div><div><p class="MsoNormal">In fact, the VIP object should contain a reference to the particular device it is created on. <u></u><u></u></p>
</div><div><p class="MsoNormal"><a href="http://wiki.openstack.org/LBaaS/CoreResourceModel/proposal" target="_blank">http://wiki.openstack.org/LBaaS/CoreResourceModel/proposal</a> omits that device management part; I think it was just implied there.<u></u><u></u></p>
</div><div><p class="MsoNormal">Every balancer-related object references the VIP and hence references the specific device where it was created. <u></u><u></u></p></div><div><p class="MsoNormal">E.g., when a call for any object is made, the plugin needs to extract the device type from the DB by following those references, and later the plugin or agent will use it to call the particular driver.<u></u><u></u></p>
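<p class="MsoNormal">A small Python sketch of that lookup, under stated assumptions: the in-memory tables, the field names (device_id, vip_id, type), the "vendor_x" device type and the driver classes are hypothetical and only show how following the references selects a driver.</p>
<pre>
```python
# Hypothetical tables; a real plugin would read these from the DB.
devices = {"dev-1": {"type": "haproxy"}, "dev-2": {"type": "vendor_x"}}
vips = {"vip-1": {"device_id": "dev-1"}, "vip-2": {"device_id": "dev-2"}}
members = {"member-1": {"vip_id": "vip-2"}}


class HAProxyDriver:
    def apply(self, obj_id):
        return "haproxy:" + obj_id


class VendorXDriver:
    def apply(self, obj_id):
        return "vendor_x:" + obj_id


# One driver instance per device type.
drivers = {"haproxy": HAProxyDriver(), "vendor_x": VendorXDriver()}


def device_type_for(kind, obj_id):
    """Follow object, then VIP, then device references to the device type."""
    if kind == "vip":
        vip = vips[obj_id]
    else:
        vip = vips[members[obj_id]["vip_id"]]
    return devices[vip["device_id"]]["type"]


def dispatch(kind, obj_id):
    """Call the driver that matches the device the object lives on."""
    return drivers[device_type_for(kind, obj_id)].apply(obj_id)
```
</pre>
<p class="MsoNormal">Here dispatch("member", "member-1") reaches the vendor_x driver only through the member's VIP, which is why the VIP needs to carry the device reference.</p>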
</div><div><p class="MsoNormal"> <u></u><u></u></p></div><div><p class="MsoNormal">Thanks,<u></u><u></u></p></div><div><p class="MsoNormal">Eugene.<u></u><u></u></p></div></div></div></div></div><p class="MsoNormal" style="margin-bottom:12.0pt">
<br>_______________________________________________<br>OpenStack-dev mailing list<br><a href="mailto:OpenStack-dev@lists.openstack.org" target="_blank">OpenStack-dev@lists.openstack.org</a><br><a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><u></u><u></u></p>
</div><p class="MsoNormal"><u></u> <u></u></p></div></div></div></div>
<br></blockquote></div><br></div>