<div dir="ltr">Salvatore,<div><br></div><div style>I agree that we've gotten a bit off topic with this thread. Aaron's original proposal is an improvement over what happens today, but I believe the workflow issue and the separation of duties still need to be addressed. With that said, I'll leave this thread alone. What's the appropriate place to discuss this further? Should we get a blueprint out, meeting, ...? I'd love to understand the issue more completely. In our operational experience at Bluehost (running quantum with the openvswitch plugin on thousands of nodes), this definitely causes us headaches. If we are not doing it the right way, or if there's a better way to fix it, we'd love to be part of that.</div>

<div style><br></div><div style>-Mike</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, May 17, 2013 at 5:06 PM, Salvatore Orlando <span dir="ltr"><<a href="mailto:sorlando@nicira.com" target="_blank">sorlando@nicira.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I often use the same analogy as Joshua; however, I usually imagine a slightly different workflow (and mapping to openstack services).<div>

Copying Joshua's steps where possible, my workflow is a bit like this:</div>

<div><br></div><div><div style="font-family:Calibri,sans-serif;font-size:14px">1. Define specs for the physical box (i.e.: use nova flavors, and have nova-api reserve an instance; the abstraction doesn't have to be libvirt).</div>

<div style="font-family:Calibri,sans-serif;font-size:14px">2. Decide where the physical box should be placed in rack and ensure available power switches and so-on (ie: have nova-scheduler place the VM in a place 'where it makes sense' *)<br>

</div><div style="font-family:Calibri,sans-serif;font-size:14px">3. Ensure networking is in place in the data center to provide connectivity to the virtual machine (ie: make sure Quantum sets up the network infrastructure)<br>

</div><div style="font-family:Calibri,sans-serif;font-size:14px">     This was referred to as: Connect <span>network</span>-cables to router, configure router ports/flows and so-on.</div><div style="font-family:Calibri,sans-serif;font-size:14px">

4. Build the physical box (i.e.: have nova compute transform the reservation into a real instance)<br></div><div style="font-family:Calibri,sans-serif;font-size:14px">4.a) plug the CPU and the memory modules (build the libvirt XML spec, or create a vm record with XenAPI, or vSphere APIs)</div>

<div style="font-family:Calibri,sans-serif;font-size:14px">4.b) Connect hard-drives to physical box (equivalent to downloading the image from glance and/or talking to cinder to do the same - and then update the VM record as necessary)</div>

<div style="font-family:Calibri,sans-serif;font-size:14px">4.c) Connect <span>network</span>-cables to physical box (i.e.: plug the vifs in the virtual network ports - and then, again, update the VM record)</div><div class="im">

<div style="font-family:Calibri,sans-serif;font-size:14px">5.  Turn on said box by pressing the power switch/button.<br></div><div style="font-family:Calibri,sans-serif;font-size:14px">6. Report back to user that box is now ready for use.</div>

<div style="font-family:Calibri,sans-serif;font-size:14px"><br></div></div><div style="font-family:Calibri,sans-serif;font-size:14px">* 'where it makes sense' == using the scheduling algorithm that best suits what you need to do with that instance (call it application level affinity or else, I understand very little about these things)</div>

<div style="font-family:Calibri,sans-serif;font-size:14px"><br></div><div style="font-family:Calibri,sans-serif;font-size:14px">From this thread, it seems there are several points of contention. </div><div><font face="Calibri, sans-serif"><span style="font-size:14px">About out-of-order execution of workflow steps, I would argue that it's perfectly normal to have the networking guy setting up switches and routers while the compute guy is assembling the server. This is why it is OK for me if steps 3 and 4 are executed out of order. Step 5 should be a synchronisation point: you start up a VM only when you know networking has been configured, the drives have been installed in the bays, and the operating system has been installed. From what I gather, the blueprint proposed by Aaron goes in this direction. If nova-api fails to set up networking, the VM would never be started. (And I'm not saying this just because he's on my team!)</span></font></div>
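<div style="font-family:Calibri,sans-serif;font-size:14px">To make the synchronisation point concrete, here is a minimal sketch, not actual nova code. The functions provision_network, build_instance and boot are hypothetical stand-ins for steps 3, 4 and 5; the only point is that the two workers may run in any order, while power-on waits for both.</div>
<pre>
```python
from concurrent.futures import ThreadPoolExecutor


def provision_network(instance_id):
    # Hypothetical stand-in for step 3 (Quantum setting up ports and flows).
    return {"instance": instance_id, "network_ready": True}


def build_instance(instance_id):
    # Hypothetical stand-in for step 4 (nova-compute defining VM, disks, vifs).
    return {"instance": instance_id, "built": True}


def boot(instance_id):
    # Steps 3 and 4 run concurrently; step 5 waits for both of them.
    with ThreadPoolExecutor(max_workers=2) as pool:
        net = pool.submit(provision_network, instance_id)
        vm = pool.submit(build_instance, instance_id)
        # result() blocks and re-raises any exception from the worker,
        # so a networking failure prevents the power-on below.
        net_state, vm_state = net.result(), vm.result()
    if not (net_state["network_ready"] and vm_state["built"]):
        raise RuntimeError("instance %s is not ready to start" % instance_id)
    return "ACTIVE"
```
</pre>
<div style="font-family:Calibri,sans-serif;font-size:14px">Calling boot("vm-1") returns "ACTIVE" only after both workers have reported success; an exception in either one propagates and the instance is never started.</div>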

<div style="font-family:Calibri,sans-serif;font-size:14px"><br></div><div style="font-family:Calibri,sans-serif;font-size:14px">About whether Quantum should take responsibility for plugging network cards, this was one of my doubts in the early Quantum days. At the end of the day the 'real-world analogy' helped me understand that when installing a server the compute guy would not call the networking guy to plug the NICs into the motherboard and the cables into the patch panel, but would probably do that himself. Of course one cannot base system architecture on analogies, and this is definitely an interesting discussion; to me, however, it is not fundamental to the topic of this blueprint, which is about provisioning network resources early in the instance creation process.</div>

<div style="font-family:Calibri,sans-serif;font-size:14px">I did not have a chance to listen to Jun's talk at the Openstack summit. I promise I will watch it and try to gather more insight into the flaws being claimed here!</div>

<div style="font-family:Calibri,sans-serif;font-size:14px"><br></div><div style="font-family:Calibri,sans-serif;font-size:14px">On the other hand, Robert has raised a very interesting point about bare metal provisioning.</div>

<div style="font-family:Calibri,sans-serif;font-size:14px">Doing early provisioning of the network in this case might not work, unless you've already scheduled it. If there's an interest in having Quantum work with Ironic, this is something we need to look into as well. My gut feeling is that we might need a different Quantum plugin for Ironic, unless the driver model that will come in ML2 would allow Quantum to also handle physical network interfaces.</div>

<div style="font-family:Calibri,sans-serif;font-size:14px"><br></div><div style="font-family:Calibri,sans-serif;font-size:14px">Another question from me is instead how this would work with the nova-network provisioning model?</div>

<div style="font-family:Calibri,sans-serif;font-size:14px">One of the reasons the network is provisioned on the compute node is that this allows quantum and nova-network to expose the same API interface to the compute manager. If we move Quantum provisioning to nova-api, would this invalidate this model?</div>

<div style="font-family:Calibri,sans-serif;font-size:14px"><br></div><div style="font-family:Calibri,sans-serif;font-size:14px">I was hoping to keep this post short, but as often happens with me, it became a poem (more akin to Dante's Divina Commedia than to Shakespeare's sonnets).</div>

<div style="font-family:Calibri,sans-serif;font-size:14px"><br></div><div style="font-family:Calibri,sans-serif;font-size:14px">Regards,</div><div style="font-family:Calibri,sans-serif;font-size:14px">Salvatore</div><div style="font-family:Calibri,sans-serif;font-size:14px">

<br></div></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><br><div class="gmail_quote">On 17 May 2013 21:32, Jun Cheol Park <span dir="ltr"><<a href="mailto:jun.park.earth@gmail.com" target="_blank">jun.park.earth@gmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>Joshua,</div><div><br></div><div>Yes, pretty much just like that! Thanks for sharing such a concise and clear analogy.</div>

<span><font color="#888888"><div><br></div><div>-Jun</div></font></span><div><div><br><br><div class="gmail_quote">On Fri, May 17, 2013 at 12:42 PM, Joshua Harlow <span dir="ltr"><<a href="mailto:harlowja@yahoo-inc.com" target="_blank">harlowja@yahoo-inc.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div style="font-size:14px;font-family:Calibri,sans-serif;word-wrap:break-word">

<div>I was talking about this problem with my management and had a good analogy that might make the flaw a little more visible.</div>

<div><br>

</div>

<div><i>Thought experiment</i>: Imagine a VM as a physical box, imagine nova/quantum/cinder/glance as the IT personnel.</div>

<div><br>

</div>

<div>It seems like the IT personnel would go through the following likely overly simplistic steps when setting up a physical box:</div>

<div><br>

</div>

<div>1. Get physical box from supplier (equivalent to nova reserving a VM with libvirt) that matches desired specs (aka flavor).</div>

<div>   a. Place physical box in rack and ensure available power switches and so-on. </div>

<div>2. Connect hard-drives to physical box (equivalent to downloading the image from glance, or talking to cinder to do the same)</div>

<div>3. Connect network-cables to physical box (equivalent to talking to quantum, having quantum-agent plug things in…)</div>

<div>   a. Connect network-cables to router, configure router ports/flows and so-on.</div>

<div>4.  Turn on said box by pressing the power switch/button.</div>

<div>5. Report back to user that box is now ready for use.</div>

<div><br>

</div>

<div>From the below it seems that steps 3, 4 and 5 can happen simultaneously (or out of order)? That seems pretty bad and is not something I would expect when setting up a physical box. </div>

<div><br>

</div>

<div>Jun would that be a correct 'simplification' of the problem?</div>

<div><br>

</div>

<span>

<div style="border-right:medium none;padding-right:0in;padding-left:0in;padding-top:3pt;text-align:left;font-size:11pt;border-bottom:medium none;font-family:Calibri;border-top:#b5c4df 1pt solid;padding-bottom:0in;border-left:medium none">

<span style="font-weight:bold">From: </span>Jun Cheol Park <<a href="mailto:jun.park.earth@gmail.com" target="_blank">jun.park.earth@gmail.com</a>><div><br>

<span style="font-weight:bold">Reply-To: </span>OpenStack Development Mailing List <<a href="mailto:openstack-dev@lists.openstack.org" target="_blank">openstack-dev@lists.openstack.org</a>><br>

</div><span style="font-weight:bold">Date: </span>Thursday, May 16, 2013 3:05 PM<div><br>

<span style="font-weight:bold">To: </span>OpenStack Development Mailing List <<a href="mailto:openstack-dev@lists.openstack.org" target="_blank">openstack-dev@lists.openstack.org</a>><br>

<span style="font-weight:bold">Subject: </span>Re: [openstack-dev] [Nova][Quantum] Move quantum port creation to nova-api<br>

</div></div><div><div>

<div><br>

</div>

<div>

<div>

<div>

<div>Aaron,</div>

<div>

<div><br>

</div>

>@Mike - I think we'd still want to leave nova-compute to create the tap interfaces and sticking  external-ids on them though.

<div><br>

</div>

<div>Sorry, I don't get this. Why do we need to leave nova-compute to create tap interfaces? That behavior has been a serious problem and a design flaw in dealing with ports, as Mike and I presented in Portland (Title: Using OpenStack In A Traditional Hosting

 Environment). </div>

<div><br>

</div>

<div>All,</div>

<div><br>

</div>

<div>Please, let me share the problems that we ran into due to such a design flaw between nova-compute and quantum-agent.</div>

<div><br>

</div>

<div>1. Using external-ids as an implicit triggering mechanism for deploying OVS flows (we used the OVS plugin on hosts) causes inconsistencies between the quantum DB and the actual OVS tap interfaces on hosts. For example, even when the necessary OVS flows have not been set

 up or have failed for whatever reason (messaging system unstable, quantum-server down, etc.), nova-compute unknowingly declares that a VM is in the "active" state as long as nova-compute successfully creates OVS taps and sets up external-ids. But the VM does

 not have actual network connectivity until somebody (here quantum-agent) deploys the desired OVS flows. At this point, it is very hard to track down what went wrong because nova list shows the VM as "active." This kind of inconsistency happens a lot because a

 quantum API (provided by quantum-server, e.g., create_port()) only manages the quantum DB, and does not deal with actual network objects (e.g., OVS taps on hosts). In this design, there is no way to verify the actual state of the target network objects. </div>

<div><br>

</div>

<div>  Q. What if a quantum API really deals with network objects (e.g., OVS taps), not only updating quantum DB?</div>

<div>  A. Nova-compute can now call a truly abstracted quantum API for creating a real port (or an OVS tap interface) on a target host, and then wait for a response from the call to see if an OVS tap interface has really been created on the host. This way, nova-compute

 is able to make sure of what is going on before proceeding with the rest of the tasks for creating a new VM. When there are tasks that need to be taken care of regarding ports, such as QoS (as Henry mentioned), quota (where this thread started), etc., nova-compute

 then decides what the next step would be (at least it would not blindly say that the VM is active).</div>
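<div>A minimal sketch of the behaviour described in this answer, under loud assumptions: FakeAgent, its plug() method, and the create_port() and spawn() functions below are hypothetical and are not the real quantum API. The sketch only shows a port call that confirms the real network object exists before the VM is reported as active.</div>
<pre>
```python
class FakeAgent:
    """Hypothetical stand-in for quantum-agent on one host."""

    def __init__(self, healthy=True):
        self.healthy = healthy
        self.taps = set()

    def plug(self, port_id):
        # Pretend to create the OVS tap and flows; report whether it worked.
        if self.healthy:
            self.taps.add(port_id)
        return self.healthy


def create_port(db, agent, port_id, host):
    # Record the port, then confirm the real network object exists
    # before reporting success to the caller.
    db[port_id] = {"host": host, "status": "DOWN"}
    if agent.plug(port_id):
        db[port_id]["status"] = "ACTIVE"
    return dict(db[port_id])


def spawn(db, agent, port_id, host):
    # nova-compute side: the VM state follows the confirmed port state.
    port = create_port(db, agent, port_id, host)
    return "active" if port["status"] == "ACTIVE" else "error"
```
</pre>
<div>With a healthy agent, spawn() reports "active"; with FakeAgent(healthy=False) the DB row stays "DOWN" and the VM is reported as "error" instead of silently going active.</div>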

<div><br>

</div>

<div>2. Another example is a side effect of the tap being created by nova-compute. When a host is rebooted, we expect all the VMs to be restarted automatically. However, that is not possible. Here is why. When nova-compute restarts, it expects to see libvirtd running.

 Otherwise, nova-compute immediately stops. So we have to start libvirtd before nova-compute. Now when libvirtd starts, it expects all the OVS taps to exist so that it can successfully start all the VMs that are supposed to use them. However, at

 this point we have not yet started nova-compute, which would create the OVS taps, so libvirtd fails to restart the VMs because no taps are found. So I ended up adding "restart libvirtd" to rc.local so that libvirtd retries restarting the VMs after nova-compute

 creates the OVS taps.</div>

<div> </div>

<div> Q. Again, what if quantum-agent itself is able to deal with actual ports without relying on nova-compute at all?</div>

<div> A. We can start quantum-agent, which would create all the necessary OVS taps in its own way. Then we restart libvirtd, which would start all the VMs with the created OVS taps. This is a good example of how to make quantum truly independent of nova-compute

 without any dependency on external-ids.</div>
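<div>The boot ordering this answer implies can be written down as a small dependency graph. This is only an illustration of the proposed design, not of how any init system is configured today; the DEPENDS_ON table and start_order() are names made up for the sketch, and graphlib needs Python 3.9 or later.</div>
<pre>
```python
from graphlib import TopologicalSorter

# Each service maps to the services that must be running before it.
# The dependency sets reflect the proposed design, not current behaviour.
DEPENDS_ON = {
    "quantum-agent": set(),
    "libvirtd": {"quantum-agent"},  # taps must exist before VMs restart
    "nova-compute": {"libvirtd"},   # nova-compute stops if libvirtd is down
}


def start_order(depends_on):
    # static_order() yields each node after all of its predecessors.
    return list(TopologicalSorter(depends_on).static_order())
```
</pre>
<div>Because quantum-agent no longer depends on nova-compute, the graph has no cycle and a single pass is enough, with no "restart libvirtd" retry in rc.local.</div>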

<div><br>

</div>

<div>3. Beyond the problems above, it is undesirable for nova-compute to carry all the code for dealing with OVS specifics (e.g., all the wrapper functions for ovs-related commands such as ovs-vsctl) when quantum-agent already has the same

 code for dealing with OVS taps. </div>

<div><br>

</div>

</div>

<div>In summary, all these problems above occur due to the fact that quantum API only manages quantum DB, leaving all the functionality in dealing with actual network objects dispersed across nova-compute (e.g., OVS tap creation) and quantum-agent (e.g., OVS

 flows deployment). </div>

<div><br>

</div>

<div>> nova-compute should call port-update to set binding:host_id </div>

<div><br>

</div>

<div>This could also be a very good use case. If a quantum API really creates an actual port on a host as I have been suggesting here, nova-compute simply gets the return values for the newly created port from that API call. The return values would include

 all the detailed information including host_id, vif_type, etc. nova-compute can then use them to update ports, or maybe the create_port() API itself already updates the necessary info and simply returns the current info, such as the mapping of binding:host_id. </div>
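<div>As a rough sketch of that use case: the create_port() signature, the agent_types mapping, the VIF_DRIVERS table and plug_vif() below are all assumptions made for illustration, not the real quantum or nova interfaces. Only the binding:host_id and binding:vif_type keys come from this thread.</div>
<pre>
```python
def create_port(host_id, agent_types):
    # Hypothetical server-side call: pick the vif_type from the agent
    # known to run on the host and return it together with the binding.
    return {
        "id": "port-1",
        "binding:host_id": host_id,
        "binding:vif_type": agent_types.get(host_id, "unbound"),
    }


# Illustrative driver table; the strings stand in for real plug operations.
VIF_DRIVERS = {
    "ovs": lambda port: "plug %s into OVS bridge" % port["id"],
    "bridge": lambda port: "plug %s into Linux bridge" % port["id"],
}


def plug_vif(port):
    # nova-compute side: no plugin-specific configuration, only dispatch
    # on what the API call returned.
    driver = VIF_DRIVERS.get(port["binding:vif_type"])
    if driver is None:
        raise ValueError("no VIF driver for %r" % port["binding:vif_type"])
    return driver(port)
```
</pre>
<div>For example, plug_vif(create_port("host-a", {"host-a": "ovs"})) selects the OVS entry, while a host with no known agent yields "unbound" and plug_vif() raises instead of guessing.</div>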

<div><br>

</div>

<div>I'm not sure how effectively I have been explaining what I meant to say regarding a desirable design between nova-compute and quantum (both quantum-server and quantum-agent). Based on comments I would get from this thread, I may start to write a blueprint

 proposal.</div>

<div><br>

</div>

<div>Please, let me know anything that I missed or misunderstood.</div>

<div><br>

</div>

<div>Thanks,</div>

<div><br>

</div>

<div>-Jun</div>

</div>

<br>

<div class="gmail_quote">On Thu, May 16, 2013 at 1:47 PM, Robert Kukura <span dir="ltr">

<<a href="mailto:rkukura@redhat.com" target="_blank">rkukura@redhat.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div>On 05/16/2013 02:40 PM, Mike Wilson wrote:<br>

><br>

><br>

><br>

> On Thu, May 16, 2013 at 12:28 PM, Robert Kukura <<a href="mailto:rkukura@redhat.com" target="_blank">rkukura@redhat.com</a><br>

</div>

<div>

<div>> <mailto:<a href="mailto:rkukura@redhat.com" target="_blank">rkukura@redhat.com</a>>> wrote:<br>

><br>

>     ><br>

>     > @Mike - I think we'd still want to leave nova-compute to create<br>

>     the tap<br>

>     > interfaces and sticking  external-ids on them though.<br>

><br>

>     It also seems nova-compute should call port-update to set<br>

>     binding:host_id and then use the returned binding:vif_type, since the<br>

>     vif_type might vary depending on the host, at least with ml2. The Arista<br>

>     top-of-rack switch hardware driver functionality also depends on the<br>

>     binding:host_id being set.<br>

><br>

>     -Bob<br>

><br>

><br>

> Hmmm, is that really nova-compute's job? Again, that seems to be the<br>

> networking abstraction's job to me. We have all these quantum agents,<br>

> they have the device_id (instance_uuid). Why not have a quantum<br>

> component (agent maybe?) query nova for the host_id and then it calls<br>

> port-update?<br>

<br>

</div>

</div>

I believe this is the final step of an attempt to cleanup the<br>

abstraction between nova and quantum. The idea is to have quantum decide<br>

on the VIF driver, rather than having this knowledge built into the nova<br>

configuration.<br>

<br>

In some cases, quantum will need to know what host the port is being<br>

bound on so it can determine which VIF driver to use (possibly based on<br>

what agent is running on that host). Also, a quantum L2 agent (if there<br>

is one) cannot notice that the port is being bound until after the<br>

VIF driver has been selected and done its thing.<br>

<br>

The nova code for this has been in review for a while, but may have<br>

expired. Gerrit is offline at the moment, so I can't search for it.<br>

<br>

-Bob<br>

<br>

><br>

> -Mike<br>

<div>

<div>><br>

><br>

><br>

> _______________________________________________<br>

> OpenStack-dev mailing list<br>

> <a href="mailto:OpenStack-dev@lists.openstack.org" target="_blank">OpenStack-dev@lists.openstack.org</a><br>

> <a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" target="_blank">

http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>

><br>

<br>

<br>


</div>

</div>

</blockquote>

</div>

<br>

</div>

</div>

</div></div></span>

</div>

</blockquote></div><br>

</div></div><br>

<br></blockquote></div><br></div>

</div></div><br>

<br></blockquote></div><br></div>