<div dir="auto">We could potentially make that call async on the agent, but the agent has very little to do without the information in the response that comes back. <div dir="auto"><br></div><div dir="auto">As we switch over to push notifications, this method of data retrieval will be completely gone, so we probably don't want to spend much time redesigning that workflow anyway. <br></div><div dir="auto"><br></div><div dir="auto">The baked queries thing is interesting; I'll reply to Mike's email with more details. </div></div><div class="gmail_extra"><br><div class="gmail_quote">On Feb 16, 2017 7:07 AM, "Daniel Alvarez Sanchez" <<a href="mailto:dalvarez@redhat.com">dalvarez@redhat.com</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Awesome work, Kevin! <div><br></div><div>For the DHCP notification, in my profiling I got only 10% of the CPU time [0] without taking the waiting times into account, which is probably what you also measured. </div><div>Your patch seems like a neat and great optimization :)</div><div><br></div><div>Also, since "get_devices_details_list_and_failed_devices()" takes quite a long time, does it make sense to trigger this request asynchronously (the same approach you took for the OVO notifier) and continue executing the iteration? This would not result in a huge improvement but, in the case I showed in the diagram, both 'get_device_details' calls could be issued at the same time instead of one after another, probably freeing the iteration for further processing on the agent side. Thoughts on this?</div><div><br></div><div>Regarding the time of SQL queries, it looks like the server spends a significant amount of time building them, and reducing that time would result in a nice improvement. 
Mike's outstanding analysis looks promising and it may be worth discussing.<br></div><div><br></div><div>[0] <a href="http://imgur.com/lDikZ0I" target="_blank">http://imgur.com/lDikZ0I</a></div><div><br></div><div><br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Feb 16, 2017 at 8:23 AM, Kevin Benton <span dir="ltr"><<a href="mailto:kevin@benton.pub" target="_blank">kevin@benton.pub</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Thanks for the stats and the nice diagram. I did some profiling and I'm sure it's the RPC handler on the Neutron server side behaving like garbage.<div><br></div><div>There are several causes, which I have a string of patches up to address; they mainly stem from the fact that l2pop requires multiple port status updates to function correctly:</div><div><br></div><div>* The DHCP notifier will trigger a notification to the DHCP agents on the network on a port status update. This wouldn't be too problematic on its own, but it does several queries for networks and segments to determine which agents it should talk to. Patch to address it here: <a href="https://review.openstack.org/#/c/434677/" target="_blank">https://review.openstack<wbr>.org/#/c/434677/</a></div><div><br></div><div>* The OVO notifier will also generate a notification on any port data model change, including the status. This is ultimately the desired behavior, but until we eliminate the frivolous status flipping, it's going to incur a performance hit. 
Patch here to push it into the background asynchronously so it doesn't block the port update process: <a href="https://review.openstack.org/#/c/434678/" target="_blank">https://review.openst<wbr>ack.org/#/c/434678/</a></div><div><br></div><div>* A wasteful DB query in the ML2 PortContext: <a href="https://review.openstack.org/#/c/434679/" target="_blank">https://review.op<wbr>enstack.org/#/c/434679/</a></div><div><br></div><div>* More unnecessary queries for the status update case in the ML2 PortContext: <a href="https://review.openstack.org/#/c/434680/" target="_blank">https://review.op<wbr>enstack.org/#/c/434680/</a></div><div><br></div><div>* Bulking up the DB queries rather than retrieving port details one by one. </div><div><a href="https://review.openstack.org/#/c/434681/" target="_blank">https://review.openstack.org/#<wbr>/c/434681/</a> <a href="https://review.openstack.org/#/c/434682/" target="_blank">https://review.open<wbr>stack.org/#/c/434682/</a></div><div><br></div><div>The top two accounted for more than 60% of the overhead in my profiling, and they are pretty simple, so we may be able to get them into Ocata for RC depending on how other cores feel. If not, they should be good candidates for back-porting later. Some of the others start to get more invasive, so we may be stuck.</div><div><br></div><div>Cheers,</div><div>Kevin Benton</div></div><div class="m_6800696559711089274gmail-m_285883221720072119m_29347367387697718gmail-HOEnZb"><div class="m_6800696559711089274gmail-m_285883221720072119m_29347367387697718gmail-h5"><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Feb 15, 2017 at 12:25 PM, Jay Pipes <span dir="ltr"><<a href="mailto:jaypipes@gmail.com" target="_blank">jaypipes@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span>On 02/15/2017 12:46 PM, Daniel Alvarez Sanchez wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Hi there,<br>
<br>
We're trying to figure out why, sometimes, rpc_loop takes over 10<br>
seconds to process an iteration when booting instances. So we deployed<br>
devstack on an 8GB, 4vCPU VM and did some profiling on the following command:<br>
<br>
nova boot --flavor m1.nano --image cirros-0.3.4-x86_64-uec --nic<br>
net-name=private --min-count 8 instance<br>
</blockquote>
<br></span>
Hi Daniel, thanks for posting the information here. Quick request of you, though... can you try re-running the test but doing 8 separate calls to nova boot instead of using the --min-count 8 parameter? I'm curious to see if you notice any difference in contention/performance.<br>
<br>
Best,<br>
-jay<br>
<br>
______________________________<wbr>______________________________<wbr>______________<br>
OpenStack Development Mailing List (not for usage questions)<br>
Unsubscribe: <a href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe" rel="noreferrer" target="_blank">OpenStack-dev-request@lists.op<wbr>enstack.org?subject:unsubscrib<wbr>e</a><br>
<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" rel="noreferrer" target="_blank">http://lists.openstack.org/cgi<wbr>-bin/mailman/listinfo/openstac<wbr>k-dev</a><br>
</blockquote></div><br></div>
</div></div>
<br></blockquote></div><br></div></div></div>
<br></blockquote></div></div>
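<div dir="auto"><br></div>
<div dir="auto">The suggestion in the thread, that both 'get_device_details' requests could be issued at the same time instead of one after another, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the get_device_details function below is a hypothetical stand-in that only sleeps to simulate waiting on the server, the returned fields are invented, and plain threads replace whatever RPC and concurrency machinery the real agent uses.</div>
<pre>
```python
import time
from concurrent.futures import ThreadPoolExecutor


def get_device_details(devices, delay=0.2):
    """Hypothetical stand-in for a blocking RPC call to the Neutron server."""
    time.sleep(delay)  # simulate the time spent waiting on the server
    # Invented response shape, for illustration only.
    return {device: {"device": device, "status": "ACTIVE"} for device in devices}


def fetch_sequentially(batches):
    """Issue one request after another and collect the responses."""
    return [get_device_details(batch) for batch in batches]


def fetch_concurrently(batches):
    """Issue all requests at once and wait for every response."""
    with ThreadPoolExecutor(max_workers=len(batches)) as pool:
        # map() returns results in the same order as the input batches.
        return list(pool.map(get_device_details, batches))


def timed(func, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.monotonic()
    result = func(*args)
    return result, time.monotonic() - start
```
</pre>
<div dir="auto">With two batches, timed(fetch_concurrently, batches) should take roughly the time of one simulated call, while timed(fetch_sequentially, batches) takes about twice that. As the reply at the top of the thread points out, the agent still has little to do until the responses arrive, so any gain is limited to overlapping the two waits.</div>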