[openstack-dev] [neutron] Some findings while profiling instances boot
Daniel Alvarez Sanchez
dalvarez at redhat.com
Thu Feb 16 15:04:15 UTC 2017
Awesome work, Kevin!
For the DHCP notification, in my profiling it accounted for only 10% of the
CPU time, without taking the waiting times into account, which is probably
what you also measured.
Your patch seems like a neat and great optimization :)
Also, since "get_devices_details_list_and_failed_devices()" takes quite a
long time, does it make sense to trigger this request asynchronously (same
approach you took for OVO notifier) and continue executing the iteration?
This would not result in a huge improvement but, in the case I showed in
the diagram, both 'get_device_details' calls could be issued at the same time
instead of one after another, probably freeing the iteration for
further processing on the agent side. Thoughts on this?
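
To make the idea concrete, here is a minimal sketch of issuing both calls at once instead of back to back. It is only an illustration under assumptions: get_device_details() below is a hypothetical stand-in that sleeps to simulate the RPC round trip, the device names are made up, and I use a thread pool for simplicity even though the agent itself runs on eventlet green threads.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def get_device_details(device_batch):
    """Hypothetical stand-in for the blocking RPC call to the Neutron server."""
    time.sleep(0.2)  # simulate the server-side processing time
    return {device: {"device": device, "status": "ok"} for device in device_batch}


def fetch_sequential(batches):
    # One request after another: total time grows with the number of batches.
    return [get_device_details(batch) for batch in batches]


def fetch_concurrent(batches):
    # Issue all requests at once; pool.map() returns the replies in input order.
    with ThreadPoolExecutor(max_workers=max(1, len(batches))) as pool:
        return list(pool.map(get_device_details, batches))


if __name__ == "__main__":
    batches = [["tap-a", "tap-b"], ["tap-c", "tap-d"]]  # made-up device names
    for fetch in (fetch_sequential, fetch_concurrent):
        start = time.monotonic()
        replies = fetch(batches)
        elapsed = time.monotonic() - start
        print(fetch.__name__, len(replies), "replies in", round(elapsed, 2), "s")
```

With two batches the concurrent variant should take roughly the time of a single call, which is the (modest) gain I would expect for the case in the diagram.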
Regarding the time spent on SQL queries, it looks like the server spends a
significant amount of time building them, and reducing that time would
result in a nice improvement. Mike's outstanding analysis looks promising
and is probably worth discussing.
On Thu, Feb 16, 2017 at 8:23 AM, Kevin Benton <kevin at benton.pub> wrote:
> Thanks for the stats and the nice diagram. I did some profiling and I'm
> sure it's the RPC handler on the Neutron server-side behaving like garbage.
> There are several causes, which I have a string of patches up to address;
> they mainly stem from the fact that l2pop requires multiple port status
> updates to function correctly:
> * The DHCP notifier will trigger a notification to the DHCP agents on the
> network on a port status update. This wouldn't be too problematic on its
> own, but it does several queries for networks and segments to determine
> which agents it should talk to. Patch to address it here:
> * The OVO notifier will also generate a notification on any port data
> model change, including the status. This is ultimately the desired
> behavior, but until we eliminate the frivolous status flipping, it's going
> to incur a performance hit. Patch here to run it asynchronously in the
> background so it doesn't block the port update process:
> * A wasteful DB query in the ML2 PortContext: https://review.op
> * More unnecessary queries for the status update case in the ML2
> PortContext: https://review.openstack.org/#/c/434680/
> * Bulking up the DB queries rather than retrieving port details one by one:
> https://review.openstack.org/#/c/434681/ https://review.open
> The top two accounted for more than 60% of the overhead in my profiling
> and they are pretty simple, so we may be able to get them into Ocata for RC
> depending on how other cores feel. If not, they should be good candidates
> for back-porting later. Some of the others start to get more invasive so we
> may be stuck.
> Kevin Benton
> On Wed, Feb 15, 2017 at 12:25 PM, Jay Pipes <jaypipes at gmail.com> wrote:
>> On 02/15/2017 12:46 PM, Daniel Alvarez Sanchez wrote:
>>> Hi there,
>>> We're trying to figure out why, sometimes, rpc_loop takes over 10
>>> seconds to process an iteration when booting instances. So we deployed
>>> devstack on an 8GB, 4vCPU VM and did some profiling on the following command:
>>> nova boot --flavor m1.nano --image cirros-0.3.4-x86_64-uec --nic
>>> net-name=private --min-count 8 instance
>> Hi Daniel, thanks for posting the information here. Quick request of you,
>> though... can you try re-running the test but doing 8 separate calls to
>> nova boot instead of using the --min-count 8 parameter? I'm curious to see
>> if you notice any difference in contention/performance.
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscrib