[openstack-dev] [neutron] Some findings while profiling instances boot

Kevin Benton kevin at benton.pub
Thu Feb 16 22:14:29 UTC 2017


We could potentially make that call async on the agent, but the agent has
very little to do without the information in the response that comes back.

As we switch over to push notifications, this method of data retrieval will
be completely gone, so we probably don't want to spend much time redesigning
that workflow anyway.

The baked queries idea is interesting; I'll reply on Mike's email with
more details.
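
As I understand it from Mike's thread, the idea is to cache the query
construction step with SQLAlchemy's baked query extension, so each query
is built and compiled once and then reused. A rough, hypothetical sketch
(the Port model below is a stand-in, not the actual Neutron code):

    from sqlalchemy import Column, String, bindparam
    from sqlalchemy.ext import baked
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class Port(Base):  # stand-in for the real Neutron port model
        __tablename__ = 'ports'
        id = Column(String(36), primary_key=True)
        status = Column(String(16))

    bakery = baked.bakery()

    def get_port(session, port_id):
        # The lambdas run (and the SQL compiles) only on the first call;
        # later calls reuse the cached query object from the bakery.
        q = bakery(lambda s: s.query(Port))
        q += lambda q: q.filter(Port.id == bindparam('port_id'))
        return q(session).params(port_id=port_id).one()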

On Feb 16, 2017 7:07 AM, "Daniel Alvarez Sanchez" <dalvarez at redhat.com>
wrote:

> Awesome work, Kevin!
>
> For the DHCP notification, in my profiling I measured only 10% of the CPU
> time [0], without taking the waiting times into account, which is probably
> what you also measured.
> Your patch seems like a neat and effective optimization :)
>
> Also, since "get_devices_details_list_and_failed_devices()" takes quite a
> long time, does it make sense to trigger this request asynchronously (the
> same approach you took for the OVO notifier) and continue executing the
> iteration? This would not be a huge improvement on its own but, in the
> case I showed in the diagram, both 'get_device_details' calls could be
> issued at the same time instead of one after the other, probably freeing
> the iteration for further processing on the agent side. Thoughts on this?
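>
> Something like this is what I had in mind; a rough sketch only, where
> plugin_rpc, context, agent_id, host and the two device lists stand in
> for the real agent attributes (the agent already runs under eventlet,
> so green threads seemed like the natural fit):
>
>     import eventlet
>
>     pool = eventlet.GreenPool()
>
>     def fetch(devices):
>         # plugin_rpc/context/agent_id/host: placeholders for the
>         # agent's real attributes. Running each blocking RPC call in
>         # its own green thread lets the two requests to the server
>         # overlap instead of serializing.
>         return plugin_rpc.get_devices_details_list_and_failed_devices(
>             context, devices, agent_id, host)
>
>     threads = [pool.spawn(fetch, chunk) for chunk in (added, updated)]
>     results = [t.wait() for t in threads]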
>
> Regarding the time spent on SQL queries, it looks like the server spends a
> significant amount of time building them, and reducing that time would
> result in a nice improvement. Mike's outstanding analysis looks promising
> and is probably worth discussing.
>
> [0] http://imgur.com/lDikZ0I
>
>
>
> On Thu, Feb 16, 2017 at 8:23 AM, Kevin Benton <kevin at benton.pub> wrote:
>
>> Thanks for the stats and the nice diagram. I did some profiling and I'm
>> sure it's the RPC handler on the Neutron server side behaving like
>> garbage.
>>
>> There are several causes, mainly stemming from the fact that l2pop
>> requires multiple port status updates to function correctly, and I have
>> a string of patches up to address them:
>>
>> * The DHCP notifier will trigger a notification to the DHCP agents on the
>> network on a port status update. This wouldn't be too problematic on its
>> own, but it does several queries for networks and segments to determine
>> which agents it should talk to. Patch to address it here:
>> https://review.openstack.org/#/c/434677/
>>
>> * The OVO notifier will also generate a notification on any port data
>> model change, including the status. This is ultimately the desired
>> behavior, but until we eliminate the frivolous status flipping, it's going
>> to incur a performance hit. Patch here to run it asynchronously in the
>> background so it doesn't block the port update process:
>> https://review.openstack.org/#/c/434678/
>>
>> * A wasteful DB query in the ML2 PortContext:
>> https://review.openstack.org/#/c/434679/
>>
>> * More unnecessary queries for the status update case in the ML2
>> PortContext: https://review.openstack.org/#/c/434680/
>>
>> * Bulking up the DB queries rather than retrieving port details one by
>> one:
>> https://review.openstack.org/#/c/434681/
>> https://review.openstack.org/#/c/434682/
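>>
>> For anyone curious, the bulking essentially collapses a per-device query
>> loop into a single IN() query; schematically (with hypothetical helper
>> names, not the actual patch code):
>>
>>     # Before: one round trip (and one ORM query build) per device.
>>     # get_port_details/build_details are illustrative placeholders.
>>     details = [get_port_details(session, device) for device in devices]
>>
>>     # After: one query for the whole batch, details built in memory.
>>     ports = session.query(Port).filter(Port.id.in_(devices)).all()
>>     details = [build_details(port) for port in ports]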
>>
>> The top two accounted for more than 60% of the overhead in my profiling
>> and they are pretty simple, so we may be able to get them into Ocata for
>> RC depending on how other cores feel. If not, they should be good
>> candidates for backporting later. Some of the others start to get more
>> invasive, so we may be stuck.
>>
>> Cheers,
>> Kevin Benton
>>
>> On Wed, Feb 15, 2017 at 12:25 PM, Jay Pipes <jaypipes at gmail.com> wrote:
>>
>>> On 02/15/2017 12:46 PM, Daniel Alvarez Sanchez wrote:
>>>
>>>> Hi there,
>>>>
>>>> We're trying to figure out why, sometimes, rpc_loop takes over 10
>>>> seconds to process an iteration when booting instances. So we deployed
>>>> devstack on an 8GB, 4-vCPU VM and did some profiling on the following
>>>> command:
>>>>
>>>> nova boot --flavor m1.nano --image cirros-0.3.4-x86_64-uec --nic
>>>> net-name=private --min-count 8 instance
>>>>
>>>
>>> Hi Daniel, thanks for posting the information here. Quick request of
>>> you, though... can you try re-running the test but doing 8 separate calls
>>> to nova boot instead of using the --min-count 8 parameter? I'm curious to
>>> see if you notice any difference in contention/performance.
>>>
>>> Best,
>>> -jay