[neutron] api performance at scale
Erik Olof Gunnar Andersson
eandersson at blizzard.com
Wed Dec 4 05:42:26 UTC 2019
Yeah, I think those patches would help a lot, especially the security group related change. For some reason, security groups are the most expensive call in Neutron for us. In our larger deployments, even the simplest security group list command takes 60 seconds to complete. We had very similar issues with neutron-lbaas, but those calls have since been fixed.
The large ops SIG is unfortunately at 1 AM (Pacific Time) over here. I can try to attend, but it wouldn't be easy.
From: Matt Riedemann <mriedemos at gmail.com>
Sent: Tuesday, December 3, 2019 9:44 AM
To: openstack-discuss at lists.openstack.org <openstack-discuss at lists.openstack.org>
Subject: Re: [neutron] api performance at scale
On 12/3/2019 11:24 AM, Erik Olof Gunnar Andersson wrote:
> For us nova used to be the biggest concern, but a lot of work has been
> done and nova now performs great. Instead we are having trouble getting
> Neutron to perform at scale. Obvious calls like security groups are
> performing really poorly, and the nova-compute defaults for refreshing the
> network cache on computes cause massive issues with Neutron.
I wonder how much of the performance hit is due to rootwrap usage in
neutron (nova's conversion to privsep was completed in Train).
Nova might be the bee's knees, but I know there are things in nova we
could do to be smarter about not hammering the neutron API as much, e.g.:
https://review.opendev.org/#/c/465792/ - make bulk queries to neutron
when refreshing the instance network info cache
- be smarter about filtering to avoid expensive joins
https://bugs.launchpad.net/nova/+bug/1567655 - nova's internal network
info cache only stores information about ports and their related
networks/subnets/ips but the security group information related to the
ports attached to a server is fetched directly anytime it's needed,
including when listing servers with details. So if you're an admin
listing all servers across all tenants, that could get pretty slow. I've
long thought we should cache the security group information like we do
for ports for read-only operations like GET /servers/detail but it's a
non-trivial amount of work to make that happen and we'd definitely want
benchmarks and such to justify the change.
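To make the "bulk queries" and "smarter filtering" ideas above concrete, here is a rough sketch, not nova's or neutron's actual code. `FakeNetworkAPI`, `list_ports`, `refresh_per_server`, and `refresh_bulk` are all hypothetical names invented for illustration; the fake client only counts calls so the difference in API traffic is visible.

```python
from collections import defaultdict


class FakeNetworkAPI:
    """Hypothetical stand-in for a networking API client; it only counts calls."""

    def __init__(self, ports):
        # Each port is a dict with 'id', 'device_id' and 'security_groups'.
        self._ports = ports
        self.calls = 0

    def list_ports(self, device_ids, fields=None):
        """Return ports for the given device IDs, optionally trimmed to `fields`."""
        self.calls += 1
        wanted = set(device_ids)
        result = []
        for port in self._ports:
            if port["device_id"] in wanted:
                if fields:
                    # Return only the requested keys, mimicking a field filter.
                    port = {key: port[key] for key in fields}
                result.append(port)
        return result


def refresh_per_server(api, server_ids):
    """Naive refresh: one API call per server."""
    return {sid: api.list_ports([sid]) for sid in server_ids}


def refresh_bulk(api, server_ids, batch_size=50):
    """Batched refresh: one API call per batch, asking only for needed fields."""
    by_server = defaultdict(list)
    for start in range(0, len(server_ids), batch_size):
        batch = server_ids[start:start + batch_size]
        for port in api.list_ports(batch, fields=["id", "device_id"]):
            by_server[port["device_id"]].append(port)
    return dict(by_server)
```

With 100 servers, the per-server version makes 100 calls while the batched version with `batch_size=50` makes 2. The batch size here is an arbitrary assumption; a real value would need benchmarking against URL length limits and server-side query cost.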
Note ttx has started a large ops SIG or whatever so this is probably
something to discuss there: