Re: [neutron] api performance at scale

3 Dec 2019

On 12/3/2019 11:24 AM, Erik Olof Gunnar Andersson wrote:
> ...
> For us nova used to be the biggest concern, but a lot of work has been
> done and nova now performs great. Instead we are having issues getting
> Neutron to perform at scale. Obvious calls like security groups are
> performing really poorly, and the nova-compute defaults for refreshing
> the network cache on computes cause massive issues with Neutron.

I wonder how much of the performance hit is due to rootwrap usage in
neutron (nova's conversion to privsep was completed in Train).

Nova might be the bee's knees, but I know there are things in nova we
could do to be smarter about not hammering the neutron API as much, e.g.:

https://review.opendev.org/#/c/465792/ - make bulk queries to neutron 
when refreshing the instance network info cache

https://review.opendev.org/#/q/I7de14456d04370c842b4c35597dca3a628a826a2 
- be smarter about filtering to avoid expensive joins

https://bugs.launchpad.net/nova/+bug/1567655 - nova's internal network
info cache only stores information about ports and their related
networks/subnets/IPs, but the security group information related to the
ports attached to a server is fetched directly any time it's needed,
including when listing servers with details. So if you're an admin
listing all servers across all tenants, that could get pretty slow. I've
long thought we should cache the security group information like we do
for ports for read-only operations like GET /servers/detail, but it's a
non-trivial amount of work to make that happen and we'd definitely want
benchmarks and such to justify the change.
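To make the cost difference concrete, here is a minimal Python sketch of per-server lookups versus bulk lookups. FakeNeutron, list_ports(device_ids) and list_security_groups(ids) are hypothetical stand-ins that only count calls; they are not the real neutron or python-neutronclient API, and this is not how nova actually implements either path. The per-server version makes two API calls per server, while the bulk version makes two calls total no matter how many servers are listed.

```python
from collections import Counter


class FakeNeutron:
    """Hypothetical in-memory stand-in for the Neutron API that counts calls."""

    def __init__(self, ports, security_groups):
        self.ports = ports                      # list of port dicts
        self.security_groups = security_groups  # sg id -> sg dict
        self.calls = Counter()

    def list_ports(self, device_ids):
        # One "API call" that may filter on many device ids at once.
        self.calls["list_ports"] += 1
        wanted = set(device_ids)
        return [p for p in self.ports if p["device_id"] in wanted]

    def list_security_groups(self, ids):
        # One "API call" that may filter on many security group ids at once.
        self.calls["list_security_groups"] += 1
        return [self.security_groups[i] for i in ids if i in self.security_groups]


def details_per_server(client, server_ids):
    """N+1 pattern: two calls per server, so cost grows with the server count."""
    result = {}
    for sid in server_ids:
        ports = client.list_ports([sid])
        sg_ids = sorted({sg for p in ports for sg in p["security_groups"]})
        result[sid] = [g["name"] for g in client.list_security_groups(sg_ids)]
    return result


def details_bulk(client, server_ids):
    """Bulk pattern: one ports call and one security group call in total."""
    ports = client.list_ports(server_ids)
    sg_ids = sorted({sg for p in ports for sg in p["security_groups"]})
    groups = {g["id"]: g for g in client.list_security_groups(sg_ids)}
    per_server = {sid: set() for sid in server_ids}
    for p in ports:
        per_server[p["device_id"]].update(p["security_groups"])
    return {
        sid: [groups[i]["name"] for i in sorted(ids) if i in groups]
        for sid, ids in per_server.items()
    }


# Hypothetical sample data: three servers sharing two security groups.
SAMPLE_PORTS = [
    {"id": "p1", "device_id": "vm1", "security_groups": ["sg-a"]},
    {"id": "p2", "device_id": "vm2", "security_groups": ["sg-a", "sg-b"]},
    {"id": "p3", "device_id": "vm3", "security_groups": ["sg-b"]},
]
SAMPLE_SGS = {
    "sg-a": {"id": "sg-a", "name": "default"},
    "sg-b": {"id": "sg-b", "name": "web"},
}

if __name__ == "__main__":
    servers = ["vm1", "vm2", "vm3"]
    slow, fast = FakeNeutron(SAMPLE_PORTS, SAMPLE_SGS), FakeNeutron(SAMPLE_PORTS, SAMPLE_SGS)
    print(details_per_server(slow, servers), sum(slow.calls.values()))
    print(details_bulk(fast, servers), sum(fast.calls.values()))
```

Both functions return the same data, but with three servers the per-server path makes 6 calls and the bulk path makes 2. An admin listing thousands of servers across all tenants is where that gap would really show up, which is why benchmarks would be the way to justify caching or batching changes.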

Note that ttx has started a large ops SIG or whatever, so this is probably
something to discuss there:

https://wiki.openstack.org/wiki/Large_Scale_SIG

-- 

Thanks,

Matt

Matt Riedemann