[openstack-dev] [stable][neutron] Fwd: Re: [Openstack-stable-maint] Neutron backports for security group performance

Ihar Hrachyshka ihrachys at redhat.com
Tue Nov 11 19:39:08 UTC 2014


Forwarding to openstack-dev since openstack-stable-maint is now read-only.

-------- Forwarded Message --------
Subject: 	Re: [Openstack-stable-maint] Neutron backports for security
group performance
Date: 	Tue, 11 Nov 2014 09:25:43 -0800
From: 	Kevin Benton <blak111 at gmail.com>
To: 	openstack-stable-maint at lists.openstack.org
CC: 	Ihar Hrachyshka <ihrachys at redhat.com>


There are two main patches that I am interested in back-porting to
improve the performance of the DB queries issued frequently by L2 agents
while they are hosting VMs. These are not one-time queries tied to
specific operations (e.g. create/delete); they also happen during the
L2 agent's normal periodic checks. Due to this constant background
load, the agents start to trample the Neutron server as the deployment
scales up, eventually exhausting its resources so that it can no
longer service API requests even though nothing is changing.

The only work-around for this right now is to abnormally scale the
Neutron server and the MySQL nodes (compared to any of the other
standard OpenStack services) to handle the query load. This is really
discouraging to deployers (lots of extra compute power wasted on
service nodes) and makes Neutron appear extremely unstable to
operators who do not know that Neutron needs to be special-cased in
this manner.

The first patch batches up the ports being requested by an RPC agent
before querying the database.[1] This is an internal-only change
(it doesn't affect the data delivered to RPC callers). Before, the
server queried the DB for each port individually, so a request from a
high-density port node like an L3 agent could result in 1000+ database
queries. Now the server queries the database for all of the port
information at once and then groups it by port, as the agents expect.
This is probably the most significant improvement when dealing with
high-density nodes, and there is a Rally performance graph
demonstrating this in the review comments.
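To illustrate the batching idea, here is a rough sketch using an
invented sqlite schema; it is not the actual Neutron code or tables,
just the one-query-then-group-by-port shape described above:

```python
import sqlite3
from collections import defaultdict

# Invented stand-in schema; the real change lives in Neutron's RPC
# handler and operates on the actual port/security-group tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sg_rules (port_id TEXT, rule TEXT)")
conn.executemany("INSERT INTO sg_rules VALUES (?, ?)",
                 [("p1", "allow tcp 22"), ("p1", "allow tcp 80"),
                  ("p2", "allow icmp")])

def rules_per_port_batched(conn, port_ids):
    """One query for every requested port, grouped afterwards by port.

    The 'before' behavior issued one query per port, i.e. len(port_ids)
    round trips to the database instead of a single IN (...) query.
    """
    placeholders = ",".join("?" for _ in port_ids)
    rows = conn.execute(
        "SELECT port_id, rule FROM sg_rules WHERE port_id IN (%s) "
        "ORDER BY rowid" % placeholders, list(port_ids)).fetchall()
    grouped = defaultdict(list)
    for port_id, rule in rows:
        grouped[port_id].append(rule)  # per-port shape the agents expect
    return dict(grouped)

print(rules_per_port_batched(conn, ["p1", "p2"]))
```

The grouping step is what keeps the change internal-only: callers still
receive per-port results, only the number of round trips changes.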

The second patch eliminates a join across the Neutron port table that
was completely unnecessary for the DB to perform and wasted the data
returned (every column from every table in the query).[2] This also
doesn't change the data returned to the caller of the function (no
missing dict entries, etc.), so we shouldn't have to worry about
out-of-tree drivers, tools, etc. being broken by this either. I will
run the Rally performance numbers for this one as well after the first
patch gets merged, since the first patch has a higher impact than this
one.
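A minimal sketch of that idea with invented table names (not the
actual Neutron schema): restrict the query to the columns the caller
consumes instead of joining the port table and selecting everything.

```python
import sqlite3

# Invented stand-in schema; the real fix is in Neutron's security group
# DB code, whose query joined the port table and returned every column
# of both tables even though callers used only a fraction of them.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE ports (id TEXT PRIMARY KEY, name TEXT, mac TEXT)")
db.execute("CREATE TABLE sg_bindings (port_id TEXT, sg_id TEXT)")
db.execute("INSERT INTO ports VALUES ('p1', 'web', 'fa:16:3e:00:00:01')")
db.executemany("INSERT INTO sg_bindings VALUES (?, ?)",
               [("p1", "sg-default"), ("p1", "sg-web")])

def sg_ids_for_port(db, port_id):
    """Fetch only the column the caller actually uses.

    The 'before' shape was roughly:
        SELECT * FROM sg_bindings JOIN ports
            ON ports.id = sg_bindings.port_id
    even though only sg_id was ever consumed, so the join and the extra
    columns were pure overhead.
    """
    rows = db.execute(
        "SELECT sg_id FROM sg_bindings WHERE port_id = ? ORDER BY sg_id",
        (port_id,)).fetchall()
    return [sg_id for (sg_id,) in rows]

print(sg_ids_for_port(db, "p1"))
```

Since the returned value has the same shape either way, callers (and
out-of-tree consumers) see no behavioral difference, only less DB work.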

Let me know if I need to elaborate on anything.

1. https://review.openstack.org/#/c/132372/
2. https://review.openstack.org/#/c/130101/

Kevin Benton

On Wed, Oct 29, 2014 at 6:09 AM, Ihar Hrachyshka <ihrachys at redhat.com> wrote:

On 29/10/14 14:00, Dolph Mathews wrote:

> On Wed, Oct 29, 2014 at 5:23 AM, Ihar Hrachyshka
> <ihrachys at redhat.com> wrote:

> Hi all,

> there is a series of Neutron backports in the Juno queue that are
> intended to significantly improve service performance when handling
> security groups (one of the main pain points for current users):

> - https://review.openstack.org/130101
> - https://review.openstack.org/130098
> - https://review.openstack.org/130100
> - https://review.openstack.org/130097
> - https://review.openstack.org/130105

> The first four patches optimize the DB side (controller), while
> the last one avoids fetching security group rules in the OVS
> agent when the firewall is disabled.
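[That last, agent-side change amounts to a short-circuit like the
following sketch; the class and driver names here are invented for
illustration, not the actual Neutron code:]

```python
# Hypothetical sketch: skip the security-group-rules RPC when the
# firewall driver is the noop one; all names are invented.
NOOP_FIREWALL = "noop"

class SecurityGroupAgentSketch:
    def __init__(self, firewall_driver, rpc_call):
        self.firewall_driver = firewall_driver
        self.rpc_call = rpc_call       # callable hitting the Neutron server
        self.rpc_calls_made = 0

    def fetch_rules(self, port_ids):
        # Nothing to enforce with the noop driver, so avoid the RPC
        # round trip (and the server-side DB queries it would trigger).
        if self.firewall_driver == NOOP_FIREWALL:
            return {}
        self.rpc_calls_made += 1
        return self.rpc_call(port_ids)

agent = SecurityGroupAgentSketch(NOOP_FIREWALL, rpc_call=None)
print(agent.fetch_rules(["p1", "p2"]))
```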

> AFAIK we don't generally backport performance improvements unless
> they are very significant (though I don't see anything written in
> stone that says so), but knowing that those patches fix pain
> hotspots in Neutron and seem rather isolated, should we consider
> their inclusion?

> Should we come up with some "official" rule on how we handle 
> performance enhancement backports?

>> I'm very much in favor of backporting known performance 
>> improvements, but in my experience, not all "performance 
>> improvements" actually improve performance, so I'd expect an 
>> appropriate benchmark to demonstrate a real performance benefit 
>> to coincide with the proposed patch.

Exactly. That's what I asked to elaborate on at:

Also, adding Kevin into CC to make sure he is aware of the discussion.

>> For a hypothetical example, what seems like a clear cut 
>> improvement in review 130098 (remove unused columns from a
>> query) *might* have an unforeseen side effect later on, where
>> another component doesn't have the data it needs, so it suddenly
>> starts issuing a new DB query to compensate. OpenStack is
>> certainly complicated enough that it's impossible to make
>> accurate assumptions about performance.

> /Ihar

> _______________________________________________
> Openstack-stable-maint mailing list
> Openstack-stable-maint at lists.openstack.org




-- 
Kevin Benton


