Open Stack

Tue Nov 11 17:25:43 UTC 2014

Hi,

There are two main patches that I am interested in back-porting to improve
the performance of the DB queries issued frequently by L2 agents while they
are hosting VMs. These are not just one-time queries during specific
operations (e.g. create/delete); they also happen during normal periodic
checks from the L2 agent. Due to this constant background behavior, the
agents start to trample the Neutron server once the deployment size scales
up, and they will eventually exceed its resources so that it can no longer
service API requests even though nothing is changing.

The only work-around for this right now is to scale the Neutron server and
the MySQL nodes abnormally (compared to any of the other standard OpenStack
services) to handle the query load. This is really discouraging to
deployers (lots of extra compute power wasted on service nodes) and makes
Neutron appear extremely unstable to anyone who does not know that Neutron
needs to be special-cased in this manner.

The first patch is to batch up the ports being requested from an RPC agent
before querying the database.[1] This is an internal-only change (it doesn't
affect the data delivered to RPC callers). Before, the server was calling
the DB for each port individually, so a request from a node hosting a high
density of ports, like an L3 agent, could result in 1000+ DB queries. Now
the server will query the database for all of the port information at once
and then group it by port like the agents expect. This is probably the most
significant improvement when dealing with high-density nodes, and there is a
Rally performance graph demonstrating this in the comments.
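
To make the difference concrete, here is a rough sketch of the two access
patterns. It is not the Neutron code: the port_rules table, its columns and
the function names are all made up for illustration, and it uses an
in-memory SQLite database so that it runs on its own.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory database (hypothetical schema, not Neutron's)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE port_rules (port_id TEXT, rule TEXT)")
    rows = [(f"port-{i}", f"rule-{i}-{j}") for i in range(5) for j in range(2)]
    conn.executemany("INSERT INTO port_rules VALUES (?, ?)", rows)
    return conn


def fetch_per_port(conn, port_ids):
    """Old pattern: one query per requested port."""
    result = {}
    queries = 0
    for port_id in port_ids:
        cur = conn.execute(
            "SELECT rule FROM port_rules WHERE port_id = ? ORDER BY rule",
            (port_id,),
        )
        result[port_id] = [row[0] for row in cur]
        queries += 1
    return result, queries


def fetch_batched(conn, port_ids):
    """New pattern: one query for all ports, then group the rows by port."""
    port_ids = list(port_ids)
    if not port_ids:
        return {}, 0
    # A real deployment would likely need to chunk very large ID lists.
    placeholders = ",".join("?" for _ in port_ids)
    cur = conn.execute(
        "SELECT port_id, rule FROM port_rules "
        f"WHERE port_id IN ({placeholders}) ORDER BY port_id, rule",
        port_ids,
    )
    grouped = {port_id: [] for port_id in port_ids}
    for port_id, rule in cur:
        grouped[port_id].append(rule)
    return grouped, 1
```

Both functions return the same per-port dict, so a caller cannot tell them
apart; only the number of round trips to the database changes (one per port
versus one in total).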

The second patch is to eliminate a join across the Neutron port table that
was a completely unnecessary calculation for the DB to perform and a waste
of data returned (every column from every table in the query).[2] This also
doesn't change the data returned to the caller of the function (no missing
dict entries, etc.), so we shouldn't have to worry about out-of-tree
drivers, tools, etc. being broken by this either. I will run the Rally
performance numbers for this one as well after the first patch gets merged,
since the first patch has the higher impact of the two.
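
As an illustration of why dropping a join like this is safe, here is a small
sketch. Again, this is not the actual Neutron query: the ports and
sg_bindings tables and their columns are hypothetical, and the equivalence
assumes every binding row refers to an existing port.

```python
import sqlite3


def make_db():
    """Hypothetical two-table schema, loosely modelled on ports and bindings."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ports (
            id TEXT PRIMARY KEY, name TEXT, mac_address TEXT, device_id TEXT
        );
        CREATE TABLE sg_bindings (
            port_id TEXT REFERENCES ports(id), security_group_id TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO ports VALUES (?, ?, ?, ?)",
        [
            ("p1", "web", "fa:16:3e:00:00:01", "vm-1"),
            ("p2", "db", "fa:16:3e:00:00:02", "vm-2"),
        ],
    )
    conn.executemany(
        "INSERT INTO sg_bindings VALUES (?, ?)",
        [("p1", "sg-a"), ("p1", "sg-b"), ("p2", "sg-a")],
    )
    return conn


def _group(cur):
    """Build {port_id: {security_group_id, ...}} from the first two columns."""
    width = len(cur.description)  # number of columns the DB sent back
    result = {}
    for row in cur:
        result.setdefault(row[0], set()).add(row[1])
    return result, width


def groups_with_join(conn):
    """Joins the port table and returns every column from both tables."""
    return _group(conn.execute(
        "SELECT * FROM sg_bindings "
        "JOIN ports ON ports.id = sg_bindings.port_id"
    ))


def groups_without_join(conn):
    """Reads only the two columns the caller actually uses."""
    return _group(conn.execute(
        "SELECT port_id, security_group_id FROM sg_bindings"
    ))
```

The resulting dict is identical either way; the joined query just makes the
database do extra work and ship back columns that nobody reads.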

Let me know if I need to elaborate on anything.

1. https://review.openstack.org/#/c/132372/
2. https://review.openstack.org/#/c/130101/

Thanks,
Kevin Benton

On Wed, Oct 29, 2014 at 6:09 AM, Ihar Hrachyshka <ihrachys at redhat.com>
wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
>
> On 29/10/14 14:00, Dolph Mathews wrote:
> >
> > On Wed, Oct 29, 2014 at 5:23 AM, Ihar Hrachyshka
> > <ihrachys at redhat.com <mailto:ihrachys at redhat.com>> wrote:
> >
> > Hi all,
> >
> > there is a series of Neutron backports in the Juno queue that are
> > intended to significantly improve service performance when
> > handling security groups (one of the issues that are main pain
> > points of current users):
> >
> > - https://review.openstack.org/130101 -
> > https://review.openstack.org/130098 -
> > https://review.openstack.org/130100 -
> > https://review.openstack.org/130097 -
> > https://review.openstack.org/130105
> >
> > The first four patches are optimizing db side (controller), while
> > the last one is to avoid fetching security group rules by OVS agent
> > when firewall is disabled.
> >
> > AFAIK we don't generally backport performance improvements unless
> > they are very significant (though I don't see anything written in
> > stone that says so), but knowing that those patches fix pain
> > hotspots in Neutron, and seem rather isolated, should we consider
> > their inclusion?
> >
> > Should we come up with some "official" rule on how we handle
> > performance enhancement backports?
> >
> >
> >> I'm very much in favor of backporting known performance
> >> improvements, but in my experience, not all "performance
> >> improvements" actually improve performance, so I'd expect an
> >> appropriate benchmark to demonstrate a real performance benefit
> >> to coincide with the proposed patch.
>
> Exactly. That's what I asked to elaborate on at:
> https://review.openstack.org/#/c/130101/
>
> Also, adding Kevin into CC to make sure he is aware of the discussion.
>
> >
> >> For a hypothetical example, what seems like a clear cut
> >> improvement in review 130098 (remove unused columns from a query)
> >> *might* have an unforeseen side effect later on, where another
> >> component doesn't have the data it needs, so it suddenly starts
> >> issuing a new DB query to compensate. OpenStack is certainly
> >> complicated enough that it's impossible to make accurate
> >> assumptions about performance.
> >
> >
> >
> > /Ihar
> >
> > _______________________________________________
> > Openstack-stable-maint mailing list
> > Openstack-stable-maint at lists.openstack.org
> > <mailto:Openstack-stable-maint at lists.openstack.org>
> >
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-stable-maint
> >
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG/MacGPG2 v2.0.22 (Darwin)
>
> iQEcBAEBCgAGBQJUUObtAAoJEC5aWaUY1u57UYwH/j+wjiydOXjA+lFi3l1Pbl5f
> s7r4Ox6FCPPVoAKziKpygKRbHTrCTew4DcgOxZhmC9qoq+Rk8Q1WFMLlBQ+51Kjj
> lj/72JiPenKvuZSl/E+9FsmWP7ReCCyUMYWiQS6wp6FAd5KpQMMgdjleUQWEAgjN
> Y1M9kYVOmqnYHQy4oWJsV0Od2wFKFAGDKohLEzDocmTQFxcfkEeMSn3qJ4aOwkoz
> KmTFKPGAGU8eTyYNAs3sHa0t9VFwvPoBg4EjMXBjkuoRxz+Nf/IPUZmrruXQ7LM6
> ioXEUH3GdKQSCKWtYoFFI1QPpiTQSIalO6nURxUg0UldW6i5QwIX1LTz8GMG+TQ=
> =JJq0
> -----END PGP SIGNATURE-----
>

-- 
Kevin Benton

[Openstack-stable-maint] Neutron backports for security group performance
