Neutron backports for security group performance

Ihar Hrachyshka

29 Oct 2014

3:23 a.m.

Hi all,

there is a series of Neutron backports in the Juno queue that are intended to significantly improve service performance when handling security groups (one of the main pain points for current users):

- https://review.openstack.org/130101
- https://review.openstack.org/130098
- https://review.openstack.org/130100
- https://review.openstack.org/130097
- https://review.openstack.org/130105

The first four patches optimize the DB side (controller), while the last one avoids fetching security group rules in the OVS agent when the firewall is disabled.

AFAIK we don't generally backport performance improvements unless they are very significant (though I don't see anything written in stone that says so), but knowing that those patches fix pain hotspots in Neutron, and seem rather isolated, should we consider their inclusion?

Should we come up with some "official" rule on how we handle performance enhancement backports?

/Ihar

Miguel Angel Ajo Pelayo

29 Oct

3:39 a.m.

In my opinion, the security-group-related ones are probably the most important, as that specific part of Neutron gets a lot of calls from all L2 agents on the compute nodes.

As for backporting, I'd at least allow some time for those changes to get cured, and make sure they are not introducing any new failure mode.

Best regards,
Miguel Ángel

Ihar Hrachyshka

3:59 a.m.

On 29/10/14 11:39, Miguel Angel Ajo Pelayo wrote:

...

In the case of backporting, I'd, at least, let some time to get those changes cured, and make sure they are not introducing any new failure mode.

Cured? What do you mean? Giving them some time to sit in master before proceeding with backports?

/Ihar

Claudiu Belu

4:29 a.m.

Hello,

Security groups really need a performance boost, since they become troublesome in large deployments.

Let's say we have an OpenStack deployment of 10,000 VMs. Each time we modify a security group rule, the L2 agents will be notified that the security group has changed. Some of them (I don't know if all of them) will do a full refresh. This would mean 10,000 refreshes each time a security group changes.

Now, the question is, how often will the security groups change? Well, as far as I know, a security group rule is created whenever a new port is created and bound to a network (for example, nova boot --nic net-id=...). Also, when a VM is deleted, the security group rule will be removed as well. I also assume that the L2 agent will refresh the security group rules in case of migration and resize.

Also, there are some agents (e.g. the Hyper-V L2 agent) that will refresh all the ports they manage when they start. Also, a VM can have multiple security groups, so when an agent does a refresh, it will refresh for each security group. Also, a VM can have multiple NICs, which can mean more ports and more rules.

There might be other common scenarios I didn't mention, but my point is that any performance boost, even a small one, for security groups will have a big impact.

I am in favor of this, but still, I think we will have to make sure it doesn't introduce new issues for some L2 agents, since they are the main consumers of the security groups. The commits will have to be validated by all the CIs.

Best regards,
Claudiu Belu
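The fan-out argument in Claudiu's message can be put into rough numbers. The sketch below is only a back-of-the-envelope model: the function name, its parameters, and the assumption that the factors simply multiply (every port refreshes once per security group per change) are illustrative and are not taken from Neutron.

```python
def estimated_refreshes(vms, nics_per_vm=1, groups_per_port=1,
                        changes=1, refresh_fraction=1.0):
    """Worst-case number of port refreshes caused by security group changes.

    Assumption (not from Neutron): every affected port is refreshed once
    per security group it belongs to, for every change. refresh_fraction
    models the share of agents that actually do a full refresh.
    """
    ports = vms * nics_per_vm
    return round(ports * groups_per_port * refresh_fraction * changes)


if __name__ == "__main__":
    # The case from the message: 10,000 VMs, one change, every agent refreshes.
    print(estimated_refreshes(10_000))
    # A hypothetical busier day: 2 NICs per VM, 2 groups per port, 50 changes.
    print(estimated_refreshes(10_000, nics_per_vm=2, groups_per_port=2, changes=50))
```

Even if the real multipliers are smaller, the product grows quickly, which is why a small per-refresh saving on the server side can matter at this scale.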

Miguel Angel Ajo Pelayo

5:14 a.m.

+1. By "cured" I mean sitting on master to make sure they don't introduce any new issues.

I like Claudiu's definition of the problem; it's actually very descriptive.

Claudiu, I believe the gerrit CI of all the plugins on stable/juno may be enough to validate the backports, am I right? Otherwise the process could get too complicated (a manual downstream (D/S) backport for every CI, plus specific testing... which may actually happen anyway before the next D/S release based on Juno).

Thierry Carrez

5:52 a.m.

Ihar Hrachyshka wrote:

Should we come up with some "official" rule on how we handle performance enhancement backports?

I personally think performance enhancements are off the table for stable backports. It's really a trade-off between exposing our users to regressions and fixing bugs, and performance enhancements IMHO fall below the bar. I'm adding that topic to the stable branch discussion at the summit, though. -- Thierry Carrez (ttx)

Dolph Mathews

6:03 a.m.

On Wed, Oct 29, 2014 at 7:52 AM, Thierry Carrez <thierry@openstack.org> wrote:

I personally think performance enhancements are off the table for stable backports. It's really a trade-off between exposing our users to regressions and fixing bugs, and performance enhancements IMHO fall below the bar.

Would you not agree that there is some threshold at which bad performance becomes a blocking issue, just like any other bug? I think the problem is in determining where that threshold lies.

Thierry Carrez

9:12 a.m.

Dolph Mathews wrote:

Would you not agree that there is some threshold at which bad performance becomes a blocking issue, just like any other bug? I think the problem is in determining where that threshold lies.

Right, sometimes a performance issue becomes a bug because it renders the system unusable. But in most cases it's an improvement, which is considered a feature. -- Thierry Carrez (ttx)

Dolph Mathews

6 a.m.

On Wed, Oct 29, 2014 at 5:23 AM, Ihar Hrachyshka <ihrachys@redhat.com> wrote:

Should we come up with some "official" rule on how we handle performance enhancement backports?

I'm very much in favor of backporting known performance improvements, but in my experience, not all "performance improvements" actually improve performance, so I'd expect an appropriate benchmark demonstrating a real performance benefit to accompany the proposed patch.

For a hypothetical example, what seems like a clear-cut improvement in review 130098 (remove unused columns from a query) *might* have an unforeseen side effect later on, where another component doesn't have the data it needs, so it suddenly starts issuing a new DB query to compensate. OpenStack is certainly complicated enough that it's impossible to make accurate assumptions about performance.
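Dolph's hypothetical can be made concrete by counting queries rather than assuming. The following is a minimal sketch using an in-memory SQLite database; the `ports` table, its columns, and the helper names are invented for illustration and do not reflect Neutron's schema or the code in review 130098.

```python
import sqlite3


class CountingDB:
    """Thin wrapper that counts every query sent through query()."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def make_db(n_ports=50):
    # Hypothetical schema, populated outside the counter.
    db = CountingDB()
    db.conn.execute(
        "CREATE TABLE ports (id INTEGER PRIMARY KEY, mac TEXT, device_owner TEXT)")
    db.conn.executemany(
        "INSERT INTO ports VALUES (?, ?, ?)",
        [(i, "fa:16:3e:00:00:%02x" % i, "compute:nova") for i in range(n_ports)])
    return db


def owners_wide(db):
    # Original query: returns every column, including one that looks unused here.
    rows = db.query("SELECT id, mac, device_owner FROM ports")
    return {port_id: owner for port_id, _mac, owner in rows}


def owners_trimmed(db):
    # "Optimized" query drops device_owner; a consumer that still needs it
    # compensates with one extra query per port.
    rows = db.query("SELECT id, mac FROM ports")
    owners = {}
    for port_id, _mac in rows:
        owners[port_id] = db.query(
            "SELECT device_owner FROM ports WHERE id = ?", (port_id,))[0][0]
    return owners


if __name__ == "__main__":
    db = make_db()
    owners_wide(db)
    print("wide query:", db.queries, "statement(s)")
    db.queries = 0
    owners_trimmed(db)
    print("trimmed query:", db.queries, "statement(s)")
```

Both functions return the same data, but the trimmed variant issues one query plus one per port. A measurement of this kind (query counts, or timings under a realistic load) is the sort of evidence being asked for before a backport.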

Ihar Hrachyshka

6:09 a.m.

On 29/10/14 14:00, Dolph Mathews wrote:

I'm very much in favor of backporting known performance improvements, but in my experience, not all "performance improvements" actually improve performance, so I'd expect an appropriate benchmark to demonstrate a real performance benefit to coincide with the proposed patch.

Exactly. That's what I asked to elaborate on at: https://review.openstack.org/#/c/130101/

Also, adding Kevin into CC to make sure he is aware of the discussion.

/Ihar

Claudiu Belu

6:24 a.m.

1. Miguel: Ideally, yes, it should be enough to validate the backports. But most of the CIs run only a subset of the tempest tests, which may or may not include tempest.scenario.test_security_groups_basic_ops, the test that exercises security group functionality on the compute nodes. But this would be a problem on master too.

2. Dolph: good point, a benchmark is appropriate in this scenario. It will help us decide whether the backports are worth the risk they carry. I also agree that low enough performance is a kind of bug.

3. Thierry: I agree, those backports have a risk of regression, but low performance is a problem too. We should at least see the benchmark results and then decide whether the gain significantly outweighs the risk. :)

Claudiu Belu

6:42 a.m.

Looking at the backports, they are independent of one another, and I would say that these commits have no regression risk:

https://review.openstack.org/#/c/130097/2
https://review.openstack.org/#/c/130100/2

Functionally, nothing changes: the differences are confined to the scope of the methods involved and are used only there. They will be the easiest to approve, if they add a performance boost (benchmarks needed).

Best regards,
Claudiu Belu

Kevin Benton

11 Nov

9:25 a.m.

Hi,

There are two main patches that I am interested in back-porting to improve the performance of the DB queries issued frequently by L2 agents while they are hosting VMs. These are not one-time queries during specific operations (e.g. create/delete); they also happen during normal periodic checks from the L2 agent. Due to this constant background behavior, the agents start to trample the Neutron server once the deployment size scales up, and will eventually exceed its resources so that it can no longer service API requests even though nothing is changing.

The only work-around for this right now is to abnormally scale (compared to any of the other standard OpenStack services) the Neutron server and the MySQL nodes to handle the query load. This is really discouraging to deployers (lots of extra compute power wasted as service nodes) and makes Neutron appear extremely unstable to deployers who do not know Neutron needs to be special-cased in this manner.

The first patch is to batch up the ports being requested from an RPC agent before querying the database.[1] This is an internal-only change (it doesn't affect the data delivered to RPC callers). Before, the server was calling the DB for each port individually, so a query from a high-density port node like an L3 agent could result in 1000+ DB queries. Now the service will query the database for all of the port information at once and then group it by port like the agents expect. This is probably the most significant improvement when dealing with high-density nodes, and there is a Rally performance graph demonstrating this in the comments.

The second patch is to eliminate a join across the Neutron port table that was a completely unnecessary calculation for the DB to perform and a waste of data returned (every column from every table in the query).[2] This also doesn't change the data returned to the caller of the function (no missing dict entries, etc.), so we shouldn't have to worry about out-of-tree drivers, tools, etc. being broken by this either. I will run the Rally performance numbers for this one as well after the first patch gets merged, since the first has a higher impact than this one.

Let me know if I need to elaborate on anything.

1. https://review.openstack.org/#/c/132372/
2. https://review.openstack.org/#/c/130101/

Thanks,
Kevin Benton
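The batching idea behind the first patch (one query for all requested ports, then grouping by port, instead of one query per port) can be sketched as follows. This is not the code from review 132372: the `port_rules` table, the function names, and the use of SQLite are assumptions chosen to keep the example self-contained.

```python
import sqlite3
from collections import defaultdict


def build_db(n_ports=100, rules_per_port=3):
    # Hypothetical table mapping ports to their rules.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE port_rules (port_id INTEGER, rule TEXT)")
    conn.executemany(
        "INSERT INTO port_rules VALUES (?, ?)",
        [(p, "rule-%d-%d" % (p, r))
         for p in range(n_ports) for r in range(rules_per_port)])
    return conn


def fetch_one_by_one(conn, port_ids):
    """Old pattern: one DB round trip per port. Returns (result, query_count)."""
    result, queries = {}, 0
    for pid in port_ids:
        rows = conn.execute(
            "SELECT rule FROM port_rules WHERE port_id = ? ORDER BY rule",
            (pid,)).fetchall()
        queries += 1
        result[pid] = [rule for (rule,) in rows]
    return result, queries


def fetch_batched(conn, port_ids):
    """Batched pattern: a single IN (...) query, grouped by port in Python."""
    port_ids = list(port_ids)
    if not port_ids:
        return {}, 0
    placeholders = ",".join("?" for _ in port_ids)
    rows = conn.execute(
        "SELECT port_id, rule FROM port_rules "
        "WHERE port_id IN (%s) ORDER BY port_id, rule" % placeholders,
        port_ids).fetchall()
    grouped = defaultdict(list)
    for pid, rule in rows:
        grouped[pid].append(rule)
    # Ports without rules still get an (empty) entry, as callers expect.
    return {pid: grouped.get(pid, []) for pid in port_ids}, 1


if __name__ == "__main__":
    conn = build_db()
    ids = list(range(100))
    _, per_port_queries = fetch_one_by_one(conn, ids)
    _, batched_queries = fetch_batched(conn, ids)
    print(per_port_queries, "queries vs", batched_queries)
```

Both functions return the same mapping, so callers see no difference; only the number of round trips to the database changes. For very large port lists, a real implementation would likely need to chunk the `IN (...)` list to stay within the database's parameter limits.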
