[neutron] security group list regression

James Denton james.denton at rackspace.com
Wed Mar 4 19:26:54 UTC 2020


Hi Rodolfo,

The client we're using for Train does indeed have the patch. The Stein environment, running python-openstackclient 3.18.1, did not. I was able to patch it and speed up the DELETE operation. In the real world, the user could probably just update the client and get the fix.
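
For the archive, here is a minimal sketch of the difference between the two lookup paths discussed in this thread. It is not the actual python-openstackclient or openstacksdk code: FakeNeutron, find_old and find_new are hypothetical names, and the in-memory "server" only counts how many records it would send back to the client.

```python
from typing import Dict, List, Optional

SecurityGroup = Dict[str, str]


class FakeNeutron:
    """Hypothetical in-memory stand-in for the security-group API."""

    def __init__(self, groups: List[SecurityGroup]) -> None:
        self._groups = groups
        self.rows_returned = 0  # total records sent back to the client

    def get_by_id(self, sg_id: str) -> Optional[SecurityGroup]:
        # Models GET /security-groups/<id>; None stands for a 404.
        for group in self._groups:
            if group["id"] == sg_id:
                self.rows_returned += 1
                return group
        return None

    def list(self, **filters: str) -> List[SecurityGroup]:
        # Models GET /security-groups[?key=value]; filtering is server-side.
        result = [
            g for g in self._groups
            if all(g.get(k) == v for k, v in filters.items())
        ]
        self.rows_returned += len(result)
        return result


def find_old(api: FakeNeutron, name_or_id: str) -> Optional[SecurityGroup]:
    # Older behaviour as observed in the logs: after the ID lookup fails,
    # fetch the full list and filter by name on the client.
    found = api.get_by_id(name_or_id)
    if found is not None:
        return found
    matches = [g for g in api.list() if g["name"] == name_or_id]
    return matches[0] if matches else None


def find_new(api: FakeNeutron, name_or_id: str) -> Optional[SecurityGroup]:
    # Patched behaviour: after the ID lookup fails, let the server
    # filter by name (?name=<name>).
    found = api.get_by_id(name_or_id)
    if found is not None:
        return found
    matches = api.list(name=name_or_id)
    return matches[0] if matches else None


groups = [{"id": f"id-{i}", "name": f"sg-{i}"} for i in range(8000)]
old_api, new_api = FakeNeutron(groups), FakeNeutron(groups)
find_old(old_api, "sg-4242")
find_new(new_api, "sg-4242")
print(old_api.rows_returned, new_api.rows_returned)  # prints: 8000 1
```

With 8,000 groups, the old path pulls every record over the wire to find one, which matches the long wait before the DELETE on the unpatched client.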

Thanks again!

On 3/3/20, 4:49 AM, "Rodolfo Alonso" <ralonsoh at redhat.com> wrote:

    Hello James:
    
    Yes, this is a known issue in OSclient: most "objects" (networks, subnets, routers, etc.) can
    be retrieved either by ID or by name. OSclient first tries the ID because it is unique and a DB
    key. If that fails, instead of asking the server for a single record (filtered by name), the
    client retrieves the whole list and filters the results itself.
    
    But this problem was resolved in Train: https://review.opendev.org/#/c/637238/. Can you check
    that your openstacksdk has this patch, at least in Train?
    
    According to [1] and [2], "name" should be used as a filter in the OSsdk "find" call.
    
    Regards.
    
    [1] https://review.opendev.org/#/c/637238/20/openstack/resource.py
    [2] https://github.com/openstack/openstacksdk/blob/master/openstack/network/v2/security_group.py#L29
    
    On Mon, 2020-03-02 at 22:25 +0000, James Denton wrote:
    > Rodolfo,
    >
    > Thanks for continuing to push this on the ML and in the bug report.
    >
    > Happy to report that the client and SDK patches you provided have drastically reduced the SG list
    > time from ~90-120s to ~12-14s within Stein and Train lab environments.
    >
    > One last thing... when you perform an 'openstack security group delete <name>', the initial
    > lookup, which treats <name> as an ID, fails with a 404. In Train, the client falls back to using
    > the 'name' parameter (/security-groups?name=<name>). This lookup is quick and the security group
    > is found and deleted. However, on Rocky/Stein (e.g. client 3.18.1), instead of searching by
    > parameter, the client appears to perform a GET /security-groups without limiting the fields and
    > takes a long time.
    >
    > 'openstack security group list' with patch:
    > REQ: curl -g -i -X GET "http://10.0.236.150:9696/v2.0/security-groups?fields=set%28%5B%27description%27%2C+%27project_id%27%2C+%27id%27%2C+%27tags%27%2C+%27name%27%5D%29" -H "Accept: application/json" -H "User-Agent: openstacksdk/0.27.0 keystoneauth1/3.13.1 python-requests/2.21.0 CPython/2.7.17" -H "X-Auth-Token: {SHA256}3e747da939e8c4befe72d5ca7105971508bd56cdf36208ba6b960d1aee6d19b6"
    >
    > 'openstack security group delete <name>':
    >
    > Train (notice the name param):
    > REQ: curl -g -i -X GET http://10.20.0.11:9696/v2.0/security-groups/train-test-1755 -H "User-Agent: openstacksdk/0.36.0 keystoneauth1/3.17.1 python-requests/2.22.0 CPython/3.6.7" -H "X-Auth-Token: {SHA256}bf291d5f12903876fc69151db37d295da961ba684a575e77fb6f4829b55df1bf"
    > http://10.20.0.11:9696 "GET /v2.0/security-groups/train-test-1755 HTTP/1.1" 404 125
    > REQ: curl -g -i -X GET "http://10.20.0.11:9696/v2.0/security-groups?name=train-test-1755" -H "Accept: application/json" -H "User-Agent: openstacksdk/0.36.0 keystoneauth1/3.17.1 python-requests/2.22.0 CPython/3.6.7" -H "X-Auth-Token: {SHA256}bf291d5f12903876fc69151db37d295da961ba684a575e77fb6f4829b55df1bf"
    > http://10.20.0.11:9696 "GET /v2.0/security-groups?name=train-test-1755 HTTP/1.1" 200 1365
    >
    > Stein & below (notice lack of fields):
    > REQ: curl -g -i -X GET http://10.0.236.150:9696/v2.0/security-groups/stein-test-5189 -H "User-Agent: openstacksdk/0.27.0 keystoneauth1/3.13.1 python-requests/2.21.0 CPython/2.7.17" -H "X-Auth-Token: {SHA256}e9f87afe851ff5380d8402ee81199c466be9c84fe67ed0302e8b178f33aa1fc2"
    > http://10.0.236.150:9696 "GET /v2.0/security-groups/stein-test-5189 HTTP/1.1" 404 125
    > REQ: curl -g -i -X GET http://10.0.236.150:9696/v2.0/security-groups -H "Accept: application/json" -H "User-Agent: openstacksdk/0.27.0 keystoneauth1/3.13.1 python-requests/2.21.0 CPython/2.7.17" -H "X-Auth-Token: {SHA256}e9f87afe851ff5380d8402ee81199c466be9c84fe67ed0302e8b178f33aa1fc2"
    > <wait awhile while it compiles and returns the full list, then the single SG object is deleted>
    >
    > Haven't quite figured out where fields can be used to speed up the delete process on the older
    > client, or if the newer client would be backwards-compatible (and how far back).
    >
    > Thanks,
    > James
    >
    > On 3/2/20, 9:31 AM, "James Denton" <james.denton at rackspace.com> wrote:
    >
    >
    >     Thanks, Rodolfo. I'll take a look at each of these after coffee and clarify my position (if
    > needed).
    >
    >     James
    >
    >     On 3/2/20, 6:27 AM, "Rodolfo Alonso" <ralonsoh at redhat.com> wrote:
    >
    >
    >         Hello James:
    >
    >         Here is a quick summary of the status of the bugs/regressions mentioned:
    >
    >         1) https://bugs.launchpad.net/neutron/+bug/1810563: adding rules to security groups is slow
    >         That was addressed in https://review.opendev.org/#/c/633145/ and
    >         https://review.opendev.org/#/c/637407/, removing the O(n^2) check and using lazy loading.
    >
    >
    >         2) https://bugzilla.redhat.com/show_bug.cgi?id=1788749: Neutron List networks API regression
    >         The last reply was marked as private. I've undone this and you can now read c#2. Testing with
    >         a similar scenario, I don't see any performance degradation between Queens and Train.
    >
    >
    >         3) https://bugzilla.redhat.com/show_bug.cgi?id=1721273: Neutron API List Ports Performance
    >         regression
    >         That problem was solved in https://review.opendev.org/#/c/667981/ and
    >         https://review.opendev.org/#/c/667998/, by refactoring how the port QoS extension
    >         reads and applies the QoS info in the port dict.
    >
    >
    >         4) https://bugs.launchpad.net/neutron/+bug/1865223: regression for security group list
    >         between Newton and Rocky+
    >
    >         This is similar to https://bugs.launchpad.net/neutron/+bug/1863201. In this case, the
    >         regression was detected from R to S. The performance dropped from 3 secs to 110 secs (36x).
    >         That issue was addressed by https://review.opendev.org/#/c/708695/.
    >
    >         But while 1865223 is about the *SG list*, 1863201 is related to the *SG rule list*. I would
    >         like to make this distinction, because the two retrieval commands are not related.
    >
    >         In this bug (1863201), the initial time is multiplied by 3 (N->Q). This could be caused by
    >         the OVO integration (O->P: https://review.opendev.org/#/c/284738/). Instead of using the DB
    >         object, we now make this call using the OVO object containing the DB record (something like
    >         a DB view). That's something I still need to check.
    >
    >         To be precise: the patch 708695 improves the *SG rule* retrieval, not the SG list command.
    >         Another clarification: this patch will help when there is a balance between SG rules and
    >         SGs. It retrieves from the DB only those SG rules belonging to the project. If, as you state
    >         in https://bugs.launchpad.net/neutron/+bug/1865223/comments/4, most of those SG rules belong
    >         to the same project, there is little improvement there.
    >
    >         As commented, I'm still looking at improving the SG OVO performance.
    >
    >         Regards
    >
    >
    >         On Mon, 2020-03-02 at 03:03 +0000, Erik Olof Gunnar Andersson wrote:
    >         > When we went from Mitaka to Rocky in August last year, we saw an exponential increase in
    >         > API times for listing security group rules.
    >         >
    >         > I think I last commented on this bug https://bugs.launchpad.net/neutron/+bug/1810563, but
    >         > I have brought it up on a few other occasions as well.
    >         >
    >         >
    >         > From: Slawek Kaplonski <skaplons at redhat.com>
    >         > Sent: Saturday, February 29, 2020 12:44 AM
    >         > To: James Denton <james.denton at rackspace.com>
    >         > Cc: openstack-discuss <openstack-discuss at lists.openstack.org>
    >         > Subject: Re: [neutron] security group list regression
    >         >
    >         > Hi,
    >         >
    >         > I just replied in Your bug report. Can You try to apply patch
    >         > https://review.opendev.org/#/c/708695/ to see if that will help with this problem?
    >         >
    >         > > On 29 Feb 2020, at 02:41, James Denton <james.denton at rackspace.com> wrote:
    >         > >
    >         > > Hello all,
    >         > >
    >         > > We recently upgraded an environment from Newton -> Rocky, and have noticed a pretty
    >         > > severe regression in the time it takes the API to return the list of security groups.
    >         > > This environment has roughly 8,000+ security groups, and it takes nearly 75 seconds for
    >         > > the ‘openstack security group list’ command to complete. I don’t have actual data from
    >         > > the same environment running Newton, but was able to replicate this behavior with the
    >         > > following lab environments running a mix of virtual and baremetal machines:
    >         > >
    >         > > Newton (VM)
    >         > > Rocky (BM)
    >         > > Stein (VM)
    >         > > Train (BM)
    >         > >
    >         > > Number of sec grps vs time in seconds:
    >         > >
    >         > > #      Newton  Rocky   Stein   Train
    >         > > 200    4.1     3.7     5.4     5.2
    >         > > 500    5.3     7       11      9.4
    >         > > 1000   7.2     12.4    19.2    16
    >         > > 2000   9.2     24.2    35.3    30.7
    >         > > 3000   12.1    36.5    52      44
    >         > > 4000   16.1    47.2    73      58.9
    >         > > 5000   18.4    55      90      69
    >         > >
    >         > > As you can see (hopefully), the response time increased significantly between Newton
    >         > > and Rocky, and has grown slightly ever since. We don't know, yet, if this behavior can
    >         > > be seen with other 'list' commands or is limited to secgroups. We're currently
    >         > > verifying on some intermediate releases to see where things went wonky.
    >         > >
    >         > > There are some similar recent reports out in the wild with little feedback:
    >         > >
    >         > > https://bugzilla.redhat.com/show_bug.cgi?id=1788749
    >         > > https://bugzilla.redhat.com/show_bug.cgi?id=1721273
    >         > >
    >         > > I opened a bug here, too:
    >         > >
    >         > > https://bugs.launchpad.net/neutron/+bug/1865223
    >         > >
    >         > > Bottom line: Has anyone else experienced similar regressions in recent releases? If so,
    >         > > were you able to address them with any sort of tuning?
    >         > >
    >         > > Thanks in advance,
    >         > > James
    >         > >
    >         >
    >         > —
    >         > Slawek Kaplonski
    >         > Senior software engineer
    >         > Red Hat
    >         >
    >         >
    >
    >
    >
    >
    >
    
    


