[openstack-dev] [Neutron][qa] Parallel testing update
Robert Collins
robertc at robertcollins.net
Thu Jan 2 21:39:06 UTC 2014
Another way to tackle it would be to create a dedicated tenant for
those tests, so that the quota won't interact with anything else.
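
A toy sketch of the idea, assuming quotas are enforced per tenant. This is
an in-memory model, not neutron or tempest code: FakeNetworkService,
QuotaExceeded, exhaust_port_quota and the tenant names are all hypothetical
and exist only for illustration.

```python
# Toy in-memory model; every name here is hypothetical and none of it
# is part of neutron or tempest.
from collections import defaultdict


class QuotaExceeded(Exception):
    """Raised when a tenant has used up its port quota."""


class FakeNetworkService:
    def __init__(self, port_quota):
        self.port_quota = port_quota      # same limit applied to each tenant
        self.ports = defaultdict(list)    # tenant id -> list of port names

    def create_port(self, tenant_id, name):
        # The limit is checked against this tenant's ports only.
        if len(self.ports[tenant_id]) >= self.port_quota:
            raise QuotaExceeded(tenant_id)
        self.ports[tenant_id].append(name)
        return name


def exhaust_port_quota(service, tenant_id):
    """Create ports until the quota is hit; return how many were created."""
    created = 0
    while True:
        try:
            service.create_port(tenant_id, "port-%d" % created)
        except QuotaExceeded:
            return created
        created += 1


service = FakeNetworkService(port_quota=3)
# The quota test runs in its own tenant ...
created = exhaust_port_quota(service, "quota-test-tenant")
# ... so a test running at the same time in another tenant still gets a port.
other_port = service.create_port("other-test-tenant", "port-a")
```

If the quota test shared a tenant with the other tests, the last call
would raise QuotaExceeded instead.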
On 3 January 2014 10:35, Miguel Angel Ajo Pelayo <mangelajo at redhat.com> wrote:
> Hi Salvatore!
>
> Good work on this.
>
> About the quota limit tests, I believe they could be unit-tested
> instead of functionally tested.
>
> When running those tests in parallel with any other tests that rely
> on having ports, networks or subnets available within the quota, they
> are likely to make those other tests fail.
>
> Cheers,
> Miguel Ángel Ajo
>
>
>
> ----- Original Message -----
>> From: "Kyle Mestery" <mestery at siliconloons.com>
>> To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
>> Sent: Thursday, January 2, 2014 7:53:05 PM
>> Subject: Re: [openstack-dev] [Neutron][qa] Parallel testing update
>>
>> Thanks for the updates here Salvatore, and for continuing to push on
>> this! This is all great work!
>>
>> On Jan 2, 2014, at 6:57 AM, Salvatore Orlando <sorlando at nicira.com> wrote:
>> >
>> > Hi again,
>> >
>> > I've now run the experimental job a good number of times, and I've filed bugs
>> > for all the issues that came up.
>> > Most of them occurred no more than once across all test executions (about
>> > 30, I think).
>> >
>> > They're all tagged with neutron-parallel [1]. For ease of tracking, I've
>> > associated all the bug reports with neutron, but some are probably more
>> > tempest or nova issues.
>> >
>> > Salvatore
>> >
>> > [1] https://bugs.launchpad.net/neutron/+bugs?field.tag=neutron-parallel
>> >
>> >
>> > On 27 December 2013 11:09, Salvatore Orlando <sorlando at nicira.com> wrote:
>> > Hi,
>> >
>> > We now have several patches under review which greatly improve how neutron
>> > handles parallel testing.
>> > In a nutshell, these patches try to ensure the ovs agent processes new,
>> > removed, and updated interfaces as soon as possible.
>> >
>> > These patches are:
>> > https://review.openstack.org/#/c/61105/
>> > https://review.openstack.org/#/c/61964/
>> > https://review.openstack.org/#/c/63100/
>> > https://review.openstack.org/#/c/63558/
>> >
>> > There is still room for improvement. For instance the calls from the agent
>> > into the plugins might be considerably reduced.
>> > However, even if the above patches greatly shrink the time required for
>> > processing a device, we are still hitting a hard limit with the execution
>> > of ovs commands for setting local vlan tags and clearing flows (or adding the
>> > flow rule for dropping all the traffic).
>> > In some instances these commands slow down a lot, requiring almost 10
>> > seconds to complete. This adds a delay in interface processing which in
>> > some cases leads to the hideous SSH timeout error (the same we see with
>> > bug 1253896 in normal testing).
>> > It is also worth noting that when this happens sysstat reveals CPU usage is
>> > very close to 100%.
>> >
>> > From the neutron side there is little we can do. Introducing parallel
>> > processing for interfaces, as we do for the l3 agent, is not actually a
>> > solution, since ovs-vswitchd v1.4.x, the one executed on gate tests, is
>> > not multithreaded. If you think the situation might be improved by
>> > changing the logic for handling local vlan tags and putting ports on the
>> > dead vlan, I would be happy to talk about that.
>> > On my local machines I've seen a dramatic improvement in processing times
>> > by installing ovs 2.0.0, which has a multi-threaded vswitchd. Is this
>> > something we might consider for gate tests? Also, in order to reduce CPU
>> > usage on the gate (and make tests a bit faster), there is a tempest
>> > patch which stops creating and wiring neutron routers when they're not
>> > needed: https://review.openstack.org/#/c/62962/
>> >
>> > Even in my local setup, which succeeds about 85% of the time, I'm still seeing
>> > some occurrences of the issue described in [1], which at the end of the
>> > day seems to be a dnsmasq issue.
>> >
>> > Beyond the 'big' structural problem discussed above, there are some minor
>> > problems with a few tests:
>> >
>> > 1) test_network_quotas.test_create_ports_until_quota_hit fails about 90%
>> > of the time. I think this is because the test itself should be made aware of
>> > parallel execution and asynchronous events, and there is a patch for this
>> > already: https://review.openstack.org/#/c/64217
>> >
>> > 2) test_attach_interfaces.test_create_list_show_delete_interfaces fails
>> > about 66% of the time. The failure is always on an assertion made after
>> > deletion of interfaces, which probably means the interface is not deleted
>> > within 5 seconds. I think this might be a consequence of the higher load
>> > on the neutron service, and we might try to enable multiple workers on the
>> > gate to address this, or just increase the tempest timeout. On a slightly
>> > different note, allow me to say that the way assertions are made in this
>> > test might be improved a bit. As it stands, one has to go through the code to see
>> > why the test failed.
>> >
>> > Thanks for reading this rather long message.
>> > Regards,
>> > Salvatore
>> >
>> > [1] https://lists.launchpad.net/openstack/msg23817.html
>> >
>> >
>> >
>> >
>> > On 2 December 2013 22:01, Kyle Mestery (kmestery) <kmestery at cisco.com>
>> > wrote:
>> > Yes, this is all great Salvatore and Armando! Thank you for all of this
>> > work
>> > and the explanation behind it all.
>> >
>> > Kyle
>> >
>> > On Dec 2, 2013, at 2:24 PM, Eugene Nikanorov <enikanorov at mirantis.com>
>> > wrote:
>> >
>> > > Salvatore and Armando, thanks for your great work and detailed
>> > > explanation!
>> > >
>> > > Eugene.
>> > >
>> > >
>> > > On Mon, Dec 2, 2013 at 11:48 PM, Joe Gordon <joe.gordon0 at gmail.com>
>> > > wrote:
>> > >
>> > > On Dec 2, 2013 9:04 PM, "Salvatore Orlando" <sorlando at nicira.com> wrote:
>> > > >
>> > > > Hi,
>> > > >
>> > > > As you might have noticed, there has been some progress on parallel
>> > > > tests for neutron.
>> > > > In a nutshell:
>> > > > * Armando fixed the issue with IP address exhaustion on the public
>> > > > network [1]
>> > > > * Salvatore now has a patch which has a 50% success rate (the last
>> > > > failures are because of me playing with it) [2]
>> > > > * Salvatore is looking at putting back on track full isolation [3]
>> > > > * All the bugs affecting parallel tests can be queried here [10]
>> > > > * This blueprint tracks progress made towards enabling parallel testing
>> > > > [11]
>> > > >
>>
>>
>>
>>
>> _______________________________________________
>> OpenStack-dev mailing list
>> OpenStack-dev at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
--
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud