[openstack-dev] [Neutron][qa] Parallel testing update

Salvatore Orlando sorlando at nicira.com
Fri Jan 3 08:55:02 UTC 2014


I already have a patch under review for the quota test, for which I adopted
the shortest-diff approach.
As regards Robert's suggestion, the problem we have there is that the test
already uses a dedicated tenant, but it does not take into account the
possibility that at some point the dhcp agent will also create a port for
that tenant.
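
To illustrate the race described above, here is a minimal sketch. It is not
the code in the patch under review: FakePortClient, QuotaExceeded and
create_ports_until_quota_hit are hypothetical names, and the client is an
in-memory stand-in rather than a real tempest or neutron client. It only
shows why counting the ports the test itself created can disagree with the
quota once another actor (here, a simulated dhcp port) has used one slot.

```python
class QuotaExceeded(Exception):
    """Raised by the fake client when the tenant's port quota is exhausted."""


class FakePortClient:
    """In-memory stand-in for a port API (hypothetical, for illustration)."""

    def __init__(self, quota):
        self.quota = quota
        self.ports = []

    def create_port(self, owner):
        # Reject the request once the tenant already holds `quota` ports.
        if len(self.ports) >= self.quota:
            raise QuotaExceeded()
        self.ports.append(owner)

    def list_ports(self):
        return list(self.ports)


def create_ports_until_quota_hit(client):
    """Create ports until the quota is hit; return how many were created."""
    created = 0
    while True:
        try:
            client.create_port(owner="test")
        except QuotaExceeded:
            return created
        created += 1


client = FakePortClient(quota=10)
# Simulate the dhcp agent creating its own port for the same tenant.
client.create_port(owner="network:dhcp")

created = create_ports_until_quota_hit(client)

# Fragile check: assumes the test owns every port in the tenant.
fragile_ok = (created == client.quota)
# Sturdier check: count every port in the tenant, whoever created it.
robust_ok = (len(client.list_ports()) == client.quota)
```

In this sketch the fragile check fails while the sturdier one passes, since
the latter does not care which actor consumed each quota slot.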

In theory I tend to agree with Miguel, but I'm not sure what the consensus
would be on removing a scenario test. I think we either decide to merge
this shortest-diff patch [1] once the comments are addressed, or re-design
the tests, which might take some more time.

Salvatore

PS: shortest-diff is, as you might have already understood, a euphemism
for 'hack'.


[1] https://review.openstack.org/#/c/64217/



On 2 January 2014 22:39, Robert Collins <robertc at robertcollins.net> wrote:

> Another way to tackle it would be to create a dedicated tenant for
> those tests, so that the quota won't interact with anything else.
>
> On 3 January 2014 10:35, Miguel Angel Ajo Pelayo <mangelajo at redhat.com>
> wrote:
> > Hi Salvatore!
> >
> >    Good work on this.
> >
> >    About the quota limit tests, I believe they could be unit-tested
> > instead of functionally tested.
> >
> >    When those tests run in parallel with any other tests that rely
> > on having ports, networks or subnets available within their quota, they
> > have a high chance of making those other tests fail.
> >
> > Cheers,
> > Miguel Ángel Ajo
> >
> >
> >
> > ----- Original Message -----
> >> From: "Kyle Mestery" <mestery at siliconloons.com>
> >> To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
> >> Sent: Thursday, January 2, 2014 7:53:05 PM
> >> Subject: Re: [openstack-dev] [Neutron][qa] Parallel testing update
> >>
> >> Thanks for the updates here Salvatore, and for continuing to push on
> >> this! This is all great work!
> >>
> >> On Jan 2, 2014, at 6:57 AM, Salvatore Orlando <sorlando at nicira.com> wrote:
> >> >
> >> > Hi again,
> >> >
> >> > I've now run the experimental job a good number of times, and I've
> >> > filed bugs for all the issues that came up.
> >> > Most of them occurred no more than once across all test executions
> >> > (about 30, I think).
> >> >
> >> > They're all tagged with neutron-parallel [1]. For ease of tracking,
> >> > I've associated all the bug reports with neutron, but some are
> >> > probably more tempest or nova issues.
> >> >
> >> > Salvatore
> >> >
> >> > [1] https://bugs.launchpad.net/neutron/+bugs?field.tag=neutron-parallel
> >> >
> >> >
> >> > On 27 December 2013 11:09, Salvatore Orlando <sorlando at nicira.com> wrote:
> >> > Hi,
> >> >
> >> > We now have several patches under review which greatly improve how
> >> > neutron handles parallel testing.
> >> > In a nutshell, these patches try to ensure the ovs agent processes
> >> > new, removed, and updated interfaces as soon as possible.
> >> >
> >> > These patches are:
> >> > https://review.openstack.org/#/c/61105/
> >> > https://review.openstack.org/#/c/61964/
> >> > https://review.openstack.org/#/c/63100/
> >> > https://review.openstack.org/#/c/63558/
> >> >
> >> > There is still room for improvement. For instance, the calls from
> >> > the agent into the plugins might be considerably reduced.
> >> > However, even if the above patches greatly shrink the time required
> >> > for processing a device, we are still hitting a hard limit with the
> >> > execution of ovs commands for setting local vlan tags and clearing
> >> > flows (or adding the flow rule for dropping all the traffic).
> >> > In some instances these commands slow down a lot, requiring almost
> >> > 10 seconds to complete. This adds a delay in interface processing
> >> > which in some cases leads to the hideous SSH timeout error (the same
> >> > we see with bug 1253896 in normal testing).
> >> > It is also worth noting that when this happens sysstat reveals that
> >> > CPU usage is very close to 100%.
> >> >
> >> > From the neutron side there is little we can do. Introducing
> >> > parallel processing for interfaces, as we do for the l3 agent, is
> >> > not actually a solution, since ovs-vswitchd v1.4.x, the one executed
> >> > on gate tests, is not multithreaded. If you think the situation
> >> > might be improved by changing the logic for handling local vlan tags
> >> > and putting ports on the dead vlan, I would be happy to talk about
> >> > that.
> >> > On my local machines I've seen a dramatic improvement in processing
> >> > times by installing ovs 2.0.0, which has a multi-threaded vswitchd.
> >> > Is this something we might consider for gate tests? Also, in order
> >> > to reduce CPU usage on the gate (and make tests a bit faster), there
> >> > is a tempest patch which stops creating and wiring neutron routers
> >> > when they're not needed: https://review.openstack.org/#/c/62962/
> >> >
> >> > Even in my local setup, which succeeds about 85% of the time, I'm
> >> > still seeing some occurrences of the issue described in [1], which
> >> > at the end of the day seems to be a dnsmasq issue.
> >> >
> >> > Beyond the 'big' structural problem discussed above, there are some
> >> > minor problems with a few tests:
> >> >
> >> > 1) test_network_quotas.test_create_ports_until_quota_hit fails about
> >> > 90% of the time. I think this is because the test itself should be
> >> > made aware of parallel execution and asynchronous events, and there
> >> > is a patch for this already: https://review.openstack.org/#/c/64217
> >> >
> >> > 2) test_attach_interfaces.test_create_list_show_delete_interfaces
> >> > fails about 66% of the time. The failure is always on an assertion
> >> > made after deletion of interfaces, which probably means the
> >> > interface is not deleted within 5 seconds. I think this might be a
> >> > consequence of the higher load on the neutron service, and we might
> >> > try to enable multiple workers on the gate to address it, or just
> >> > increase the tempest timeout. On a slightly different note, allow me
> >> > to say that the way assertions are made in this test might be
> >> > improved a bit. So far one has to go through the code to see why the
> >> > test failed.
> >> >
> >> > Thanks for reading this rather long message.
> >> > Regards,
> >> > Salvatore
> >> >
> >> > [1] https://lists.launchpad.net/openstack/msg23817.html
> >> >
> >> >
> >> >
> >> >
> >> > On 2 December 2013 22:01, Kyle Mestery (kmestery) <kmestery at cisco.com> wrote:
> >> > Yes, this is all great Salvatore and Armando! Thank you for all of
> >> > this work and the explanation behind it all.
> >> >
> >> > Kyle
> >> >
> >> > On Dec 2, 2013, at 2:24 PM, Eugene Nikanorov <enikanorov at mirantis.com> wrote:
> >> >
> >> > > Salvatore and Armando, thanks for your great work and detailed
> >> > > explanation!
> >> > >
> >> > > Eugene.
> >> > >
> >> > >
> >> > > On Mon, Dec 2, 2013 at 11:48 PM, Joe Gordon <joe.gordon0 at gmail.com>
> >> > > wrote:
> >> > >
> >> > > On Dec 2, 2013 9:04 PM, "Salvatore Orlando" <sorlando at nicira.com> wrote:
> >> > > >
> >> > > > Hi,
> >> > > >
> >> > > > As you might have noticed, there has been some progress on
> >> > > > parallel tests for neutron.
> >> > > > In a nutshell:
> >> > > > * Armando fixed the issue with IP address exhaustion on the public
> >> > > > network [1]
> >> > > > * Salvatore now has a patch which has a 50% success rate (the last
> >> > > > failures are because of me playing with it) [2]
> >> > > > * Salvatore is looking at putting full isolation back on track [3]
> >> > > > * All the bugs affecting parallel tests can be queried here [10]
> >> > > > * This blueprint tracks progress made towards enabling parallel
> >> > > > testing [11]
> >> > > >
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> OpenStack-dev mailing list
> >> OpenStack-dev at lists.openstack.org
> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >>
> >
>
> --
> Robert Collins <rbtcollins at hp.com>
> Distinguished Technologist
> HP Converged Cloud
>