[openstack-dev] [neutron] CI jobs take pretty long, can we improve that?

Clark Boylan cboylan at sapwetik.org
Tue Mar 22 01:48:51 UTC 2016


On Mon, Mar 21, 2016, at 06:37 PM, Assaf Muller wrote:
> On Mon, Mar 21, 2016 at 9:26 PM, Clark Boylan <cboylan at sapwetik.org>
> wrote:
> > On Mon, Mar 21, 2016, at 06:15 PM, Assaf Muller wrote:
> >> On Mon, Mar 21, 2016 at 8:09 PM, Clark Boylan <cboylan at sapwetik.org>
> >> wrote:
> >>
> >> If what we want is to cut down execution time, I'd suggest we stop
> >> running Cinder tests on Neutron patches (call it an experiment) and
> >> see how long it takes for a regression to slip in. Being an
> >> optimist, I would guess: Never!
> >
> > Experience has shown about a week, and that it's not an if but a when.
> 
> I'm really curious how can a Neutron patch screw up Cinder (And the
> regression be missed by Neutron and Nova tests that interact with
> Neutron). I guess I wasn't around when this was happening. If anyone
> could shed historic light on this I'd appreciate it.

It's not about neutron screwing up cinder, just the general time to
regression when the gate stops testing something. We saw it when we
stopped testing postgres, for example.

> >> If we're running these tests on Neutron patches solely as a data point
> >> for performance testing, Tempest is obviously not the tool for the job
> >> and doesn't provide any added value we can't get from Rally and
> >> profilers for example. If there's otherwise value for running Cinder
> >> (And other tests that don't exercise the Neutron API), I'd love to
> >> know what it is :) I cannot remember any legit Cinder failure on
> >> Neutron patches.
> >
> > I think that is completely the wrong approach to take here. We have
> > caught a problem in neutron; your goal should be to fix it, not to
> > stop testing it.
> 
> You misunderstood my intentions. I'm not saying we should plant our
> head in the sand and sing until the problem goes away, but I am saying
> that if we're interested in uncovering performance issues with
> Neutron's control plane, then there are more effective ways to do so. If
> you're interested and have the energy, profiling the neutron-server
> process while running Rally tests is a much better usage of time.
> Comparing nova-network and Neutron is just not a useful data point.

The question was why Neutron CI is so slow. Upon investigation I found
that jobs using nova-net are ~20 minutes faster in one cloud than those
using neutron. I am not attempting to do performance testing on Neutron;
I am attempting to narrow down where this lost 20 minutes can be found.
In this case it is a very useful data point. We know we can run these
tests faster because we have that data. Therefore the assumption is that
neutron can (and honestly it should) run just as quickly.
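
As a rough sketch of that kind of comparison (not the scripts actually
used here): assuming the per-test durations from a nova-net job and a
neutron job have been exported as name-to-seconds mappings, ranking the
differences shows where the extra time goes. The function name, test
names, and timings below are all hypothetical.

```python
def slowest_regressions(baseline, candidate, top=3):
    """Return the tests whose runtime grew most between two runs.

    Both arguments map test name -> duration in seconds. Only tests
    present in both runs are compared.
    """
    shared = baseline.keys() & candidate.keys()
    deltas = {name: candidate[name] - baseline[name] for name in shared}
    ranked = sorted(deltas.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


# Made-up timings in seconds, purely for illustration.
nova_net = {
    "test_boot_from_volume": 60.0,
    "test_list_servers": 5.0,
    "test_attach_volume": 40.0,
}
neutron = {
    "test_boot_from_volume": 95.0,
    "test_list_servers": 6.0,
    "test_attach_volume": 55.0,
}

for name, delta in slowest_regressions(nova_net, neutron):
    print(f"{name}: +{delta:.1f}s")
```

This only means something when both runs are serial and come from the
same cloud; otherwise the differences mostly reflect the environment.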

We need these tests for integration testing (at least that's the
assertion made by them living in tempest). We also want the jobs to run
faster (the topic of this thread). Using the data available to us we
find that the biggest cost in these jobs is the tempest testing itself.
The best way to make the jobs run quicker is to address the tests
themselves. Looking at the relative performance of the two solutions
available to us we find that there is room for improvement in the
Neutron testing. That's all I am trying to point out. This has nothing to
do with proper performance testing or running rally and everything to do
with making the integration tests quicker.

> > The fact that neutron is much slower in these test cases is an
> > indication that these tests DO exercise the neutron api and that you do
> > want to cover these code paths and that you need to address them, not
> > that you should stop testing them.
> >
> > We are not running these tests on neutron solely for performance
> > testing. In fact, to get reasonable performance testing out of it I had
> > to jump through a few hoops: make tempest run serially, then recheck
> > until the jobs ran in the same cloud more than once. Performance testing
> > has never been the goal of these tests. These tests exist to make sure
> > that OpenStack works. Boot from volume is an important piece of this and
> > we are making sure that OpenStack (this means glance, nova, neutron,
> > cinder) continues to work for this use case.


