[openstack-dev] [tripleo] pingtest vs tempest

Dan Prince dprince at redhat.com
Thu May 4 00:57:54 UTC 2017


On Wed, 2017-05-03 at 17:53 -0400, Emilien Macchi wrote:
> (cross-posting)
> 
> I've seen a bunch of interesting thoughts here.
> The most relevant feedback I've seen so far:
> 
> - TripleO folks want to keep testing fast and efficient.
> - Tempest folks understand this problem and are willing to collaborate.
> 
> I propose that we move forward and experiment with using Tempest in
> TripleO CI for one job that could be experimental or non-voting to
> start.

Experimental or periodic at first please.

> Instead of running the pingtest, we would execute a Tempest scenario
> that boots an instance from volume (like the pingtest already does)
> and see how it goes (in terms of coverage and runtime).
> I volunteer to kick off the work with someone more expert than I am
> with quickstart (Arx maybe?).
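
For concreteness, that scenario already exists in Tempest; a minimal
sketch of selecting just it (assuming the in-tree test module name)
would be:

    # run only the boot-from-volume scenario against the deployed cloud
    tempest run --regex '(^tempest\.scenario\.test_volume_boot_pattern)'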
> 
> Another iteration could be to start building an easy interface to
> select which Tempest tests we want a TripleO CI job to run, and plug
> it into our CI tooling (tripleo-quickstart I presume).
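
For illustration, such an interface could start as thin as a per-job
variable fed into a regex; the variable name here is hypothetical:

    # TEMPEST_TEST_REGEX would be set per CI job, e.g. by quickstart
    TEMPEST_TEST_REGEX='(^tempest\.scenario\.test_volume_boot_pattern)'
    tempest run --regex "$TEMPEST_TEST_REGEX"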

Running a subset of Tempest tests isn't the same thing as designing
(and owning) your own test suite that targets the things that mean the
most to our community (namely speed and coverage). Even giving up 5-10
minutes of runtime...just to be able to run Tempest isn't something
that some of us would be willing to do.

> I also hear some feedback about keeping the pingtest alive for some
> use cases, and I agree we could keep some CI jobs that run the
> pingtest when it makes more sense (when we want to test Heat for
> example, or just to maintain it for developers who use it).



> 
> How does that sound? Please share feedback.
> 
> 
> On Tue, Apr 18, 2017 at 7:41 AM, Attila Fazekas <afazekas at redhat.com>
> wrote:
> > 
> > 
> > On Tue, Apr 18, 2017 at 11:04 AM, Arx Cruz <arxcruz at redhat.com>
> > wrote:
> > > 
> > > 
> > > 
> > > On Tue, Apr 18, 2017 at 10:42 AM, Steven Hardy <shardy at redhat.com> wrote:
> > > > 
> > > > On Mon, Apr 17, 2017 at 12:48:32PM -0400, Justin Kilpatrick
> > > > wrote:
> > > > > On Mon, Apr 17, 2017 at 12:28 PM, Ben Nemec <openstack at nemebean.com> wrote:
> > > > > > Tempest isn't really either of those things. According to another
> > > > > > message in this thread it takes around 15 minutes to run just the
> > > > > > smoke tests. That's unacceptable for a lot of our CI jobs.
> > > 
> > > 
> > > I'd rather spend 15 minutes running Tempest than add a regression or
> > > a new bug, which has already happened in the past.
> > > 
> > 
> > The smoke tests might not be the best test selection anyway; you should
> > pick some scenarios which, for example, snapshot images and volumes.
> > Yes, these are the slow ones, but they can run in parallel.
> > 
> > Very likely you do not really want to run all Tempest tests, but 10-20
> > minutes sounds reasonable for a sanity test.
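
A sketch of what Attila describes, assuming tempest run's standard
selection and concurrency flags (the regex and worker count are
illustrative):

    # run just the scenario tests, spread across 4 parallel workers
    tempest run --regex '(^tempest\.scenario)' --concurrency 4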
> > 
> > The Tempest config utility should also be extended with some parallel
> > capability, and should be able to use already-downloaded resources
> > (ones that are part of the image).
> > 
> > Tempest/testr/subunit worker balance is not always the best;
> > technically it would be possible to do dynamic balancing, but it would
> > require a lot of work.
> > Let me know when it becomes the main concern, and I can check what
> > can/cannot be done.
> > 
> > 
> > > 
> > > > 
> > > > > Ben, is the issue merely the time it takes? Is it the effect
> > > > > that time taken has on hardware availability?
> > > > 
> > > > It's both, but the main constraint is the infra job timeout, which
> > > > is about 2.5hrs - if you look at our current jobs, many regularly
> > > > get close to (and sometimes exceed) this, so we just don't have the
> > > > time budget available to run exhaustive tests on every commit.
> > > 
> > > 
> > > We have a green light from infra to increase the job timeout to 5
> > > hours; we do that in our periodic full Tempest job.
> > 
> > 
> > Sounds good, but I am afraid it could hurt more than help; it could
> > delay other things getting fixed by a lot, especially if we get some
> > extra flakiness because of foobar.
> > 
> > You cannot have all possible TripleO configs on the gate anyway,
> > so something will pass which will require a quick fix.
> > 
> > IMHO the only real solution is making the pre-test-run steps faster or
> > shorter.
> > 
> > Do you have any option to start the Tempest-running jobs in a more
> > developed state? I mean, having more things already done at start time
> > (images/snapshots) and just doing a fast upgrade at the beginning of
> > the job.
> > 
> > OpenStack installation can be completed in a `fast` way (~a minute) on
> > RHEL/Fedora systems after the yum steps; also, if you are able to
> > aggregate all yum steps into a single command execution (one
> > transaction), you are generally able to save a lot of time, as
> > sketched below.
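
A minimal sketch of the single-transaction point (the package names are
placeholders):

    # slow: each invocation pays yum's dependency resolution separately
    #   yum -y install openstack-nova
    #   yum -y install openstack-neutron
    #   yum -y install openstack-glance
    # faster: one transaction resolves and installs everything at once
    yum -y install openstack-nova openstack-neutron openstack-glance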
> > 
> > There are plenty of things that can be made more efficient before the
> > test run; once you start treating everything that accounts for more
> > than 30 sec of time as evil, this can happen soon.
> > 
> > For example, just executing the CPython interpreter for the openstack
> > commands takes more than 30 sec in total; the work they are doing
> > could be done in a much, much faster way.
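
One way to amortize that startup cost is to feed several commands to a
single client process instead of forking the openstack CLI once per
command; a sketch, assuming the client's interactive mode reads
commands from stdin:

    # one interpreter startup for three commands instead of three
    openstack <<'EOF'
    flavor list
    image list
    network list
    EOF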
> > 
> > Lots of the install steps actually do not depend on each other, which
> > allows more things to be done in parallel; we generally have more
> > cores than GHz.
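
And the parallelism point as a shell sketch (the step names are
hypothetical placeholders for install tasks that do not depend on each
other):

    # launch independent setup steps concurrently, then wait for all
    configure_images & configure_networks & configure_flavors &
    wait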
> > 
> > 
> > > 
> > > > 
> > > > 
> > > > > Should we focus on how much testing we can get into a time
> > > > > period N? Then how do we decide an optimal N for our
> > > > > constraints?
> > > > 
> > > > Well yeah, but that's pretty much how/why we ended up with
> > > > pingtest: it's simple, fast, and provides an efficient way to do
> > > > smoke tests, e.g. creating just one Heat resource is enough to
> > > > prove multiple OpenStack services are running, as well as the
> > > > DB/RPC etc.
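
To make that concrete, a pingtest-style stack can be as small as a
single server resource; a sketch (the image and flavor names are
placeholders):

    # a one-resource stack still exercises Heat, Nova, Glance and
    # Neutron, plus the DB and RPC paths underneath them
    cat > tiny.yaml <<'EOF'
    heat_template_version: 2015-04-30
    resources:
      server:
        type: OS::Nova::Server
        properties:
          image: cirros
          flavor: m1.tiny
    EOF
    openstack stack create -t tiny.yaml tiny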
> > > > 
> > > > > I've been working on a full-up functional test for OpenStack CI
> > > > > builds for a long time now; it works but takes more than 10
> > > > > hours. If you're interested in the results, click through to
> > > > > Kibana here [0]. Let me know off-list if you have any issues;
> > > > > the presentation of this data is still experimental.
> > > > 
> > > > This kind of thing is great, and I'd support more exhaustive
> > > > testing via periodic jobs etc, but the reality is we need to focus
> > > > on "bang for buck", e.g. the deepest possible coverage in the most
> > > > minimal amount of time for our per-commit tests - we rely on the
> > > > project gates to provide a full API surface test, and we need to
> > > > focus on more basic things like "did the service start" and "is
> > > > the API accessible". Simple CRUD operations on a subset of the
> > > > APIs are totally fine for this IMO, whether via pingtest or some
> > > > other means.
> > > > 
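For reference, Tempest already ships a tag at exactly this level of
coverage:

    # run only the tests marked smoke - basic CRUD across core services
    tempest run --smoke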
> > > 
> > > Right now we do have a periodic job running full Tempest, with a
> > > few skips, and because of the lack of Tempest tests run on the
> > > patches, it's pretty hard to keep it stable enough to have a 100%
> > > pass rate; of course, the installation also fails very often (like
> > > in the last five days).
> > > For example, [1] is the latest periodic run we have that produced
> > > Tempest results, and we have 114 failures that were caused by some
> > > new code/change, and I have no idea which one it was. Just looking
> > > at the failures, I can see that smoke tests plus the minimum basic
> > > scenario tests would catch them, and the developer could fix them
> > > and make me happy :)
> > > Now I have to spend several hours installing and debugging each one
> > > of those tests to identify where/why it fails.
> > > Before this run we got a 100% pass, but unfortunately I don't have
> > > the results anymore; they were already removed from
> > > logs.openstack.org.
> > > 
> > > 
> > > > 
> > > > Steve
> > > > 
> > > > 
> > > 
> > > 
> > > [1] http://logs.openstack.org/periodic/periodic-tripleo-ci-centos-7-ovb-nonha-tempest-oooq/0072651/logs/oooq/stackviz/#/stdin