[openstack-dev] [tripleo] pingtest vs tempest

Emilien Macchi emilien at redhat.com
Wed May 3 21:53:09 UTC 2017


I've seen a bunch of interesting thoughts here.
The most relevant feedback I've seen so far:

- TripleO folks want to keep testing fast and efficient.
- Tempest folks understand this problem and are willing to collaborate.

I propose that we move forward and experiment with using Tempest in
TripleO CI for one job that could be experimental or non-voting to
start. Instead of running the pingtest, we would execute a Tempest
scenario that boots an instance from a volume (like the pingtest
already does) and see how it goes (in terms of coverage and runtime).
I volunteer to kick-off the work with someone more expert than I am
with quickstart (Arx maybe?).
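
For the record, the first iteration could be as small as the sketch
below (untested; the scenario name is the upstream boot-from-volume
test, and the wrapper itself is only an assumption about how we would
wire it into the job):

    #!/usr/bin/env python
    # Sketch: run one Tempest scenario instead of the pingtest.
    # Assumes tempest is installed and already configured for the cloud.
    import subprocess
    import sys

    # The upstream boot-from-volume scenario, roughly what pingtest covers.
    REGEX = 'tempest.scenario.test_volume_boot_pattern'

    # 'tempest run --regex' selects tests by regex; the exit code
    # reflects pass/fail, which is all the CI job needs.
    sys.exit(subprocess.call(['tempest', 'run', '--regex', REGEX]))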

Another iteration could be to start building an easy interface to
select which Tempest tests we want a TripleO CI job to run, and plug
it into our CI tooling (tripleo-quickstart I presume); a strawman for
that interface is sketched below.
I also heard some feedback about keeping the pingtest alive for some
use cases, and I agree we could keep some CI jobs running the pingtest
where it makes more sense (when we want to test Heat for example, or
just to maintain it for developers who use it).
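
As a strawman, something like this could translate a per-job test list
into a whitelist file for 'tempest run' (a sketch only; the job name
and file layout are invented, but the --whitelist-file option is the
existing Tempest mechanism for this):

    #!/usr/bin/env python
    # Sketch: per-job Tempest test selection (hypothetical layout).
    import subprocess
    import tempfile

    # Hypothetical mapping from CI job name to the tests it should run.
    JOB_TESTS = {
        'ovb-nonha': [
            'tempest.scenario.test_volume_boot_pattern',
            'tempest.api.compute.servers.test_create_server',
        ],
    }

    def run_job_tests(job):
        # Write the selection to a whitelist file and hand it to tempest.
        with tempfile.NamedTemporaryFile('w', suffix='.txt',
                                         delete=False) as f:
            f.write('\n'.join(JOB_TESTS[job]) + '\n')
        return subprocess.call(['tempest', 'run',
                                '--whitelist-file', f.name])

    if __name__ == '__main__':
        raise SystemExit(run_job_tests('ovb-nonha'))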

How does that sound? Please share feedback.

On Tue, Apr 18, 2017 at 7:41 AM, Attila Fazekas <afazekas at redhat.com> wrote:
> On Tue, Apr 18, 2017 at 11:04 AM, Arx Cruz <arxcruz at redhat.com> wrote:
>> On Tue, Apr 18, 2017 at 10:42 AM, Steven Hardy <shardy at redhat.com> wrote:
>>> On Mon, Apr 17, 2017 at 12:48:32PM -0400, Justin Kilpatrick wrote:
>>> > On Mon, Apr 17, 2017 at 12:28 PM, Ben Nemec <openstack at nemebean.com>
>>> > wrote:
>>> > > Tempest isn't really either of those things.  According to another
>>> > > message
>>> > > in this thread it takes around 15 minutes to run just the smoke
>>> > > tests.
>>> > > That's unacceptable for a lot of our CI jobs.
>>> >
>> I would rather spend 15 minutes running Tempest than add a regression or a
>> new bug, which has already happened in the past.
> The smoke tests might not be the best selection anyway; you should pick
> scenarios which, for example, snapshot images and volumes. Yes, these are
> the slow ones, but they can run in parallel (see the sketch below).
> Very likely you do not really want to run all Tempest tests, but a 10-20
> minute run sounds reasonable for a sanity test.
> The Tempest config utility should also be extended with some parallel
> capability, and should be able to use already-downloaded resources (part
> of the image).
> The Tempest/testr/subunit worker balance is not always the best;
> technically it would be possible to do dynamic balancing, but it would
> require a lot of work.
> Let me know when it becomes the main concern, and I can check what
> can/cannot be done.
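>
> For example, something like this (an untested sketch; the regex and the
> concurrency value are only illustrative):
>
>     # Sketch: run the slow snapshot scenarios, but in parallel.
>     import subprocess
>
>     # --concurrency controls how many test workers tempest spawns.
>     subprocess.call(['tempest', 'run',
>                      '--regex', 'tempest.scenario.*snapshot',
>                      '--concurrency', '4'])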
>>> > Ben, is the issue merely the time it takes? Is it the effect that time
>>> > taken has on hardware availability?
>>> It's both, but the main constraint is the infra job timeout, which is
>>> about 2.5 hrs - if you look at our current jobs, many regularly get close
>>> to (and sometimes exceed) this, so we just don't have the time budget
>>> available to run exhaustive tests on every commit.
>> We have green light from infra to increase the job timeout to 5 hours; we
>> do that in our periodic full Tempest job.
> Sounds good, but I am afraid it could hurt more than help; it could delay
> other fixes by a lot, especially if we get some extra flakiness for
> whatever reason.
> You cannot have all possible TripleO configs on the gate anyway,
> so something will pass that will require a quick fix later.
> IMHO the only real solution is making the steps before the test run
> faster or shorter.
> Do you have any option to start the Tempest-running jobs in a more
> developed state?
> I mean, having more things already done at start time (images/snapshots)
> and just doing a fast upgrade at the beginning of the job.
> OpenStack installation can be completed in a `fast` way (~a minute) on
> RHEL/Fedora systems after the yum steps; also, if you are able to
> aggregate all yum steps into a single command execution (one
> transaction), you can generally save a lot of time.
> There are plenty of things which can be made more efficient before the
> test run; once you start treating as evil everything which accounts for
> more than 30 sec of time, this can happen soon.
> For example, just executing the cpython interpreter for the openstack
> commands adds up to more than 30 sec; the work they are doing can be
> done in a much, much faster way (see the sketch below).
> A lot of the install steps do not actually depend on each other, which
> allows more things to be done in parallel; we can generally get more
> cores than GHz.
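>
> To illustrate the interpreter-startup point: instead of one `openstack`
> CLI invocation per resource, a single Python process can make all the
> API calls (a rough sketch, assuming a recent openstacksdk; the cloud
> and resource names are just examples):
>
>     # Sketch: one process makes many API calls, paying the interpreter
>     # and auth startup cost once instead of once per CLI command.
>     import openstack
>
>     conn = openstack.connect(cloud='overcloud')
>     net = conn.network.create_network(name='sanity-net')
>     conn.network.create_subnet(network_id=net.id, ip_version=4,
>                                cidr='192.0.2.0/24', name='sanity-subnet')
>     conn.compute.create_keypair(name='sanity-key')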
>>> > Should we focus on how much testing we can get into a time period N?
>>> > Then how do we decide an optimal N for our constraints?
>>> Well yeah, but that's pretty much how/why we ended up with pingtest: it's
>>> simple, fast, and provides an efficient way to do smoke tests, e.g.
>>> creating just one Heat resource is enough to prove that multiple
>>> OpenStack services are running, as well as the DB/RPC layers, etc.
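>>>
>>> (For illustration, that single-resource idea is about this small; a
>>> sketch with openstacksdk, not the actual pingtest template:)
>>>
>>>     # Sketch: a one-resource Heat stack exercises Heat's API, DB and
>>>     # RPC, plus Neutron, in a single call.
>>>     import openstack
>>>
>>>     template = {
>>>         'heat_template_version': '2016-10-14',
>>>         'resources': {
>>>             'ping_net': {'type': 'OS::Neutron::Net'},
>>>         },
>>>     }
>>>     conn = openstack.connect(cloud='overcloud')
>>>     conn.orchestration.create_stack(name='tiny-pingtest',
>>>                                     template=template)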
>>> > I've been working on a full-up functional test for OpenStack CI builds
>>> > for a long time now; it works, but takes more than 10 hours. If you're
>>> > interested in the results, click through to Kibana here [0]. Let me
>>> > know off-list if you have any issues; the presentation of this data is
>>> > all still experimental.
>>> This kind of thing is great, and I'd support more exhaustive testing via
>>> periodic jobs etc, but the reality is we need to focus on "bang for
>>> buck", e.g. the deepest possible coverage in the most minimal amount of
>>> time for our per-commit tests - we rely on the project gates to provide
>>> a full API surface test, and we need to focus on more basic things like
>>> "did the service start" and "is the API accessible". Simple CRUD
>>> operations on a subset of the APIs are totally fine for this IMO,
>>> whether via pingtest or some other means.
>> Right now we do have a periodic job running full Tempest with a few
>> skips, and because of the lack of Tempest tests in the patches, it's
>> pretty hard to keep it stable enough to reach a 100% pass; and of course
>> the installation also fails very often (as it has over the last five
>> days).
>> For example, [1] is the latest periodic-job run for which we got Tempest
>> results, and it has 114 failures that were caused by some new
>> code/change, and I have no idea which one. Just looking at the failures,
>> I can see that smoke tests plus the minimum basic scenario tests would
>> have caught them, and the developer could have fixed them and made me
>> happy :)
>> Now I have to spend several hours installing and debugging each one of
>> those failures to identify where/why it fails.
>> Before this run we got a 100% pass, but unfortunately I don't have the
>> results anymore; they were already removed from logs.openstack.org.
>>> Steve
>> [1]
>> http://logs.openstack.org/periodic/periodic-tripleo-ci-centos-7-ovb-nonha-tempest-oooq/0072651/logs/oooq/stackviz/#/stdin

Emilien Macchi
