[qa][ptg][nova][cinder][keystone][neutron][glance][swift][placement] How to make integrated-gate testing (tempest-full) more stable and fast

Erno Kuvaja ekuvaja at redhat.com
Thu May 16 11:48:30 UTC 2019

On Tue, May 7, 2019 at 12:31 AM Tim Burke <tim at swiftstack.com> wrote:

> On 5/5/19 12:18 AM, Ghanshyam Mann wrote:
> Current integrated-gate jobs (tempest-full) are not very stable because of various bugs, especially timeouts. We tried
> to improve this by moving the slow tests into the separate tempest-slow job, but the situation has not improved much.
> We talked about ideas to make it more stable and faster for projects, especially when a failure is not
> related to the project itself. We are planning to split the integrated-gate template (only the tempest-full job as
> a first step) per related services.
> Idea:
> - Run only dependent service tests on project gate.
> I love this plan already.
> - The Tempest gate will keep running all the service tests as the integrated gate in a centralized place, without any change to the current job.
> - Each project can run the relevant template listed below.
> - All of the templates below will be defined and maintained by the QA team.
> My biggest regret is that I couldn't figure out how to do this myself.
> Much thanks to the QA team!
> I would like to hear from each of the 6 services that run integrated-gate jobs.
> 1. "Integrated-gate-networking" (job to run on neutron gate)
> Tests to run in this template: neutron APIs, nova APIs, keystone APIs (?), and all scenario tests currently running in tempest-full, in the same way (that is, non-slow and in serial)
> Improvement for neutron gate: exclude the cinder API tests, glance API tests, and swift API tests.
> 2. "Integrated-gate-storage" (job to run on cinder gate, glance gate)
> Tests to run in this template: Cinder APIs, Glance APIs, Swift APIs, Nova APIs, and all scenario tests currently running in tempest-full, in the same way (that is, non-slow and in serial)
> Improvement for cinder, glance gate: exclude the neutron API tests and Keystone API tests.
> 3. "Integrated-gate-object-storage" (job to run on swift gate)
> Tests to run in this template: Cinder APIs, Glance APIs, Swift APIs, and all scenario tests currently running in tempest-full, in the same way (that is, non-slow and in serial)
> Improvement for swift gate: exclude the neutron API tests, Keystone API tests, and Nova API tests.
> This sounds great. My only question is why Cinder tests are still
> included, but I trust that it's there for a reason and I'm just revealing
> my own ignorance of Swift's consumers, however removed.
> Note: swift does not run integrated-gate as of now.
> Correct, and for all the reasons that you're seeking to address. Some
> eight months ago I'd gotten tired of seeing spurious failures that had
> nothing to do with Swift, and I was hard pressed to find an instance where
> the tempest tests caught a regression or behavior change that wasn't
> already caught by Swift's own functional tests. In short, the
> signal-to-noise ratio for those particular tests was low enough that a
> failure only told me "you should leave a recheck comment," so I proposed
> https://review.opendev.org/#/c/601813/ . There was also a side benefit of
> having our longest-running job change from legacy-tempest-dsvm-neutron-full
> (at 90-100 minutes) to swift-probetests-centos-7 (at ~30 minutes),
> tightening developer feedback loops.
> It sounds like this proposal addresses both concerns: by reducing the
> scope of tests to what might actually exercise the Swift API (if
> indirectly), the signal-to-noise ratio should be much better and the
> wall-clock time will be reduced.
> 4. "Integrated-gate-compute" (job to run on Nova gate)
> Tests to run in this template: Nova APIs, Cinder APIs, Glance APIs (?), neutron APIs, and all scenario tests currently running in tempest-full, in the same way (that is, non-slow and in serial)
> Improvement for Nova gate: exclude the swift API tests (not running in the current job, but they might in the future) and Keystone API tests.
> 5. "Integrated-gate-identity" (job to run on keystone gate)
> Tests to run: all of them. Since all projects use keystone, we might need to run all tests as they run in integrated-gate today.
> But is keystone being used differently by each service? If not, is it enough to run only a single service's tests, say Nova or neutron?
> 6. "Integrated-gate-placement" (job to run on placement gate)
> Tests to run in this template: Nova API tests, Neutron API tests + scenario tests + any new service that depends on placement APIs
> Improvement for placement gate: exclude the glance API tests, cinder API tests, swift API tests, and keystone API tests.
> Thoughts on this approach?
> The important point is that we must not lose the coverage of integrated testing per project. So I would like to
> get each project's view on whether we are missing any dependency (through the proposed test removals) in the templates proposed above.
> As far as Swift is aware, these dependencies seem accurate; at any rate,
> *we* don't use anything other than Keystone, even by way of another API.
> Further, Swift does not use particularly esoteric Keystone APIs; I would be
> OK with integrated-gate-identity not exercising Swift's API with the
> assumption that some other (or indeed, almost *any* other) service would
> likely exercise the parts that we care about.
> - https://etherpad.openstack.org/p/qa-train-ptg
> -gmann
While I'm all for limiting the scope Tempest targets for each patch to
save time and our precious infra resources, I have a feeling that we might
end up missing something here. Honestly, I'm not sure what that something
would be, and maybe it's me thinking about the scopes the wrong way around.

For example:

4. "Integrated-gate-compute" (job to run on Nova gate)

I'm not exactly sure what any given Nova patch would be able to break
in Cinder, Glance or Neutron, or, for number 2, what Swift depends on
in Glance and Cinder that we could break when we introduce a change.

Shouldn't we be asking "which projects consume service X?" and
targeting those Tempest tests? From the Glance perspective this would be
(among core projects) Glance, Cinder and Nova; Cinder is probably
interested in Cinder, Glance and Nova (anyone else consuming Cinder?), etc.
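
To make the "who consumes service X" idea concrete, here is a minimal Python sketch. It assumes a hand-maintained consumer map (only the Glance and Cinder entries come from the examples above, the rest would have to be filled in by each project) and it assumes that selecting tests by the tempest.api.<package> prefix is good enough for illustration. The names CONSUMERS and api_test_regex are hypothetical and are not part of Tempest.

```python
import re

# Tempest API test packages per service (tempest.api.<package>).
API_PACKAGE = {
    "nova": "compute",
    "cinder": "volume",
    "glance": "image",
    "neutron": "network",
    "keystone": "identity",
    "swift": "object_storage",
}

# Assumption: illustrative consumer map. Only these two entries are taken
# from the examples in this mail; every other service is left undefined.
CONSUMERS = {
    "glance": {"glance", "cinder", "nova"},
    "cinder": {"cinder", "glance", "nova"},
}


def api_test_regex(changed_service):
    """Build a regex selecting API tests of every consumer of the service."""
    packages = sorted(API_PACKAGE[s] for s in CONSUMERS[changed_service])
    return r"^tempest\.api\.(" + "|".join(packages) + r")\."


regex = api_test_regex("glance")
print(regex)
```

A pattern like this could in principle be handed to a test selector, but scenario tests are not covered by the sketch and would need their own treatment.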

I'd like to propose an approach where we define these jobs and run them
in check to start with, and let gate run the full suites until we figure
out whether we are catching something in gate that we did not catch in
check. Once we agree that we have sufficient coverage, we can go ahead
and switch gate to those jobs as well. This approach would give us the
benefit where the impact is highest until we are confident we got the
coverage right. I think the biggest issue is that for the transition
period _everyone_ needs to understand that gate might catch something
check did not, and a simple "recheck" might not be sufficient when
tempest succeeded in check but failed in gate.
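
As a rough illustration of how the transition period could be evaluated, the Python sketch below splits the gate failures of one change into tests that the reduced check job never ran (a possible coverage gap) and tests that passed in check but failed in gate (where a plain recheck may or may not help). The function name and the test names are hypothetical, and the per-job result sets are assumed to be available from wherever job results are collected.

```python
def classify_gate_failures(check_ran, check_failed, gate_failed):
    """Split gate failures into coverage gaps and check/gate disagreements."""
    check_passed = check_ran - check_failed
    # Failed in gate, never executed by the reduced check job.
    not_in_check = sorted(gate_failed - check_ran)
    # Executed and passed in check, yet failed in gate.
    passed_in_check = sorted(gate_failed & check_passed)
    return not_in_check, passed_in_check


# Hypothetical test names, for illustration only.
check_ran = {"api.image.test_a", "api.volume.test_b"}
check_failed = set()
gate_failed = {"api.volume.test_b", "api.network.test_c"}

gaps, disagreements = classify_gate_failures(check_ran, check_failed, gate_failed)
print("not covered by check:", gaps)
print("passed in check, failed in gate:", disagreements)
```

Tests that keep showing up in the first list would be candidates for adding back to the per-project template before gate is switched over.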


Erno "jokke_" Kuvaja