[openstack-dev] [qa][all] Branchless Tempest beyond pure-API tests, impact on backporting policy

Sean Dague sean at dague.net
Wed Jul 9 13:16:01 UTC 2014


I think we need to actually step back a little and figure out where we
are, how we got here, and what the future of validation might need to
look like in OpenStack. Because I think there have been some
communication gaps. (Also, for people I've had vigorous conversations
with about this before, realize my positions have changed somewhat,
especially on separation of concerns.)

(Also note, this is all mental stream right now, so I will not pretend
that it's an entirely coherent view of the world, my hope in getting
things down is we can come up with that coherent view of the world together.)

== Basic History ==

In the essex time frame Tempest was 70 tests. It was basically a barely
adequate sniff test for integration for OpenStack. So much so that our
first 3rd Party CI system, SmokeStack, used its own test suite, which
legitimately found completely different bugs than Tempest. Not
surprising, Tempest was a really small number of integration tests.

As we got to Grizzly Tempest had grown to 1300 tests, somewhat
organically. People were throwing a mix of tests into the fold, some
using Tempest's client, some using official clients, some trying to hit
the database doing white box testing. It had become kind of a mess and a
Rorschach test. We had some really weird design summit sessions because
many people had only looked at a piece of Tempest, and assumed the rest
was like it.

So we spent some time defining scope. Tempest couldn't really be
everything to everyone. It would be a few things:
 * API testing for public APIs with a contract
 * Some throughput integration scenarios to test some common flows
(these were expected to be small in number)
 * 3rd Party API testing (because it had existed previously)

But importantly, Tempest isn't a generic function test suite. Focus is
important, because Tempest's mission always was highly aligned with what
eventually became called Defcore. Some way to validate some
compatibility between clouds. Be that clouds built from upstream (is the
cloud of 5 patches ago compatible with the cloud right now), clouds from
different vendors, public clouds vs. private clouds, etc.

== The Current Validation Environment ==

Today most OpenStack projects have 2 levels of validation. Unit tests &
Tempest. That's sort of like saying your house has a basement and a
roof. For sufficiently small values of house, this is fine. I don't
think our house is sufficiently small any more.

This has caused things like Neutron's unit tests, which actually bring
up a full wsgi functional stack and test plugins through http calls
through the entire wsgi stack, replicated 17 times. It's the reason that
Neutron unit tests take many GB of memory to run, and often run longer
than Tempest runs. (Maru has been doing hero's work to fix much of this.)

In the last year we made it *really* easy to get a devstack node of your
own, configured any way you want, to do any project level validation you
like. Swift uses it to drive their own functional testing. Neutron is
working on heading down this path.

== New Challenges with New Projects ==

When we started down this path all projects had user APIs. So all
projects were something we could think about from a tenant usage
environment. Looking at both Ironic and Ceilometer, we really have
projects that are Admin API only.

== Contracts or lack thereof ==

I think this is where we start to overlap with Eoghan's thread most.
Because branchless Tempest assumes that the tests in Tempest are governed
by a stable contract. The behavior should only change based on API
version, not on day of the week. In the case that triggered this what
was really being tested was not an API, but the existence of a meter
that only showed up in Juno.
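
A rough sketch of the distinction, assuming a cloud advertises a set of
feature names that a test can check before it runs. The feature names,
ADVERTISED_FEATURES and requires_feature are all made up for illustration;
this is not how Tempest's own configuration or skip logic is spelled.

```python
import unittest

# Hypothetical: features the cloud under test advertises in some
# externally discoverable way. A real run would ask the deployment.
ADVERTISED_FEATURES = {"volume-snapshot"}


def requires_feature(name):
    """Skip the test unless the cloud advertises the named feature."""
    return unittest.skipUnless(
        name in ADVERTISED_FEATURES,
        "feature %r is not advertised by this cloud" % name)


class SnapshotTests(unittest.TestCase):

    @requires_feature("volume-snapshot")
    def test_snapshot_api_is_present(self):
        # Governed by a contract: the test can tell the feature is there.
        self.assertIn("volume-snapshot", ADVERTISED_FEATURES)

    @requires_feature("snapshot-meter")
    def test_snapshot_meter_is_emitted(self):
        # Skipped here, because nothing advertises this behavior. Without
        # the guard it would simply fail on the older release.
        self.fail("should have been skipped")


def run_suite():
    """Run the tests above and return the unittest.TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SnapshotTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

The second test only works across releases because something tells it
whether the behavior exists. A meter that just appears in a later release
gives the test nothing to key off.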

Ceilometer is also another great instance of something that's often in a
state of huge amounts of stack tracing because it depends on some
internals interface in a project which isn't a contract. Or notification
formats, which aren't (largely) versioned.

Ironic has a Nova driver in their tree, which implements the Nova driver
internals interface. Which means they depend on something that's not a
contract. It gets broken a lot.

== Depth of reach of a test suite ==

Tempest can only reach so far into a stack given that its levers are
basically public API calls. That's ok. But it means that things like
testing a bunch of different dbs in the gate (i.e. the postgresql job)
are pretty ineffectual. Trying to exercise code 4 levels deep through
API calls is like driving a rover on Mars. You can do it, but only very
carefully.

== Replication ==

Because there is such a huge gap between unit tests, and Tempest tests,
replication of issues is often challenging. We have the ability to see
races in the gate due to volume of results, that don't show up for
developers very easily. When you do 30k runs a week, a ton of data falls
out of it.

A good instance is the live snapshot bug. It was failing on about 3% of
Tempest runs, which means that it had about a 10% chance of killing a
patch on its own. So it's definitely real. It's real enough that if we
enable that path, there are a ton of extra rechecks required by people.
However it's at a frequency that reproducing on demand is hard. And
reproducing with enough signal to make it debuggable is also hard.
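
The jump from 3% per run to roughly 10% per patch is what you get if a
patch has to survive several independent Tempest runs between check and
gate. The number of runs per patch below is an assumption for the sake of
the arithmetic, not a measured figure.

```python
def chance_of_killing_patch(per_run_failure, runs_per_patch):
    """Probability that at least one of the runs hits the bug.

    Assumes the runs fail independently of each other.
    """
    return 1.0 - (1.0 - per_run_failure) ** runs_per_patch


# 0.03 is the failure rate from the text; the run counts are assumed.
for runs in (1, 2, 3, 4):
    print(runs, round(chance_of_killing_patch(0.03, runs), 3))
```

With three or four Tempest runs per patch this lands between about 9% and
11%, which is in line with the 10% figure.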

== The Fail Pit ==

All of which has somewhat led us to the fail pit. Where keeping
OpenStack in a state that it can actually pass Tempest consistently is a
full time job. It's actually more than a full time job, it's a full time
program. If it was its own program it would probably be larger than 1/2
the official programs in OpenStack.

Also, when the Gate "program" is understaffed, it means that all the
rest of the OpenStack programs (possibly excepting infra and tripleo
because they aren't in the integrated gate) are slowed down
dramatically. That velocity loss has real community and people power
implications.

This is especially true of people trying to get time, review, mentoring,
otherwise, out of the QA team. As there is kind of a natural overlap
with folks that actually want us to be able to merge code, so while the
Gate is under water, getting help on Tempest issues isn't going to
happen in any really responsive rate.

Also, all the folks that have been the work horses here, myself, Joe
Gordon, Matt Treinish, Matt Riedemann, are pretty burnt out on this.
Every time we seem to nail one issue, 3 more crop up. Having no ending
in sight and spending all your time shoveling out other project bugs is
not a happy place to be.

== New Thinking about our validation layers ==

I feel like an ideal world would be the following:

1. all projects have unit tests for their own internal testing, and
these pass 100% of the time (note, most projects have races in their
unit tests, and they don't pass 100% of the time. And they are low
priority to fix).
2. all projects have a functional devstack job with tests *in their own
tree* that pokes their project in interesting ways. This is akin to what
neutron is trying and what swift is doing. These are *not* cogating.
3. all non public API contracts are shored up by landing contract tests
in projects. We did this recently with Ironic in Nova -
https://github.com/openstack/nova/blob/master/nova/tests/virt/test_ironic_api_contracts.py.

4. all public API contracts are tested in Tempest (these are co-gating,
and ensure a contract breakage in keystone doesn't break swift).
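
To make level 3 a bit more concrete, here is a minimal sketch of what a
contract test can look like. It is not the contents of the Nova file
linked above: FakeVirtDriver, its methods and EXPECTED_SIGNATURES are
hypothetical stand-ins for an internal interface that another project
depends on.

```python
import inspect
import unittest


class FakeVirtDriver(object):
    """Hypothetical stand-in for an internal driver interface."""

    def spawn(self, context, instance, image_meta, network_info=None):
        pass

    def destroy(self, context, instance, network_info=None):
        pass


# Argument names an out-of-tree consumer relies on (illustrative only).
EXPECTED_SIGNATURES = {
    "spawn": ["self", "context", "instance", "image_meta", "network_info"],
    "destroy": ["self", "context", "instance", "network_info"],
}


class DriverContractTest(unittest.TestCase):
    """Lives in the providing project and fails when a signature drifts."""

    def test_signatures_unchanged(self):
        for name, expected in EXPECTED_SIGNATURES.items():
            method = getattr(FakeVirtDriver, name)
            actual = list(inspect.signature(method).parameters)
            self.assertEqual(expected, actual, name)
```

The point of putting it in the providing project's tree is that the
breakage shows up in that project's own unit test run, right next to the
change that caused it, instead of as an odd failure in a cogating job.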

Out of these 4 levels, we currently have 2 (1 and 4). In some projects
we're making #1 cover 1 & 2. And we're making #4 cover 4, 3, and
sometimes 2. And the problem with this is it's actually pretty wasteful,
and when things fail, they fail so far away from the test, that the
reproduce is hard.

I actually think that if we went down this path we could actually make
Tempest smaller. For instance, negative API testing is something I'd say
is really #2. While these tests don't take a ton of time, they do add a
certain amount of complexity. It might also mean that admin tests, whose
side effects are hard to understand sometimes without white/greybox
interactions might be migrated into #2.

I also think that #3 would help expose much more surgically what the
cross project pain points are instead of proxy efforts through Tempest
for these subtle issues. Because Tempest is probably a terrible tool to
discover that notifications in nova changed. The result is some weird
failure in a ceilometer test which says some instance didn't run when it
was expected, then you have to dig through 5 different openstack logs to
figure out that it was really a deep exception somewhere. If it was
logged, which it often isn't. (I actually challenge anyone to figure out
the reason for a ceilometer failure from a Tempest test based on its
current logging. :) )

And ensuring specific functionality earlier in the stack, and letting
Nova beat up Nova the way they think they should in a functional test
(or landing a Neutron functional test to ensure that it's doing the right
thing), would make the Tempest runs which are cogating a ton more
predictable.

== Back to Branchless Tempest ==

I think the real issue that projects are running into with Branchless
Tempest is that they are coming forward with tests not in class #4, which
fail, because while the same API existed 4 months ago as today, the
semantics of the project have changed in a non discoverable way. Which
I'd say was bad, however until we tried the radical idea of running the
API test suite against all releases that declared they had the same API,
we didn't see it. :)


Ok, that was a lot. Hopefully it was vaguely coherent. I want to preface
that I don't consider this all fully formed, but it's a lot of what's
been rattling around in my brain.

	-Sean

On 07/09/2014 05:41 AM, Eoghan Glynn wrote:
> 
> TL;DR: branchless Tempest shouldn't impact on backporting policy, yet
>        makes it difficult to test new features not discoverable via APIs
> 
> Folks,
> 
> At the project/release status meeting yesterday[1], I raised the issue
> that featureful backports to stable are beginning to show up[2], purely
> to facilitate branchless Tempest. We had a useful exchange of views on
> IRC but ran out of time, so this thread is intended to capture and
> complete the discussion.
> 
> The issues, as I see it, are:
> 
>  * Tempest is expected to do double-duty as both the integration testing
>    harness for upstream CI and as a tool for externally probing capabilities
>    in public clouds
> 
>  * Tempest has an implicit bent towards pure API tests, yet not all
>    interactions between OpenStack services that we want to test are
>    mediated by APIs
> 
>  * We don't have another integration test harness other than Tempest
>    that we could use to host tests that don't just make assertions
>    about the correctness/presence of versioned APIs
> 
>  * We want to be able to add new features to Juno, or fix bugs of
>    omission, in ways that aren't necessarily discoverable in the API;
>    without backporting these patches to stable if we wouldn't have
>    done so under the normal stable-maint policy[3]
> 
>  * Integrated projects are required[4] to provide Tempest coverage,
>    so the rate of addition of tests to Tempest is unlikely to slow
>    down anytime soon
> 
> So the specific type of test that I have in mind would be common
> for Ceilometer, but also possibly for Ironic and others:
> 
>  1. an end-user initiates some action via an API
>     (e.g. calls the cinder snapshot API)
> 
>  2. this initiates some actions behind the scenes
>     (e.g. a volume is snapshot'd and a notification emitted)
> 
>  3. the test reasons over some expected side-effect
>     (e.g. some metering data shows up in ceilometer)
> 
> The branchless Tempest spec envisages new features will be added and
> need to be skipped when testing stable/previous, but IIUC requires
> that the presence of new behaviors is externally discoverable[5].
> 
> One approach mooted for allowing these kind of scenarios to be tested
> was to split off the pure-API aspects of Tempest so that it can be used
> for probing public-cloud-capabilities as well as upstream CI, and then
> build project-specific mini-Tempests to test integration with other
> projects.
> 
> Personally, I'm not a fan of that approach as it would require a lot
> of QA expertise in each project, lead to inefficient use of CI
> nodepool resources to run all the mini-Tempests, and probably lead to
> a divergent hotchpotch of per-project approaches.
> 
> Another idea would be to keep all tests in Tempest, while also
> micro-versioning the services such that tests can be skipped on the
> basis of whether a particular feature-adding commit is present.
> 
> When this micro-versioning can't be discovered by the test (as in the
> public cloud capabilities probing case), those tests would be skipped
> anyway.
> 
> The final, less palatable, approach that occurs to me would be to
> revert to branchful Tempest.
> 
> Any other ideas, or preferences among the options laid out above? 
> 
> Cheers,
> Eoghan
> 
> [1] http://eavesdrop.openstack.org/meetings/project/2014/project.2014-07-08-21.03.html
> [2] https://review.openstack.org/104863
> [3] https://wiki.openstack.org/wiki/StableBranch#Appropriate_Fixes
> [4] https://github.com/openstack/governance/blob/master/reference/incubation-integration-requirements.rst#qa-1
> [5] https://github.com/openstack/qa-specs/blob/master/specs/implemented/branchless-tempest.rst#scenario-1-new-tests-for-new-features
> 


-- 
Sean Dague
http://dague.net
