[openstack-dev] [qa][all] Branchless Tempest beyond pure-API tests, impact on backporting policy

Sean Dague sean at dague.net
Thu Jul 10 15:56:20 UTC 2014


On 07/10/2014 09:48 AM, Matthew Treinish wrote:
> On Wed, Jul 09, 2014 at 09:16:01AM -0400, Sean Dague wrote:
>> I think we need to actually step back a little and figure out where we
>> are, how we got here, and what the future of validation might need to
>> look like in OpenStack. Because I think there has been some
>> communication gaps. (Also, for people I've had vigorous conversations
>> about this before, realize my positions have changed somewhat,
>> especially on separation of concerns.)
>>
>> (Also note, this is all mental stream right now, so I will not pretend
>> that it's an entirely coherent view of the world; my hope in getting
>> things down is that we can come up with that coherent view of the
>> world together.)
>>
>> == Basic History ==
>>
>> In the Essex time frame Tempest was 70 tests. It was basically a barely
>> adequate sniff test for OpenStack integration. So much so that our
>> first 3rd Party CI system, SmokeStack, used its own test suite, which
>> legitimately found completely different bugs than Tempest. Not
>> surprising, given that Tempest was a really small number of
>> integration tests.
>>
>> By the time we got to Grizzly, Tempest had grown to 1300 tests,
>> somewhat organically. People were throwing a mix of tests into the
>> fold: some using Tempest's client, some using official clients, some
>> trying to hit the database doing white box testing. It had become kind
>> of a mess and a Rorschach test. We had some really weird design summit
>> sessions because many people had only looked at a piece of Tempest,
>> and assumed the rest was like it.
>>
>> So we spent some time defining scope. Tempest couldn't really be
>> everything to everyone. It would be a few things:
>>  * API testing for public APIs with a contract
>>  * Some throughput integration scenarios to test some common flows
>> (these were expected to be small in number)
>>  * 3rd Party API testing (because it had existed previously)
>>
>> But importantly, Tempest isn't a generic functional test suite. Focus
>> is important, because Tempest's mission was always highly aligned with
>> what eventually came to be called Defcore: some way to validate some
>> compatibility between clouds. Be that clouds built from upstream (is the
>> cloud of 5 patches ago compatible with the cloud right now), clouds from
>> different vendors, public clouds vs. private clouds, etc.
>>
>> == The Current Validation Environment ==
>>
>> Today most OpenStack projects have 2 levels of validation. Unit tests &
>> Tempest. That's sort of like saying your house has a basement and a
>> roof. For sufficiently small values of house, this is fine. I don't
>> think our house is sufficiently small any more.
>>
>> This has caused things like Neutron's unit tests, which actually bring
>> up a full wsgi functional stack and test plugins through http calls
>> through the entire wsgi stack, replicated 17 times. It's the reason
>> that Neutron unit tests take many GB of memory to run, and often run
>> longer than Tempest does. (Maru has been doing hero's work to fix much
>> of this.)
>>
>> In the last year we made it *really* easy to get a devstack node of your
>> own, configured any way you want, to do any project level validation you
>> like. Swift uses it to drive their own functional testing. Neutron is
>> working on heading down this path.
>>
>> == New Challenges with New Projects ==
>>
>> When we started down this path all projects had user APIs. So all
>> projects were something we could think about from a tenant usage
>> environment. Looking at both Ironic and Ceilometer, we really have
>> projects that are Admin API only.
>>
>> == Contracts or lack thereof ==
>>
>> I think this is where we start to overlap with Eoghan's thread most.
>> Because branchless Tempest assumes that the tests in Tempest are
>> governed by a stable contract: the behavior should only change based
>> on API version, not on the day of the week. In the case that triggered
>> this thread, what was really being tested was not an API, but the
>> existence of a meter that only showed up in Juno.
>>
>> Ceilometer is another great instance of something that's often in a
>> state of huge amounts of stack tracing, because it depends on some
>> internal interface in a project which isn't a contract. Or on
>> notification formats, which (largely) aren't versioned.
>>
>> Ironic has a Nova driver in their tree, which implements the Nova driver
>> internals interface. Which means they depend on something that's not a
>> contract. It gets broken a lot.
>>
>> == Depth of reach of a test suite ==
>>
>> Tempest can only reach so far into a stack, given that its levers are
>> basically public API calls. That's ok. But it means that things like
>> testing a bunch of different dbs in the gate (e.g. the postgresql job)
>> are pretty ineffectual. Trying to exercise code 4 levels deep through
>> API calls is like driving a rover on Mars. You can do it, but only very
>> carefully.
>>
>> == Replication ==
>>
>> Because there is such a huge gap between unit tests, and Tempest tests,
>> replication of issues is often challenging. Due to the sheer volume
>> of results, we can see races in the gate that don't show up for
>> developers very easily. When you do 30k runs a week, a ton of data
>> falls out of it.
>>
>> A good instance is the live snapshot bug. It was failing on about 3% of
>> Tempest runs, which means that it had about a 10% chance of killing a
>> patch on its own. So it's definitely real. It's real enough that if we
>> enable that path, there are a ton of extra rechecks required by people.
>> However it's at a frequency that reproducing on demand is hard. And
>> reproducing with enough signal to make it debuggable is also hard.
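
To spell out that math: a patch typically has to survive several Tempest
runs before it merges (check plus gate, times rechecks). The run count
below is an assumption for illustration, not measured gate data:

    # back-of-envelope math for the live snapshot bug; the number of
    # Tempest runs a patch must survive (3-4) is an assumption
    p_run_failure = 0.03
    for n_runs in (3, 4):
        p_patch_killed = 1 - (1 - p_run_failure) ** n_runs
        print(n_runs, round(p_patch_killed, 3))  # ~0.087 and ~0.115

So a 3% per-run failure rate lands right around that 10% per-patch kill
rate.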
>>
>> == The Fail Pit ==
>>
>> All of which has somewhat led us to the fail pit. Where keeping
>> OpenStack in a state that it can actually pass Tempest consistently is a
>> full time job. It's actually more than a full time job, it's a full
>> time program. If it were its own program, it would probably be larger
>> than 1/2 the official programs in OpenStack.
>>
>> Also, when the Gate "program" is understaffed, it means that all the
>> rest of the OpenStack programs (possibly excepting infra and tripleo
>> because they aren't in the integrated gate) are slowed down
>> dramatically. That velocity loss has real community and people power
>> implications.
>>
>> This is especially true for people trying to get time, review,
>> mentoring, or otherwise out of the QA team. There is kind of a natural
>> overlap with the folks that actually want us to be able to merge code,
>> so while the Gate is under water, getting help on Tempest issues isn't
>> going to happen at any really responsive rate.
>>
>> Also, all the folks that have been the workhorses here - myself, joe
>> gordon, matt treinish, matt riedemann - are pretty burnt out on this.
>> Every time we seem to nail one issue, 3 more crop up. Having no end in
>> sight and spending all your time shoveling out other projects' bugs is
>> not a happy place to be.
>>
>> == New Thinking about our validation layers ==
>>
>> I feel like an ideal world would be the following:
>>
>> 1. all projects have unit tests for their own internal testing, and
>> these pass 100% of the time (note, most projects have races in their
>> unit tests, and they don't pass 100% of the time. And they are low
>> priority to fix).
>> 2. all projects have a functional devstack job with tests *in their own
>> tree* that pokes their project in interesting ways. This is akin to what
>> neutron is trying and what swift is doing. These are *not* cogating.
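
To make #2 concrete, here is a rough sketch of the kind of in-tree
functional test I mean, pointed at a devstack node. The environment
variable names and URL layout are invented for illustration:

    # hypothetical in-tree functional test run against a devstack node;
    # FUNC_TEST_ENDPOINT and FUNC_TEST_TOKEN are made-up names
    import os
    import unittest

    import requests

    @unittest.skipUnless(os.environ.get('FUNC_TEST_ENDPOINT'),
                         'needs a devstack node to poke')
    class ServersFunctionalTest(unittest.TestCase):

        def test_list_servers(self):
            resp = requests.get(
                os.environ['FUNC_TEST_ENDPOINT'] + '/servers',
                headers={'X-Auth-Token': os.environ['FUNC_TEST_TOKEN']})
            self.assertEqual(200, resp.status_code)
            self.assertIn('servers', resp.json())

The point is the project owns the test and runs it against a real
deployment, without it co-gating on everyone else.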
> 
> So I'm not sure that this should be a mandatory thing, but an opt-in.
> My real concern is the manpower: who is going to take the time to write
> all the test suites for all of the projects? I think it would be better
> to add that on-demand as the extra testing is required. That being
> said, I definitely view doing this
> as a good thing and something to be encouraged, because tempest won't be able to
> test everything. 
> 
> The other thing to also consider is duplicated effort between projects.
> For an example, look at the CLI tests in Tempest: the functional
> testing framework for testing CLI formatting was essentially the same
> between all the clients, which is why they're in tempest. Under your
> proposal here, CLI tests should be moved back to the clients. But,
> would that mean we have a bunch of copy-and-pasted versions of the CLI
> test framework between all the projects?
> 
> I really want to avoid a situation where every project does the same basic
> testing differently just in a rush to spin up functional testing. I think coming
> up with a solution for a place with common test patterns and frameworks that can
> be maintained independently of all the projects and consumed for project
> specific testing is something we should figure out first. (I'm not sure oslo
> would be the right place for this necessarily)

It would be simple enough to have a test framework library for that.
Realistically that could even co-gate with these tests. I think copy /
paste is completely solvable here. It would effectively be back door
libification of Tempest.
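
As a strawman, the core of such a library could be really small.
Something like this sketch (all names invented, untested):

    # hypothetical shared CLI-testing helper each project could consume
    # instead of copy/pasting tempest's CLI framework
    import subprocess

    class CLITestMixin(object):

        def run_cli(self, binary, action, flags=''):
            """Run a CLI client and return its stdout as text."""
            cmd = '%s %s %s' % (binary, flags, action)
            return subprocess.check_output(cmd, shell=True).decode('utf-8')

        def assert_table_columns(self, output, expected):
            """Check a prettytable-style listing for expected columns."""
            header = [c.strip() for c in
                      output.splitlines()[1].split('|') if c.strip()]
            missing = set(expected) - set(header)
            assert not missing, 'missing columns: %s' % missing

A project's functional suite would then just mix that in and call
run_cli('nova', 'list') or whatever fits.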

I don't think the manpower problem is being well solved in the current
model. And debugging failures is difficult precisely because we are
missing these lower levels of testing, which would verify behavior in a
more contained situation. That in turn impacts how excited people are to
work on this, and the level of skill needed to help.

>> 3. all non public API contracts are shored up by landing contract tests
>> in projects. We did this recently with Ironic in Nova -
>> https://github.com/openstack/nova/blob/master/nova/tests/virt/test_ironic_api_contracts.py.
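
The core of that idea is just freezing signatures in a unit test. A
minimal sketch (the argument list here is from memory and purely
illustrative; the linked file is the authoritative version):

    # contract test: fail loudly if an internal interface that an
    # out-of-tree consumer (Ironic's driver) relies on changes shape
    import inspect
    import unittest

    from nova.virt import driver

    class DriverAPIContractTestCase(unittest.TestCase):

        def test_spawn_signature(self):
            argspec = inspect.getargspec(driver.ComputeDriver.spawn)
            self.assertEqual(
                ['self', 'context', 'instance', 'image_meta',
                 'injected_files', 'admin_password', 'network_info',
                 'block_device_info'],
                argspec.args)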
> 
> So I think that the contract unit tests work well specifically for the ironic
> use case, but isn't a general solution. Mostly because the Nova driver api is an
> unstable interface and there is no reason for that to change. It's also a
> temporary thing because eventually the driver will be moved into Nova and then
> the only cross-project interaction between Ironic and Nova will be over the
> stable REST APIs.
> 
> I think in general we should try to avoid doing non-REST-API
> cross-project communication. So hopefully there won't be more of this
> class of things, and if there are we can tackle them on a case-by-case
> basis. But, even if it's a non
> REST API I don't think we should ever encourage or really allow any
> cross-project interactions over unstable interfaces.
> 
> As a solution for notifications I'd rather see a separate notification
> white/grey (or any other monochrome shade) box test suite. If, as a
> project, we say that notifications have to be versioned for any change,
> we can then enforce that easily with an external test suite that
> contains the definitions for all the notifications. It then just makes
> a bunch of api calls and sits on RPC verifying the notification format.
> (or something of that ilk)
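
Right, something of that ilk. A rough sketch of a standalone checker,
written against the current oslo.messaging listener API; the transport
URL and the required-fields schema are invented here:

    # sits on the notification bus and verifies each payload carries
    # the fields we declare as the contract for its event_type
    from oslo_config import cfg
    import oslo_messaging

    REQUIRED_FIELDS = {
        'compute.instance.create.end': ['instance_id', 'state',
                                        'launched_at'],
    }

    class NotificationContractEndpoint(object):

        def info(self, ctxt, publisher_id, event_type, payload, metadata):
            expected = REQUIRED_FIELDS.get(event_type)
            if expected is not None:
                missing = set(expected) - set(payload)
                assert not missing, '%s missing %s' % (event_type, missing)

    transport = oslo_messaging.get_notification_transport(
        cfg.CONF, url='rabbit://guest:guest@localhost:5672/')
    targets = [oslo_messaging.Target(topic='notifications')]
    listener = oslo_messaging.get_notification_listener(
        transport, targets, [NotificationContractEndpoint()],
        executor='threading')
    listener.start()
    listener.wait()

The suite would then make a bunch of api calls against the cloud and let
this checker assert on what comes across the bus.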
> 
> I agree that normally whitebox testing needs to be tightly coupled with the data
> models in the projects, but I feel like notifications are slightly different.
> Mostly, because the basic format is the same between all the projects to make
> consumption simpler. So instead of duplicating the work to validate the
> notifications in all the projects it would be better to just implement it once.
> I also think tempest being an external audit on the API has been invaluable so
> enforcing that for notifications would have similar benefits.
> 
> As an aside I think it would probably be fair if this was maintained as part of
> ceilometer or the telemetry program, since that's really all notifications are
> used for. (or least as AIUI) But, it would still be a co-gating test suite for
> anything that emits notifications. 
> 
>>
>> 4. all public API contracts are tested in Tempest (these are co-gating,
>> and ensure a contract breakage in keystone doesn't break swift).
>>
>> Out of these 4 levels, we currently have 2 (1 and 4). In some projects
>> we're making #1 cover 1 & 2. And we're making #4 cover 4, 3, and
>> sometimes 2. And the problem with this is it's actually pretty
>> wasteful, and when things fail, they fail so far away from the test
>> that reproducing the problem is hard.
> 
> I think the only real issue in your proposal is that the boundaries between all
> the test classifications aren't as well defined as they seem. I agree that
> having more intermediate classes of testing is definitely a good thing to do.
> Especially since there is a great deal of hand waving on the details of
> what is being run in between tempest and unit tests. But, the issue as
> I see it is that without guidelines on what type of tests belong where,
> we'll end up with a bunch of duplicated work.
> 
> It's the same problem we have all the time in tempest, where we get a lot of
> patches that exceed the scope of tempest, despite it being arguably clearly
> outlined in the developer docs. But, the complexity is higher in this situation,
> because of having a bunch of different types of test suites that are available
> to add a new test to. I just think that before we adopt #2 as
> mandatory, it's important to have a better definition of the scope of
> the project-specific functional testing.
> 
>>
>> I actually think that if we went down this path we could actually make
>> Tempest smaller. For instance, negative API testing is something I'd say
>> is really #2. While these tests don't take a ton of time, they do add a
>> certain amount of complexity. It might also mean that admin tests,
>> whose side effects are sometimes hard to understand without
>> white/greybox interactions, might be migrated into #2.
> 
> I think that negative testing is still part of tempest in your
> proposal. I still feel that the negative space of an API is part of the
> contract, and should
> be externally validated. As part of tempest I think we need to revisit the
> negative space solution again, because I haven't seen much growth on the
> automatic test generation. We also can probably be way more targeted about what
> we're running, but I don't think punting on negative testing in tempest is
> something we should do.
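
For concreteness, the kind of negative test we're debating is just
asserting the error side of the contract; e.g. (a hypothetical sketch,
not tempest's real client plumbing):

    # a negative API test: a bogus resource id must yield a 404, which
    # is as much a part of the contract as the success path
    import requests

    def test_get_nonexistent_server_returns_404(endpoint, token):
        resp = requests.get(endpoint + '/servers/does-not-exist',
                            headers={'X-Auth-Token': token})
        assert resp.status_code == 404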
> 
> I actually think that testing on the admin api is doubly important because of
> the inadvertent side effects that they can cause. I think attempting to map that
> out is useful. (I don't think we can assume that the admin api is being used in
> a vacuum) I agree that, in your proposal, tests for those weird
> interactions might be more fitting for #2. (to avoid more heisenbugs in
> tempest, etc) But,
> I'm on the fence about that. Mostly because I still think an admin api should
> conform to the api guidelines and thus needs tempest tests for that. I know
> you've expressed the opposite opinion about stability on the admin apis. But, I
> fail to see the distinction between an admin api and any other api when it comes
> to the stable api guarantees.
> 
> For a real world example look at the default-quotas api which was probably the
> most recent example of this. (and I suspect why you mentioned it here :) ) The
> reason the test was added was because it was previously removed from nova, while
> horizon depended on it, which is exactly the kind of thing we should be using
> tempest for. (even under your proposal since it's a co-gating rest api issue)
> What's better about this example is that the test added had all the
> harms you outlined from weird cross-interactions between this extension
> and the other tests. I think when we weigh the complexity against the
> benefits of testing admin APIs in tempest, there isn't a compelling
> reason to pull them out of tempest. But, as an alternative, we should
> start attempting to get clever about scheduling tests to avoid some
> strange cross-interactions.
> 
>>
>> I also think that #3 would help expose much more surgically what the
>> cross-project pain points are, instead of proxying efforts through
>> Tempest for these subtle issues. Because Tempest is probably a
>> terrible tool to
>> discover that notifications in nova changed. The result is some weird
>> failure in a ceilometer test which says some instance didn't run when
>> it was expected to; then you have to dig through 5 different openstack
>> logs to figure out that it was really a deep exception somewhere. If
>> it was logged, which it often isn't. (I actually challenge anyone to
>> figure out the reason for a ceilometer failure from a Tempest test
>> based on its current logging. :) )
> 
> I agree that we should be directly testing the cross-project integration points
> which aren't REST APIs. I just don't think that we should decrease the level of
> api testing in tempest for something that consumes that integration point. I
> just feel that if we ignore the top level too much we're going to
> expose more api bugs. I think the real path forward is validating both
> separately. Hopefully, that'll let us catch bugs at each level
> independently.
> 
>>
>> And ensuring specific functionality earlier in the stack, letting Nova
>> beat up Nova the way they think they should in a functional test (or
>> landing a Neutron functional test to ensure that it's doing the right
>> thing), would make the Tempest runs which are co-gating a ton more
>> predictable.
>>
>> == Back to Branchless Tempest ==
>>
>> I think the real issue that projects are running into with Branchless
>> Tempest is that they are coming forward with tests not in class #4,
>> which fail because, while the same API existed 4 months ago as today,
>> the semantics of the project have changed in a non-discoverable way.
>> I'd say that's bad; however, until we tried the radical idea of
>> running the API test suite against all releases that declared they had
>> the same API, we didn't see it. :)
>>
>>
>> Ok, that was a lot. Hopefully it was vaguely coherent. I want to preface
>> that I don't consider this all fully formed, but it's a lot of what's
>> been rattling around in my brain.
>>
> 
> So here are some of my initial thoughts, I still need to stew some more on some
> of the details, so certain things may be more of a knee-jerk reaction and I
> might still be missing certain intricacies. Also a part of my response here is
> just me playing devil's advocate. I definitely think more testing is always
> better. I just want to make sure we're targeting the right things,
> because this proposal is pushing for a lot of extra work for everyone.
> I want to make sure, before we commit to something this large, that
> it's the right direction.

A big part of the current challenge is that what we are trying to answer
is the following:

"Does the proposed commit do what it believes it does, and does it avoid
regressions of behavior we believe should be."

Right now our system (all parts partially to blame) is doing a very poor
job of answering that question, because we're now answering it
incorrectly more often than correctly. That causes other issues, because
people are recheck-grinding through patches that *actually* cause bugs;
but with Jenkins crying wolf so often, people lose patience and don't
care.

	-Sean

-- 
Sean Dague
http://dague.net
