[openstack-dev] [qa][all] Branchless Tempest beyond pure-API tests, impact on backporting policy

Doug Hellmann doug.hellmann at dreamhost.com
Thu Jul 10 18:39:38 UTC 2014


On Thu, Jul 10, 2014 at 11:56 AM, Sean Dague <sean at dague.net> wrote:
> On 07/10/2014 09:48 AM, Matthew Treinish wrote:
>> On Wed, Jul 09, 2014 at 09:16:01AM -0400, Sean Dague wrote:
>>> I think we need to actually step back a little and figure out where we
>>> are, how we got here, and what the future of validation might need to
>>> look like in OpenStack. Because I think there have been some
>>> communication gaps. (Also, for people I've had vigorous conversations
>>> with about this before, realize my positions have changed somewhat,
>>> especially on separation of concerns.)
>>>
>>> (Also note, this is all stream of consciousness right now, so I won't
>>> pretend that it's an entirely coherent view of the world; my hope in getting
>>> things down is that we can come up with that coherent view of the world together.)
>>>
>>> == Basic History ==
>>>
>>> In the Essex time frame Tempest was 70 tests. It was basically a barely
>>> adequate sniff test of OpenStack integration. So much so that our
>>> first 3rd Party CI system, SmokeStack, used its own test suite, which
>>> legitimately found completely different bugs than Tempest did. Not
>>> surprising, since Tempest was a really small number of integration tests.
>>>
>>> By Grizzly, Tempest had grown to 1300 tests, somewhat
>>> organically. People were throwing a mix of tests into the fold, some
>>> using Tempest's client, some using the official clients, some trying to hit
>>> the database to do white box testing. It had become kind of a mess and a
>>> Rorschach test. We had some really weird design summit sessions because
>>> many people had only looked at a piece of Tempest and assumed the rest
>>> was like it.
>>>
>>> So we spent some time defining scope. Tempest couldn't really be
>>> everything to everyone. It would be a few things:
>>>  * API testing for public APIs with a contract
>>>  * Some throughput integration scenarios to test some common flows
>>> (these were expected to be small in number)
>>>  * 3rd Party API testing (because it had existed previously)
>>>
>>> But importantly, Tempest isn't a generic functional test suite. Focus is
>>> important, because Tempest's mission was always highly aligned with what
>>> eventually came to be called DefCore: some way to validate
>>> compatibility between clouds. Be that clouds built from upstream (is the
>>> cloud of 5 patches ago compatible with the cloud right now), clouds from
>>> different vendors, public clouds vs. private clouds, etc.
>>>
>>> == The Current Validation Environment ==
>>>
>>> Today most OpenStack projects have 2 levels of validation. Unit tests &
>>> Tempest. That's sort of like saying your house has a basement and a
>>> roof. For sufficiently small values of house, this is fine. I don't
>>> think our house is sufficiently small any more.
>>>
>>> This has caused things like Neutron's unit tests, which actually bring
>>> up a full WSGI functional stack and test plugins through HTTP calls
>>> through the entire WSGI stack, replicated 17 times. It's the reason the
>>> Neutron unit tests take many GB of memory to run, and often run longer
>>> than Tempest does. (Maru has been doing heroic work to fix much of this.)
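>>>
>>> (To make that concrete, here is a minimal sketch of the pattern: it
>>> tests a toy WSGI app in-process with webtest, and the app and route are
>>> invented for illustration rather than taken from Neutron:
>>>
>>>     import json
>>>     import webtest
>>>
>>>     def app(environ, start_response):
>>>         # toy WSGI app standing in for a full plugin stack
>>>         start_response('200 OK', [('Content-Type', 'application/json')])
>>>         return [json.dumps({'ports': []}).encode('utf-8')]
>>>
>>>     def test_list_ports():
>>>         client = webtest.TestApp(app)
>>>         resp = client.get('/v2.0/ports')
>>>         assert resp.status_int == 200
>>>         assert resp.json == {'ports': []}
>>>
>>> Doing that against the real plugin stack, hundreds of times per run, is
>>> where the memory and wall clock time go.)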
>>>
>>> In the last year we made it *really* easy to get a devstack node of your
>>> own, configured any way you want, to do any project level validation you
>>> like. Swift uses it to drive their own functional testing. Neutron is
>>> working on heading down this path.
>>>
>>> == New Challenges with New Projects ==
>>>
>>> When we started down this path all projects had user APIs. So all
>>> projects were something we could think about from a tenant usage
>>> environment. Looking at both Ironic and Ceilometer, we really have
>>> projects that are Admin API only.
>>>
>>> == Contracts or lack thereof ==
>>>
>>> I think this is where we start to overlap with Eoghan's thread most.
>>> Because branchless Tempest assumes that the tests in Tempest are governed
>>> by a stable contract. The behavior should only change based on API
>>> version, not on the day of the week. In the case that triggered this
>>> thread, what was really being tested was not an API, but the existence of
>>> a meter that only showed up in Juno.
>>>
>>> Ceilometer is another great instance of something that's often
>>> drowning in stack traces because it depends on some project's
>>> internal interface which isn't a contract. Or on notification
>>> formats, which are (largely) unversioned.
>>>
>>> Ironic has a Nova driver in its tree, which implements the Nova driver
>>> internals interface. That means Ironic depends on something that's not a
>>> contract, and it gets broken a lot.
>>>
>>> == Depth of reach of a test suite ==
>>>
>>> Tempest can only reach so far into the stack, given that its levers are
>>> basically public API calls. That's OK. But it means that things like
>>> testing a bunch of different databases in the gate (i.e. the postgresql job)
>>> are pretty ineffectual. Trying to exercise code 4 levels deep through
>>> API calls is like driving a rover on Mars. You can do it, but only very
>>> carefully.
>>>
>>> == Replication ==
>>>
>>> Because there is such a huge gap between unit tests and Tempest tests,
>>> replicating issues is often challenging. Due to the sheer volume of
>>> results, we can see races in the gate that don't show up for
>>> developers very easily. When you do 30k runs a week, a ton of data falls
>>> out of it.
>>>
>>> A good instance is the live snapshot bug. It was failing on about 3% of
>>> Tempest runs, which means it had about a 10% chance of killing a
>>> patch on its own. So it's definitely real. It's real enough that if we
>>> enable that path, there are a ton of extra rechecks required by people.
>>> However, it's at a frequency where reproducing it on demand is hard. And
>>> reproducing it with enough signal to make it debuggable is also hard.
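>>>
>>> (Back of the envelope, assuming a patch triggers three or four
>>> independent Tempest-running jobs that each hit the 3% failure rate:
>>>
>>>     p = 0.03
>>>     for n in (3, 4):
>>>         # probability at least one of n runs hits the race
>>>         print(n, round(1 - (1 - p) ** n, 3))  # 3 -> 0.087, 4 -> 0.115
>>>
>>> which lands right around that 10% per-patch kill rate.)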
>>>
>>> == The Fail Pit ==
>>>
>>> All of which has somewhat led us to the fail pit, where keeping
>>> OpenStack in a state where it can actually pass Tempest consistently is a
>>> full time job. It's actually more than a full time job; it's a full time
>>> program. If it were its own program it would probably be larger than half
>>> the official programs in OpenStack.
>>>
>>> Also, when the Gate "program" is understaffed, it means that all the
>>> rest of the OpenStack programs (possibly excepting infra and tripleo
>>> because they aren't in the integrated gate) are slowed down
>>> dramatically. That velocity loss has real community and people power
>>> implications.
>>>
>>> This is especially true for people trying to get time, review, or
>>> mentoring out of the QA team, as there is kind of a natural overlap
>>> with the folks who actually want us to be able to merge code. So while the
>>> Gate is under water, getting help on Tempest issues isn't going to
>>> happen at any really responsive rate.
>>>
>>> Also, all the folks who have been the workhorses here (myself, Joe
>>> Gordon, Matt Treinish, Matt Riedemann) are pretty burnt out on this.
>>> Every time we seem to nail one issue, 3 more crop up. Having no end
>>> in sight and spending all your time shoveling out other projects' bugs is
>>> not a happy place to be.
>>>
>>> == New Thinking about our validation layers ==
>>>
>>> I feel like an ideal world would be the following:
>>>
>>> 1. all projects have unit tests for their own internal testing, and
>>> these pass 100% of the time (note: most projects currently have races in
>>> their unit tests, so they don't pass 100% of the time, and fixing them is
>>> a low priority).
>>> 2. all projects have a functional devstack job with tests *in their own
>>> tree* that poke their project in interesting ways. This is akin to what
>>> neutron is trying and what swift is doing. These are *not* co-gating.
>>
>> So I'm not sure this should be a mandatory thing; I'd rather it be opt-in. My real
>> concern is the manpower: who is going to take the time to write all the test
>> suites for all of the projects? I think it would be better to add that on demand
>> as the extra testing is required. That being said, I definitely view doing this
>> as a good thing and something to be encouraged, because tempest won't be able to
>> test everything.
>>
>> The other thing to consider is duplicated effort between projects. For an
>> example, look at the CLI tests in Tempest: the functional testing framework for
>> testing CLI formatting was essentially the same between all the clients, which is
>> why they're in tempest. Under your proposal here, CLI tests would be moved back
>> to the clients. But wouldn't that mean we end up with a bunch of copy and pasted
>> versions of the CLI test framework between all the projects?
>>
>> I really want to avoid a situation where every project does the same basic
>> testing differently, just in a rush to spin up functional testing. I think coming
>> up with a place for common test patterns and frameworks that can
>> be maintained independently of all the projects and consumed for project-
>> specific testing is something we should figure out first. (I'm not sure oslo
>> would necessarily be the right place for this.)
>
> It would be simple enough to have a test framework library for that.
> Realistically that could even co-gate with these tests. I think copy /
> paste is completely solvable here. It would effectively be back door
> libification of Tempest.

We could even add things to oslotest, the test framework library we
already have. :-)
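
As a strawman, a shared base class for the CLI-format case might look
something like this; the class and helper names are invented for
illustration, not an existing oslotest API:

    from oslotest import base

    class CLIOutputTestBase(base.BaseTestCase):
        """Hypothetical shared base for client CLI formatting tests."""

        def assert_table_headers(self, output, expected):
            # find the first prettytable-style row, e.g. "| ID | Name |",
            # and compare its column headers against what we expect
            header_line = next(
                line for line in output.splitlines()
                if line.startswith('|'))
            headers = [col.strip()
                       for col in header_line.split('|') if col.strip()]
            self.assertEqual(expected, headers)

Each client's functional tests could then subclass that instead of
carrying around its own copy of the parsing logic.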

> I don't think the manpower problem is being well solved in the current
> model. And I think the difficulty in debugging failures comes from
> missing these lower levels of testing, which would verify behavior in a
> more contained situation. That in turn impacts people being excited to
> work on this, and raises the level of skill needed to help.
>
>>> 3. all non-public API contracts are shored up by landing contract tests
>>> in projects. We did this recently with Ironic in Nova -
>>> https://github.com/openstack/nova/blob/master/nova/tests/virt/test_ironic_api_contracts.py.
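>>>
>>> (The shape of those tests, roughly: freeze each driver method's
>>> signature with inspect and fail loudly when it drifts. A sketch with an
>>> invented method, not the real ComputeDriver interface:
>>>
>>>     import inspect
>>>     import unittest
>>>
>>>     class FakeDriver(object):
>>>         # stand-in for the internal driver interface being frozen
>>>         def spawn(self, context, instance, image_meta,
>>>                   network_info=None):
>>>             pass
>>>
>>>     class DriverContractTest(unittest.TestCase):
>>>         def test_spawn_signature(self):
>>>             argspec = inspect.getargspec(FakeDriver.spawn)
>>>             self.assertEqual(
>>>                 ['self', 'context', 'instance', 'image_meta',
>>>                  'network_info'], argspec.args)
>>>
>>> Any change to the interface then has to touch the contract test
>>> explicitly, which makes the break visible in review.)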
>>
>> So I think that the contract unit tests work well specifically for the ironic
>> use case, but they aren't a general solution. Mostly because the Nova driver
>> API is an unstable interface, and there is no reason for that to change. It's
>> also a temporary thing, because eventually the driver will be moved into Nova,
>> and then the only cross-project interaction between Ironic and Nova will be
>> over the stable REST APIs.
>>
>> I think in general we should try to avoid doing non-REST-API cross-project
>> communication. So hopefully there won't be more of this class of thing, and
>> if there is, we can tackle it on a per-case basis. But even if it's a non-
>> REST API, I don't think we should ever encourage, or really allow, any
>> cross-project interactions over unstable interfaces.
>>
>> As a solution for notifications I'd rather see a separate notification
>> white/grey (or any other monochrome shade) box test suite. If as a project we
>> say that notifications have to be versioned for any change, we can then enforce
>> that easily with an external test suite that contains the definitions for all
>> the notifications. It then just makes a bunch of API calls and sits on RPC
>> verifying the notification format. (or something of that ilk)
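>>
>> (Something like: drive an API call, capture the notification off the
>> bus, and validate it against a versioned schema. A rough sketch using
>> jsonschema, with an invented schema for the common envelope:
>>
>>     import jsonschema
>>
>>     NOTIFICATION_ENVELOPE = {
>>         'type': 'object',
>>         'required': ['event_type', 'publisher_id', 'timestamp',
>>                      'priority', 'payload'],
>>         'properties': {
>>             'event_type': {'type': 'string'},
>>             'publisher_id': {'type': 'string'},
>>             'timestamp': {'type': 'string'},
>>             'priority': {'type': 'string'},
>>             'payload': {'type': 'object'},
>>         },
>>     }
>>
>>     def verify_notification(message):
>>         # raises jsonschema.ValidationError on a format break
>>         jsonschema.validate(message, NOTIFICATION_ENVELOPE)
>>
>> with per-event payload schemas layered on top of the envelope check.)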
>>
>> I agree that normally whitebox testing needs to be tightly coupled with the data
>> models in the projects, but I feel like notifications are slightly different.
>> Mostly because the basic format is the same between all the projects, to make
>> consumption simpler. So instead of duplicating the work to validate the
>> notifications in all the projects, it would be better to just implement it once.
>> I also think tempest being an external audit on the API has been invaluable, so
>> enforcing that for notifications would have similar benefits.
>>
>> As an aside, I think it would probably be fair if this was maintained as part of
>> ceilometer or the telemetry program, since that's really all notifications are
>> used for. (or at least AIUI) But it would still be a co-gating test suite for
>> anything that emits notifications.
>>
>>>
>>> 4. all public API contracts are tested in Tempest (these are co-gating,
>>> and ensure a contract breakage in keystone doesn't break swift).
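>>>
>>> (The defining property of these is that they only touch the public REST
>>> surface. A minimal sketch of the shape, using plain requests against a
>>> hypothetical endpoint rather than Tempest's own client machinery:
>>>
>>>     import requests
>>>
>>>     KEYSTONE = 'http://keystone.example.com:5000'  # hypothetical
>>>
>>>     def test_version_discovery():
>>>         resp = requests.get(KEYSTONE + '/v2.0/')
>>>         assert resp.status_code == 200
>>>         assert 'version' in resp.json()
>>>
>>> If keystone breaks this contract, every consumer sees it, which is
>>> exactly why these co-gate.)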
>>>
>>> Out of these 4 levels, we currently have 2 (1 and 4). In some projects
>>> we're making #1 cover 1 & 2. And we're making #4 cover 4, 3, and
>>> sometimes 2. The problem with this is that it's actually pretty wasteful,
>>> and when things fail, they fail so far away from the test that
>>> reproducing them is hard.
>>
>> I think the only real issue with your proposal is that the boundaries between all
>> the test classifications aren't as well defined as they seem. I agree that
>> having more intermediate classes of testing is definitely a good thing to do,
>> especially since there is a great deal of hand waving on the details of
>> what gets run in between tempest and unit tests. But the issue, as I see it,
>> is that without guidelines on what type of tests belong where, we'll end up
>> with a bunch of duplicated work.
>>
>> It's the same problem we have all the time in tempest, where we get a lot of
>> patches that exceed the scope of tempest, despite that scope being arguably
>> clearly outlined in the developer docs. But the complexity is higher in this
>> situation, because of having a bunch of different types of test suites
>> available to add a new test to. I just think that before we adopt #2 as
>> mandatory, it's important to have a better definition of the scope of the
>> project-specific functional testing.
>>
>>>
>>> I actually think that if we went down this path we could make
>>> Tempest smaller. For instance, negative API testing is something I'd say
>>> is really #2. While these tests don't take a ton of time, they do add a
>>> certain amount of complexity. It might also mean that admin tests, whose
>>> side effects are sometimes hard to understand without white/greybox
>>> interactions, might be migrated into #2.
>>
>> I think that negative testing is still part of tempest in your proposal. I
>> feel that the negative space of an API is still part of the contract, and should
>> be externally validated. As part of tempest I think we need to revisit the
>> negative space solution again, because I haven't seen much growth in the
>> automatic test generation. We can also probably be way more targeted about what
>> we're running, but I don't think punting on negative testing in tempest is
>> something we should do.
>>
>> I actually think that testing the admin API is doubly important because of
>> the inadvertent side effects it can cause. I think attempting to map those
>> out is useful. (I don't think we can assume that the admin API is being used in
>> a vacuum.) I agree that in your proposal, tests for those weird interactions
>> might be more fitting for #2. (to avoid more heisenbugs in tempest, etc.) But
>> I'm on the fence about that. Mostly because I still think an admin API should
>> conform to the API guidelines and thus needs tempest tests for that. I know
>> you've expressed the opposite opinion about stability of the admin APIs. But I
>> fail to see the distinction between an admin API and any other API when it comes
>> to the stable API guarantees.
>>
>> For a real-world example, look at the default-quotas API, which was probably the
>> most recent example of this. (and I suspect why you mentioned it here :) ) The
>> reason the test was added was that it was previously removed from nova while
>> horizon depended on it, which is exactly the kind of thing we should be using
>> tempest for. (even under your proposal, since it's a co-gating REST API issue)
>> What's better about this example is that the test added had all the harms you
>> outlined about weird cross-interactions between this extension and the other
>> tests. I think when we weigh the complexity against the benefits of testing
>> admin APIs in tempest, there isn't a compelling reason to pull them out of
>> tempest. But as an alternative, we should start attempting to get clever about
>> scheduling tests to avoid some of those strange cross-interactions.
>>
>>>
>>> I also think that #3 would help expose much more surgically what the
>>> cross-project pain points are, instead of proxying those efforts through
>>> Tempest for these subtle issues. Because Tempest is probably a terrible tool
>>> to discover that notifications in nova changed. The result is some weird
>>> failure in a ceilometer test which says some instance didn't run when it
>>> was expected to, and then you have to dig through 5 different openstack logs
>>> to figure out that it was really a deep exception somewhere. If it was
>>> logged, which it often isn't. (I actually challenge anyone to figure out
>>> the reason for a ceilometer failure from a Tempest test based on its
>>> current logging. :) )
>>
>> I agree that we should be directly testing the cross-project integration points
>> which aren't REST APIs. I just don't think we should decrease the level of
>> API testing in tempest for something that consumes that integration point. I
>> just feel that if we ignore the top level too much, we're going to expose more
>> API bugs. I think the real path forward is validating both separately.
>> Hopefully that'll let us catch bugs at each level independently.
>>
>>>
>>> And ensuring specific functionality earlier in the stack, letting
>>> Nova beat up Nova the way they think they should in a functional test
>>> (or landing a Neutron functional test to ensure that it's doing the right
>>> thing), would make the Tempest runs which are co-gating a ton more
>>> predictable.
>>>
>>> == Back to Branchless Tempest ==
>>>
>>> I think the real issue projects are running into with branchless
>>> Tempest is that they are coming forward with tests not in class #4, which
>>> fail because, while the same API existed 4 months ago as today, the
>>> semantics of the project have changed in a non-discoverable way. Which
>>> I'd say was bad; however, until we tried the radical idea of running the
>>> API test suite against all releases that declared they had the same API,
>>> we didn't see it. :)
>>>
>>>
>>> Ok, that was a lot. Hopefully it was vaguely coherent. I want to preface
>>> that I don't consider this all fully formed, but it's a lot of what's
>>> been rattling around in my brain.
>>>
>>
>> So here are some of my initial thoughts. I still need to stew some more on some
>> of the details, so certain things may be more of a knee-jerk reaction, and I
>> might still be missing certain intricacies. Also, part of my response here is
>> just me playing devil's advocate. I definitely think more testing is always
>> better. I just want to make sure we're targeting the right things, because this
>> proposal is pushing a lot of extra work onto everyone. I want to make sure that
>> before we commit to something this large, it's the right direction.
>
> A big part of the current challenge is that what we are trying to answer
> is the following:
>
> "Does the proposed commit do what it believes it does, and does it avoid
> regressing behavior we believe should be preserved?"
>
> Right now our system (all parts partially to blame) is doing a very poor
> job of answering that question, because we're now answering it
> incorrectly more often than correctly. Which causes other
> issues, because people recheck-grind patches through that
> *actually* cause bugs, but with Jenkins crying wolf so often, people lose
> patience and stop caring.
>
>         -Sean
>
> --
> Sean Dague
> http://dague.net
>
>


