[openstack-dev] [tempest][nova][defcore] Add option to disable some strict response checking for interop testing

Matthew Treinish mtreinish at kortar.org
Tue Jun 14 21:12:18 UTC 2016


On Tue, Jun 14, 2016 at 12:19:54PM -0700, Chris Hoge wrote:
> 
> > On Jun 14, 2016, at 11:21 AM, Matthew Treinish <mtreinish at kortar.org> wrote:
> > 
> > On Tue, Jun 14, 2016 at 10:57:05AM -0700, Chris Hoge wrote:
> >> Last year, in response to Nova micro-versioning and extension updates[1],
> >> the QA team added strict API schema checking to Tempest to ensure that
> >> no additional properties were added to Nova API responses[2][3]. In the
> >> last year, at least three vendors participating the the OpenStack Powered
> >> Trademark program have been impacted by this change, two of which
> >> reported this to the DefCore Working Group mailing list earlier this year[4].
> >> 
> >> The DefCore Working Group determines guidelines for the OpenStack Powered
> >> program, which includes capabilities with associated functional tests
> >> from Tempest that must be passed, and designated sections with associated
> >> upstream code [5][6]. In determining these guidelines, the working group
> >> attempts to balance the future direction of development with lagging
> >> indicators of deployments and user adoption.
> >> 
> >> After a tremendous amount of consideration, I believe that the DefCore
> >> Working Group needs to implement a temporary waiver for the strict API
> >> checking requirements that were introduced last year, to give downstream
> >> deployers more time to catch up with the strict micro-versioning
> >> requirements determined by the Nova/Compute team and enforced by the
> >> Tempest/QA team.
> > 
> > I'm very much opposed to this being done. If we're actually concerned with
> > interoperability and verify that things behave in the same manner between multiple
> > clouds then doing this would be a big step backwards. The fundamental disconnect
> > here is that the vendors who have implemented out of band extensions or were
> > taking advantage of previously available places to inject extra attributes
> > believe that doing so means they're interoperable, which is quite far from
> > reality. **The API is not a place for vendor differentiation.**
> 
> Yes, it’s bad practice, but it’s also a reality, and I honestly believe that
> vendors have received the message and are working on changing.

They might be working on this, but this change was coming for quite some
time it shouldn't be a surprise to anyone at this point. I mean seriously, it's
been in tempest for 1 year, and it took 6months to land. Also, lets say we set
a hard deadline on this new option to disable the enforcement and enforce it.
Then we implement a similar change on keystone are we gonna have to do the same
thing again when vendors who have custom things running there fail.

> 
> > As a user of several clouds myself I can say that having random gorp in a
> > response makes it much more difficult to use my code against multiple clouds. I
> > have to determine which properties being returned are specific to that vendor's
> > cloud and if I actually need to depend on them for anything it makes whatever
> > code I'm writing incompatible for using against any other cloud. (unless I
> > special case that block for each cloud) Sean Dague wrote a good post where a lot
> > of this was covered a year ago when microversions was starting to pick up steam:
> > 
> > https://dague.net/2015/06/05/the-nova-api-in-kilo-and-beyond-2 <https://dague.net/2015/06/05/the-nova-api-in-kilo-and-beyond-2>
> > 
> > I'd recommend giving it a read, he explains the user first perspective more
> > clearly there.
> > 
> > I believe Tempest in this case is doing the right thing from an interoperability
> > perspective and ensuring that the API is actually the API. Not an API with extra
> > bits a vendor decided to add.
> 
> A few points on this, though. Right now, Nova is the only API that is
> enforcing this, and the clients. While this may change in the
> future, I don’t think it accurately represents the reality of what’s
> happening in the ecosystem.

This in itself doesn't make a difference. There is a disparity in the level of
testing across all the projects. Nova happens to be further along in regards
to api stability and testing things compared to a lot of projects, it's not
really a surprise that they're the first for this to come up on. It's only a
matter of time for other projects to follow nova's example and implement similar
enforcement.

> 
> As mentioned before, we also need to balance the lagging nature of
> DefCore as an interoperability guideline with the needs of testing
> upstream changes. I’m not asking for a permanent change that
> undermines the goals of Tempest for QA, rather a temporary
> upstream modification that recognizes the challenges faced by
> vendors in the market right now, and gives them room to continue
> to align themselves with upstream. Without this, the two other 
> alternatives are to:
> 
> * Have some vendors leave the Powered program unnecessarily,
>   weakening it.
> * Force DefCore to adopt non-upstream testing, either as a fork
>   or an independent test suite.
> 
> Neither seem ideal to me.

It might not be ideal for a vendor to leave the program, but I think it's a
necessary consequence of evolving the guidlines to become stricter over time.
What we define as the minimum requirements for interoperability and by extension
use of the trademark will continue to evolve. Every time we add additional tests,
more stringent checking, or change something inevitably someone is going to fail
no matter how slowly we ramp it out.

There's a limit to how accommodating we should be here. This change has been in
the wild for a year, and also took 6 months to land. The issue in question
literally only ever will cause and issue if you add something extra, not
OpenStack, to the API. All versions of the nova API (maybe not really old
releases like <= folsom) should get passed this check without any issue. I still
fail to see how a vendor failing the guidelines here is a bad thing. Isn't this
what we're supposed to be doing.

Also, defcore already has a mechanism for slowly rolling out changes like this. The
guidelines contain a tempest sha1 (for better or worse):

https://git.openstack.org/cgit/openstack/defcore/tree/2016.01.json#n113

If the defcore committee still feels there needs to be a more gradual roll out of
 > 1yr (which I strongly disagree with) then the minimum sha1 should be set more
conservatively to a point before the change in question. Yes that means old bugs
will still be present in tempest, but I don't think we can have it both ways here.
Either we say you have to pass stricter requirements or we don't. We added
idempotent ids to tempest exactly for this reason so you can keep track of tests
as things change.

> 
> One of my goals is to transparently strengthen the ties between
> upstream and downstream development. There is a deadline
> built into this proposal, and my intention is to enforce it.

My argument is that the deadline has already passed. We've been enforcing this
in tempest for 1 year already. It's only coming up now because some vendors didn't
pay attention to anything happening in the community or with changes in the
testing guidelines were incoming and now are stuck. From my perspective this
will always happen no matter how gradually we make changes and how much we
advertise it.

> 
> > I don't think a cloud or product that does this
> > to the api should be considered an interoperable OpenStack cloud and failing the
> > tests is the correct behavior.
> 
> I think it’s more nuanced than this, especially right now.
> Only additions to responses will be considered, not changes.
> These additions will be clearly labelled as variations,
> signaling the differences to users. Existing clients in use
> will not break. Correct behavior will eventually be enforced,
> and this would be clearly signaled by both the test tool and
> through the administrative program.

You're making large assumptions about how the APIs are actually consumed here.
You can't assume that only one of the clients you know about is being used to
talk to APIs. For example, I have a bunch of code I wrote a while ago that uses
the tempest clients with [1] to interact with clouds. That code would fail the
second I talked to a cloud with these extra bits enabled. Granted that's a bit
of a contrived example, but if I'm dealing with the api at a lower level (using
my hypothetical hand built fortran client) it's perfectly reasonable to assume
that I start on vendor A's "openstack" cloud see the extra params in the
response and assume they're everywhere and make my code depend on that. Then
when I use a cloud deployed on the 3 spare machines in my basement from the
latest release tarballs everything starts failing without any indication where
that extra parameter went. That's the kind of experience we're trying to avoid.

Also, there is also no guarantee that the extra fields are clearly marked. If we
disable this checking literally anything can be added to the responses from nova
and still pass for example if we're not explicitly checking for it. For example,
I could add a top level field to the server response "useful: True" for things
that use my proprietary hypervisor and "useful: False" for libvirt guests. There
is nothing stopping me from writing an extension that does that and adding it to
the API and then passing all the tests. Nothing would catch this if we disable
the strict validation.

My fundamental concern here is that we're optimizing for the wrong set of
priorities. As a community do we want to prioritize enforcing interoperability
with guidelines we define and develop in the open and that things that we
say are openstack behave in a manner for a user as we've developed in the
community. Or do we want to optimize for ensuring that vendors who are
continually slow to adapt don't ever fail guidelines when they've passed things
in the past. I'm all for doing a slow roll out of changes, to give people a
chance to adopt as new constraints are added, doing otherwise would be reckless.
But, I feel in this case the time for that has past. I also don't think we should
add workarounds to avoid adding constraints as things move forward, we should set
reasonable min version of tempest to use.

-Matt Treinish

[1] https://github.com/mtreinish/mesocyclone


-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: not available
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20160614/d8c71bc5/attachment.pgp>


More information about the OpenStack-dev mailing list