[openstack-dev] [tempest][nova][defcore] Add option to disable some strict response checking for interop testing

Mark Voelker mvoelker at vmware.com
Thu Jun 16 20:33:36 UTC 2016





On Jun 16, 2016, at 2:25 PM, Matthew Treinish <mtreinish at kortar.org> wrote:

On Thu, Jun 16, 2016 at 02:15:47PM -0400, Doug Hellmann wrote:
Excerpts from Matthew Treinish's message of 2016-06-16 13:56:31 -0400:
On Thu, Jun 16, 2016 at 12:59:41PM -0400, Doug Hellmann wrote:
Excerpts from Matthew Treinish's message of 2016-06-15 19:27:13 -0400:
On Wed, Jun 15, 2016 at 09:10:30AM -0400, Doug Hellmann wrote:
Excerpts from Chris Hoge's message of 2016-06-14 16:37:06 -0700:
Top posting one note and direct comments inline, I’m proposing
this as a member of the DefCore working group, but this
proposal itself has not been accepted as the forward course of
action by the working group. These are my own views as the
administrator of the program and not those of the working group
itself, which may independently reject the idea outside of the
response from the upstream devs.

I posted a link to this thread to the DefCore mailing list to make
that working group aware of the outstanding issues.

On Jun 14, 2016, at 3:50 PM, Matthew Treinish <mtreinish at kortar.org> wrote:

On Tue, Jun 14, 2016 at 05:42:16PM -0400, Doug Hellmann wrote:
Excerpts from Matthew Treinish's message of 2016-06-14 15:12:45 -0400:
On Tue, Jun 14, 2016 at 02:41:10PM -0400, Doug Hellmann wrote:
Excerpts from Matthew Treinish's message of 2016-06-14 14:21:27 -0400:
On Tue, Jun 14, 2016 at 10:57:05AM -0700, Chris Hoge wrote:
Last year, in response to Nova micro-versioning and extension updates[1],
the QA team added strict API schema checking to Tempest to ensure that
no additional properties were added to Nova API responses[2][3]. In the
last year, at least three vendors participating in the OpenStack Powered
Trademark program have been impacted by this change, two of which
reported this to the DefCore Working Group mailing list earlier this year[4].
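To make the mechanism concrete, here is a minimal stdlib-only sketch (illustrative key names, not Tempest's actual Nova schemas) of what "strict response checking" means in practice: the expected keys act like a JSON schema with additionalProperties set to False, so any vendor-injected extra attribute fails validation.

```python
# Hypothetical sketch of strict response checking. The key set below is
# an illustrative subset, not Nova's real server representation.
EXPECTED_SERVER_KEYS = {"id", "name", "status"}

def strict_check(response: dict, expected: set) -> set:
    """Return the unexpected keys; an empty set means the check passes."""
    return set(response) - expected

stock = {"id": "abc123", "name": "vm1", "status": "ACTIVE"}
vendor = dict(stock, vendor_rack_id="7")  # out-of-tree extra attribute

assert strict_check(stock, EXPECTED_SERVER_KEYS) == set()
assert strict_check(vendor, EXPECTED_SERVER_KEYS) == {"vendor_rack_id"}
```

A stock deployment passes unchanged, while a response carrying an out-of-band extension field is flagged, which is exactly the behavior change vendors hit.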

The DefCore Working Group determines guidelines for the OpenStack Powered
program, which includes capabilities with associated functional tests
from Tempest that must be passed, and designated sections with associated
upstream code [5][6]. In determining these guidelines, the working group
attempts to balance the future direction of development with lagging
indicators of deployments and user adoption.

After a tremendous amount of consideration, I believe that the DefCore
Working Group needs to implement a temporary waiver for the strict API
checking requirements that were introduced last year, to give downstream
deployers more time to catch up with the strict micro-versioning
requirements determined by the Nova/Compute team and enforced by the
Tempest/QA team.

I'm very much opposed to this being done. If we're actually concerned with
interoperability, and with verifying that things behave in the same manner across
multiple clouds, then doing this would be a big step backwards. The fundamental disconnect
here is that the vendors who have implemented out of band extensions or were
taking advantage of previously available places to inject extra attributes
believe that doing so means they're interoperable, which is quite far from
reality. **The API is not a place for vendor differentiation.**

This is a temporary measure to address the fact that a large number
of existing tests changed their behavior, rather than having new
tests added to enforce this new requirement. The result is deployments
that previously passed these tests may no longer pass, and in fact
we have several cases where that's true with deployers who are
trying to maintain their own standard of backwards-compatibility
for their end users.

That's not what happened though. The API hasn't changed and the tests haven't
really changed either. We made our enforcement on Nova's APIs a bit stricter to
ensure nothing unexpected appeared. For the most part these tests work on any version
of OpenStack. (we only test it in the gate on supported stable releases, but I
don't expect things to have drastically shifted on older releases) It also
doesn't matter which version of the API you run, v2.0 or v2.1. Literally, the
only case it ever fails is when you run something extra, not from the community,
either as an extension (which themselves are going away [1]) or another service
that wraps nova or imitates nova. I'm personally not comfortable saying those
extras are ever part of the OpenStack APIs.

We have basically three options.

1. Tell deployers who are trying to do the right thing for their immediate
 users that they can't use the trademark.

2. Flag the related tests or remove them from the DefCore enforcement
 suite entirely.

3. Be flexible about giving consumers of Tempest time to meet the
 new requirement by providing a way to disable the checks.

Option 1 goes against our own backwards compatibility policies.

I don't think backwards compatibility policies really apply to what we define
as the set of tests that as a community we are saying a vendor has to pass to
say they're OpenStack. From my perspective, as a community we should take a hard
stance on this and say that to be considered an interoperable cloud (and to get the
trademark) you have to actually have an interoperable product. We slowly ratchet
up the requirements every 6 months; there isn't any implied backwards
compatibility in doing that. You passed under past guidelines but not under the
newer, stricter ones.

Also, even if I did think it applied, we're not talking about a change which
would fall into breaking that. The change was introduced a year and a half ago,
during kilo, and landed a year ago during liberty:

https://review.openstack.org/#/c/156130/

That's way longer than our normal deprecation period of 3 months and a release
boundary.


Option 2 gives us no winners and actually reduces the interoperability
guarantees we already have in place.

Option 3 applies our usual community standard of slowly rolling
forward while maintaining compatibility as broadly as possible.

Except in this case there isn't actually any compatibility being maintained.
We're saying that we can't make the requirements for interoperability testing
stricter until all the vendors who were passing in the past are able to pass
the stricter version.


No one is suggesting that a permanent, or even open-ended, exception
be granted.

Sure, I agree a permanent or open-ended exception would be even worse. But, I
still think as a community we need to draw a hard line in the sand here. Just
because this measure is temporary doesn't make it any more palatable.

By doing this, even as a temporary measure, we're saying it's ok to call things
an OpenStack API when you add random gorp to the responses, which as a community we've
very clearly said is the exact opposite of the case, and which the
testing reflects. I still contend that just because some vendors were running old
versions of tempest and old versions of openstack, where their incompatible API
changes weren't caught, doesn't mean they should be given a pass now.

Nobody is saying random gorp is OK, and I'm not sure "line in the
sand" rhetoric is really constructive. The issue is not with the
nature of the API policies, it's with the implementation of those
policies and how they were rolled out.

DefCore defines its rules using named tests in Tempest.  If these
new enforcement policies had been applied by adding new tests to
Tempest, then DefCore could have added them using its processes
over a period of time and we wouldn't have had any issues. That's
not what happened. Instead, the behavior of a bunch of *existing*
tests changed. As a result, deployments that have not changed fail
tests that they used to pass, without any action being taken on the
deployer's part. We've moved the goal posts on our users in a way
that was not easily discoverable, because it couldn't be tracked
through the (admittedly limited) process we have in place for doing
that tracking.

So, we want a way to get the test results back to their existing
status, which will then let us roll adoption forward smoothly instead
of lurching from "pass" to "fail" to "pass".

It doesn't have to be a bright line pass or fail. My primary concern here is
that making this change is basically saying we're going to let things "pass"
when running out of tree stuff that's adding arbitrary fields to the response. This
isn't really interoperable and isn't being honest with what the vendor clouds are
actually doing. It would hide the truth from the people who rely on these results
to determine interoperability. The proposal as I read it (and maybe it's my
misconception) was to mask this and vendor clouds "pass" until they can fix it,
which essentially hides the issue. Especially given there are a lot of clouds and
products that don't have any issue here.

The opposite is the intention of this proposal. It’s a compromise that admits
that since the introduction of the OpenStack Powered program, and the release
of this strict checking on additional properties, vendors that once passed
now fail, and the incentives to force that change didn’t start being felt until
they hit their product renewal cycle.

It’s not trying to mask anything. To the contrary, by bringing it up here and
stating that public test results would indicate which APIs send additional
properties back, it’s shining a light on the issue and publicly stating that it’s
not an acceptable long-term solution.

But, if we add another possible state on the defcore side like conditional pass,
warning, yellow, etc. (the name doesn't matter) which is used to indicate that
things on product X could only pass when strict validation was disabled (and
be clear about where and why) then my concerns would be alleviated. I just do
not want this to end up not being visible to end users trying to evaluate
interoperability of different clouds using the test results.

The OpenStack Marketplace is where these comparisons would happen,
and the APIs with additional response data would be stated.


We should, separately, address the process issues and the limitations
this situation has exposed.  That may mean changing the way DefCore
defines its policies, or tracks things, or uses Tempest.  For
example, in the future, we may want tie versions of Tempest to
versions of the trademark more closely, so that it's possible for
someone running the Mitaka version of OpenStack to continue to use
the Mitaka version of Tempest and not have to upgrade Tempest in
order to retain their trademark (maybe that's how it already works?).

Tempest master supports all currently supported stable branches. So right now
any commit to master is tested against a master cloud, a mitaka cloud, and a
liberty cloud in the gate. We tag/push a release whenever we add or drop support
for a release, the most recent being dropping kilo. [1][2] That being said the
openstack apis **should** be backwards compatible so ideally master tempest would
work fine on older clouds. (although this might not be reality) The primary
wrinkle here is the tests which depend on feature flags to indicate a feature's
availability on newer versions. We eventually remove flags after all supported
releases have a given feature. But, this can be worked around with test
selection. (i.e. don't even try to run tests that require a feature juno didn't
have)
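A hedged sketch of that feature-flag pattern (the names here are illustrative stand-ins, not Tempest's real config options): a test is skipped when the deployment's config says the feature isn't available, e.g. on a release as old as juno that predates it.

```python
# Illustrative feature-flag gating, loosely modeled on how Tempest skips
# tests for features a cloud doesn't have. FakeConf is a stand-in; in
# real Tempest the flags come from tempest.conf sections such as
# [compute-feature-enabled].
import unittest

class FakeConf:
    shelve = False  # pretend the cloud under test predates shelving

CONF = FakeConf()

class ServerActionsTest(unittest.TestCase):
    @unittest.skipUnless(CONF.shelve, "shelve not available on this cloud")
    def test_shelve_server(self):
        # would exercise the optional API here; skipped on older clouds
        self.fail("unreachable while the flag is off")

    def test_list_servers(self):
        # a baseline capability assumed present on every supported release
        self.assertTrue(True)
```

With the flag off, the optional test is reported as skipped rather than failed, which is how one Tempest can span clouds of different ages.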

The current active guidelines cover icehouse through mitaka. The release
of 2016.08 will change that to cover juno through mitaka (with newton
as an add-on to 2016.08 when it’s released). There’s overlap between
the guidelines, so 2016.01 covers juno through mitaka while 2016.08
will cover kilo through newton. Essentially two years of releases.

We may also need to consider that test implementation details may
change, and have a review process within DefCore to help expose
those changes to make them clearer to deployers.

Fixing the process issue may also mean changing the way we implement
things in Tempest. In this case, adding a flag helps move ahead
more smoothly. Perhaps we adopt that as a general policy in the
future when we make underlying behavioral changes like this to
existing tests.  Perhaps instead we have a policy that we do not
change the behavior of existing tests in such significant ways, at
least if they're tagged as being used by DefCore. I don't know --
those are things we need to discuss.

Sure I agree, this thread raises larger issues which need to be figured out.
But, that is probably an independent discussion.

I’m beginning to wonder if we need to make DefCore use release
branches and then back-port bug-fixes and relevant feature additions
as necessary.

We should definitely have that conversation, to understand what
effect it would have both on Tempest and on DefCore.


While from a quick glance this would seem like it would solve some of the
problems, when you start to dig into it you'll see that it actually wouldn't,
and would just end up causing more issues in the long run. Branchless tempest
was originally started back at the icehouse release and was implemented to
actually enforce the API is the same across release boundaries. We had hit many

The guarantees we're trying to make in our CI system and the needs
DefCore has are slightly different in this regard. It sounds like
they're still needing to test against versions that we're no longer
supporting, while also avoiding changing the rules on those older
clouds.

Right, the crux of the problem here is that defcore is trying to support something
we stopped supporting in the community. However, the thing being checked in
both use cases is the same: that the API is the same regardless of the cloud it's
run against. (which includes different versions as well as different
deployment choices) It's just a conflict between our upstream support windows
and what defcore says they support.


I don't think it's appropriate to create stable/$series branches
in the Tempest repository, for all of the reasons you stated in
your email. It might be appropriate to create defcore/$version
branches, if we think we need to support backporting changes for
some reason. If not, simply creating defcore-$version tags would
give them a way to get a consistent version of Tempest that worked
with older versions of OpenStack.

This actually doesn't solve the problem, which is what my second paragraph
addressed (which got lost in the snip) and is where my issue with doing
branching or custom tagging lies. When we tag a release to mark a stable
branches EOL there isn't any infrastructure to run tests against that branch at
all anymore. It's gone, the stable branches of the projects are deleted, we
remove the devstack branch, the g-r branch, etc. all the workarounds we had to
put in place to keep things working over the stable support window go away.
That's something we're never going to ever maintain after a branches EOL. The
only point to doing a separate branch would be to support running against an EOL
branch, but you couldn't actually test that, you'd just be merging "backports"
blindly. That's *not* something we do in openstack. All the releases where we
have support master tempest as well as past tags support running against those
clouds.

There also isn't a reason to add additional tags, because we already have the
support milestones tagged. What defcore should be doing is specifying a version
range (well really just a min version) to match up with what they say is ok
to be running.

So if they want the LCD for kilo, liberty, and mitaka it would be:

http://git.openstack.org/cgit/openstack/tempest/tag/?h=12.0.0

for juno, kilo, and liberty it would be:

http://git.openstack.org/cgit/openstack/tempest/tag/?h=8

But, as I said in an earlier email the API shouldn't really be changing under
this model (and even if it did things would not diverge very quickly) So:

http://git.openstack.org/cgit/openstack/tempest/tag/?h=11.0.0

will likely work against juno, kilo, liberty, and mitaka.[1] The only thing that
would potentially be missing are feature flags in the tempest config to skip
tests for features that didn't exist in juno.[2] However, we just can't test it
against juno because that branch was EOL when 11.0.0 was pushed and the
infrastructure for running against juno was gone.

The reverse should also be true: old versions of tempest should work fine
against newer clouds; we just can't and don't test that. What we outline, and I
try to make very clear in the release notes, is that when we say we support a
version, that means testing against it in the gate. If the API is truly a stable
interface then it should work against any cloud, aside from the new-features
thing I mentioned before. (which by the way is why microversions are awesome,
because they solve that problem)
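To illustrate why microversions help here, a small sketch: the client pins the API version it was written against in a request header instead of guessing capabilities from the release name. The endpoint and token below are placeholders, and while the header name matches what Nova's v2.1 API documents, treat the details as an assumption rather than a reference.

```python
# Hypothetical client-side microversion pinning. Endpoint and token are
# fake; only the request-building is shown, nothing is sent.
import urllib.request

def build_servers_request(endpoint: str, token: str,
                          microversion: str = "2.26") -> urllib.request.Request:
    """Build (but don't send) a GET /servers request pinned to a microversion."""
    return urllib.request.Request(
        f"{endpoint}/servers",
        headers={
            "X-Auth-Token": token,
            # the server honors this exact version or rejects it explicitly
            "X-OpenStack-Nova-API-Version": microversion,
        },
    )

req = build_servers_request("http://nova.example:8774/v2.1", "fake-token")
```

Because the negotiated version is explicit, behavior doesn't silently vary between juno-era and mitaka-era clouds, which is the discoverability gap feature flags otherwise paper over.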

[1] It's also worth noting that the strict API validation which prompted this
thread was included in all of these releases. It was verified working on
kilo, juno, and **icehouse** before it could land:

https://review.openstack.org/#/c/156130/

[2] But, that wouldn't actually matter for the defcore use case because they
specify running a subset of tests that by definition can't include those.
(otherwise they wouldn't actually support juno)


There shouldn't ever be a need to run those older versions of Tempest
with newer clouds, and we should ensure there is a policy that
validation must happen using a version of Tempest no older than the
version of OpenStack to ensure that as we move ahead with new
capabilities, compatibility checks, etc. new deployments are validated
properly.

For someone running defcore against their product, trying to get the certification, this
is probably true. So defcore should be setting a min version for passing the
certification. Which they do:

https://git.openstack.org/cgit/openstack/defcore/tree/2016.01.json#n111

It's just shown as the sha1, not as the tempest 4 tag:

http://git.openstack.org/cgit/openstack/tempest/tag/?h=4

But for developing tempest (even for a hypothetical defcore branch of tempest)
it is not. You need to be able to use old clients with new versions of the
projects, otherwise you've failed in your goal of maintaining API stability and
interoperability; the code should be verified against all the versions you're
supporting.


- -Matt Treinish

I think all of that is saying something like what I was proposing,
except that the tags they need already exist. Is that right?

Heh, yeah pretty much. I just tend to get overly verbose when I have to say
something a second time. (I think I made this point about the tags in an
earlier post)

- -Matt Treinish


I don't think DefCore actually needs to change old versions of Tempest,
but maybe Chris or Mark can verify that?

So if I’m grokking this correctly, there are two scenarios being painted here.  One is the “LCD” approach where we use the $osversion-eol version of Tempest, where $osversion matches the oldest version covered in a Guideline.  The other is to use the start-of-$osversion version of Tempest where $osversion is the OpenStack version after the most recent one in the Guideline.  The former may result in some fairly long-lived flags, and the latter is actually not terribly different than what we do today, I think.  Let me try to talk through both...

In some cases, tests get flagged in the Guidelines because of bugs in the test or because the test needs refactoring.  The underlying Capabilities that those tests are testing actually work fine.  Once we identify such an issue, the test can be fixed…in master.  Under the first scenario, this potentially creates some very long-lived flags:

2016.01, the most current Guideline right now, covers Juno, Kilo, and Liberty (and Mitaka after it was released).  It’s one of the two Guidelines you can use if you want an OpenStack Powered license from the Foundation, so $vendor wants to run it against their shiny new Mitaka cloud.  They run the Juno-EOL version of Tempest (tag=8), they find a test issue, and we flag it.  A few weeks later, a fix lands in Tempest.  Several months later the next Guideline rolls around: the oldest covered release is Kilo and we start telling people to use the Kilo-EOL version of Tempest.  That doesn’t have the fix, so the flag stays.  Another six months goes by and we get another Guideline and we’re up to the Liberty-EOL version of Tempest.  No fix, flag stays.  Six more months, and now we’re at Mitaka-EOL, and that's the first version that includes the fix.

Generally speaking long lived flags aren’t so great because it means the tests are not required…which means there’s less or no assurance that the capabilities they test for actually work in the clouds that adhere to those Guidelines.  So, the worst-case scenario here looks kind of ugly.

As Matt correctly pointed out though, the capabilities DefCore selects for are generally pretty stable APIs that are long-lived across many releases, so we haven’t run into a lot of issues running pretty new versions of Tempest against older clouds to date.  In fact I’m struggling to think of a time we’ve flagged something because someone complained the test wasn’t runnable against an older release covered by the Guideline in question.  I can think of plenty of times where we’ve flagged something due to a test issue though…keep in mind we’re still in pretty formative times with DefCore here, where these tests are starting to be used in a new way for the first time.  Anyway, as Matt points out, we could potentially use a much newer Tempest tag: tag=11 (which is the start of Newton development and is a roughly 2-month-old version of Tempest).  The next Guideline rolls around, we use the tag for start-of-ocata, and we get the fix and can drop the flag.

Today, RefStack client by default checks out a specific SHA of Tempest [1] (it actually did use a tag at some point in the past, and still can).  When we see a fix for a flagged test go in, we or the Refstack folks can do a quick test to make sure everything’s in order and then update that SHA to match the version with the fix.  That way we’re relatively sure we have a version that works today, and will work when we drop the flag in the next Guideline too.  When we finalize that next Guideline, we also update the test-repositories section of the new Guideline that Matt pointed to earlier to reflect the best-known version on the day the Guideline was sent to the Board for approval.  One added benefit of this approach is that people running the tests today may get a version of Tempest that includes a fix for a flagged test.  A flagged test isn’t required, but it does get run—and now will show a passing result, so we have data that says “this provider actually does support this capability (even though it’s flagged), and the test does indeed seem to be working."

So, that’s actually not hugely different from the second scenario I think?  Or did I miss something there? 

[1] http://git.openstack.org/cgit/openstack/refstack-client/tree/setup_env#n3

At Your Service,

Mark T. Voelker





