[openstack-dev] [tempest][nova][defcore] Add option to disable some strict response checking for interop testing
Doug Hellmann
doug at doughellmann.com
Thu Jun 16 18:15:47 UTC 2016
Excerpts from Matthew Treinish's message of 2016-06-16 13:56:31 -0400:
> On Thu, Jun 16, 2016 at 12:59:41PM -0400, Doug Hellmann wrote:
> > Excerpts from Matthew Treinish's message of 2016-06-15 19:27:13 -0400:
> > > On Wed, Jun 15, 2016 at 09:10:30AM -0400, Doug Hellmann wrote:
> > > > Excerpts from Chris Hoge's message of 2016-06-14 16:37:06 -0700:
> > > > > Top posting one note and direct comments inline, I’m proposing
> > > > > this as a member of the DefCore working group, but this
> > > > > proposal itself has not been accepted as the forward course of
> > > > > action by the working group. These are my own views as the
> > > > > administrator of the program and not that of the working group
> > > > > itself, which may independently reject the idea outside of the
> > > > > response from the upstream devs.
> > > > >
> > > > > I posted a link to this thread to the DefCore mailing list to make
> > > > > that working group aware of the outstanding issues.
> > > > >
> > > > > > On Jun 14, 2016, at 3:50 PM, Matthew Treinish <mtreinish at kortar.org> wrote:
> > > > > >
> > > > > > On Tue, Jun 14, 2016 at 05:42:16PM -0400, Doug Hellmann wrote:
> > > > > >> Excerpts from Matthew Treinish's message of 2016-06-14 15:12:45 -0400:
> > > > > >>> On Tue, Jun 14, 2016 at 02:41:10PM -0400, Doug Hellmann wrote:
> > > > > >>>> Excerpts from Matthew Treinish's message of 2016-06-14 14:21:27 -0400:
> > > > > >>>>> On Tue, Jun 14, 2016 at 10:57:05AM -0700, Chris Hoge wrote:
> > > > > >>>>>> Last year, in response to Nova micro-versioning and extension updates[1],
> > > > > >>>>>> the QA team added strict API schema checking to Tempest to ensure that
> > > > > >>>>>> no additional properties were added to Nova API responses[2][3]. In the
> > > > > > >>>>>> last year, at least three vendors participating in the OpenStack Powered
> > > > > >>>>>> Trademark program have been impacted by this change, two of which
> > > > > >>>>>> reported this to the DefCore Working Group mailing list earlier this year[4].
> > > > > >>>>>>
> > > > > >>>>>> The DefCore Working Group determines guidelines for the OpenStack Powered
> > > > > >>>>>> program, which includes capabilities with associated functional tests
> > > > > >>>>>> from Tempest that must be passed, and designated sections with associated
> > > > > >>>>>> upstream code [5][6]. In determining these guidelines, the working group
> > > > > >>>>>> attempts to balance the future direction of development with lagging
> > > > > >>>>>> indicators of deployments and user adoption.
> > > > > >>>>>>
> > > > > >>>>>> After a tremendous amount of consideration, I believe that the DefCore
> > > > > >>>>>> Working Group needs to implement a temporary waiver for the strict API
> > > > > >>>>>> checking requirements that were introduced last year, to give downstream
> > > > > >>>>>> deployers more time to catch up with the strict micro-versioning
> > > > > >>>>>> requirements determined by the Nova/Compute team and enforced by the
> > > > > >>>>>> Tempest/QA team.
> > > > > >>>>>
> > > > > >>>>> I'm very much opposed to this being done. If we're actually concerned with
> > > > > > >>>>> interoperability and verifying that things behave in the same manner between multiple
> > > > > >>>>> clouds then doing this would be a big step backwards. The fundamental disconnect
> > > > > >>>>> here is that the vendors who have implemented out of band extensions or were
> > > > > >>>>> taking advantage of previously available places to inject extra attributes
> > > > > >>>>> believe that doing so means they're interoperable, which is quite far from
> > > > > >>>>> reality. **The API is not a place for vendor differentiation.**
> > > > > >>>>
> > > > > >>>> This is a temporary measure to address the fact that a large number
> > > > > >>>> of existing tests changed their behavior, rather than having new
> > > > > >>>> tests added to enforce this new requirement. The result is deployments
> > > > > >>>> that previously passed these tests may no longer pass, and in fact
> > > > > >>>> we have several cases where that's true with deployers who are
> > > > > >>>> trying to maintain their own standard of backwards-compatibility
> > > > > >>>> for their end users.
> > > > > >>>
> > > > > >>> That's not what happened though. The API hasn't changed and the tests haven't
> > > > > >>> really changed either. We made our enforcement on Nova's APIs a bit stricter to
> > > > > > >>> ensure nothing unexpected appeared. For the most part these tests work on any version
> > > > > >>> of OpenStack. (we only test it in the gate on supported stable releases, but I
> > > > > >>> don't expect things to have drastically shifted on older releases) It also
> > > > > >>> doesn't matter which version of the API you run, v2.0 or v2.1. Literally, the
> > > > > >>> only case it ever fails is when you run something extra, not from the community,
> > > > > >>> either as an extension (which themselves are going away [1]) or another service
> > > > > >>> that wraps nova or imitates nova. I'm personally not comfortable saying those
> > > > > >>> extras are ever part of the OpenStack APIs.
> > > > > >>>
> > > > > >>>> We have basically three options.
> > > > > >>>>
> > > > > > >>>> 1. Tell deployers who are trying to do the right thing for their immediate
> > > > > >>>> users that they can't use the trademark.
> > > > > >>>>
> > > > > >>>> 2. Flag the related tests or remove them from the DefCore enforcement
> > > > > >>>> suite entirely.
> > > > > >>>>
> > > > > >>>> 3. Be flexible about giving consumers of Tempest time to meet the
> > > > > >>>> new requirement by providing a way to disable the checks.
> > > > > >>>>
> > > > > >>>> Option 1 goes against our own backwards compatibility policies.
> > > > > >>>
> > > > > > >>> I don't think backwards compatibility policies really apply to what we define
> > > > > >>> as the set of tests that as a community we are saying a vendor has to pass to
> > > > > >>> say they're OpenStack. From my perspective as a community we either take a hard
> > > > > >>> stance on this and say to be considered an interoperable cloud (and to get the
> > > > > >>> trademark) you have to actually have an interoperable product. We slowly ratchet
> > > > > >>> up the requirements every 6 months, there isn't any implied backwards
> > > > > >>> compatibility in doing that. You passed in the past but not in the newer stricter
> > > > > >>> guidelines.
> > > > > >>>
> > > > > >>> Also, even if I did think it applied, we're not talking about a change which
> > > > > >>> would fall into breaking that. The change was introduced a year and half ago
> > > > > >>> during kilo and landed a year ago during liberty:
> > > > > >>>
> > > > > >>> https://review.openstack.org/#/c/156130/
> > > > > >>>
> > > > > >>> That's way longer than our normal deprecation period of 3 months and a release
> > > > > >>> boundary.
> > > > > >>>
> > > > > >>>>
> > > > > >>>> Option 2 gives us no winners and actually reduces the interoperability
> > > > > >>>> guarantees we already have in place.
> > > > > >>>>
> > > > > >>>> Option 3 applies our usual community standard of slowly rolling
> > > > > >>>> forward while maintaining compatibility as broadly as possible.
> > > > > >>>
> > > > > >>> Except in this case there isn't actually any compatibility being maintained.
> > > > > >>> We're saying that we can't make the requirements for interoperability testing
> > > > > >>> stricter until all the vendors who were passing in the past are able to pass
> > > > > >>> the stricter version.
> > > > > >>>
> > > > > >>>>
> > > > > >>>> No one is suggesting that a permanent, or even open-ended, exception
> > > > > >>>> be granted.
> > > > > >>>
> > > > > > >>> Sure, I agree a permanent or open-ended exception would be even worse. But, I
> > > > > >>> still think as a community we need to draw a hard line in the sand here. Just
> > > > > >>> because this measure is temporary doesn't make it any more palatable.
> > > > > >>>
> > > > > >>> By doing this, even as a temporary measure, we're saying it's ok to call things
> > > > > >>> an OpenStack API when you add random gorp to the responses. Which is something we've
> > > > > >>> very clearly said as a community is the exact opposite of the case, which the
> > > > > >>> testing reflects. I still contend just because some vendors were running old
> > > > > >>> versions of tempest and old versions of openstack where their incompatible API
> > > > > > >>> changes weren't caught doesn't mean they should be given a pass now.
> > > > > >>
> > > > > >> Nobody is saying random gorp is OK, and I'm not sure "line in the
> > > > > >> sand" rhetoric is really constructive. The issue is not with the
> > > > > >> nature of the API policies, it's with the implementation of those
> > > > > >> policies and how they were rolled out.
> > > > > >>
> > > > > >> DefCore defines its rules using named tests in Tempest. If these
> > > > > >> new enforcement policies had been applied by adding new tests to
> > > > > >> Tempest, then DefCore could have added them using its processes
> > > > > >> over a period of time and we wouldn't have had any issues. That's
> > > > > >> not what happened. Instead, the behavior of a bunch of *existing*
> > > > > >> tests changed. As a result, deployments that have not changed fail
> > > > > >> tests that they used to pass, without any action being taken on the
> > > > > >> deployer's part. We've moved the goal posts on our users in a way
> > > > > >> that was not easily discoverable, because it couldn't be tracked
> > > > > >> through the (admittedly limited) process we have in place for doing
> > > > > >> that tracking.
> > > > > >>
> > > > > >> So, we want a way to get the test results back to their existing
> > > > > >> status, which will then let us roll adoption forward smoothly instead
> > > > > >> of lurching from "pass" to "fail" to "pass".
> > > > > >
> > > > > > It doesn't have to be a bright line pass or fail. My primary concern here is
> > > > > > that making this change is basically saying we're going to let things "pass"
> > > > > > when running out of tree stuff that's adding arbitrary fields to the response. This
> > > > > > isn't really interoperable and isn't being honest with what the vendor clouds are
> > > > > > actually doing. It would hide the truth from the people who rely on these results
> > > > > > to determine interoperability. The proposal as I read it (and maybe it's my
> > > > > > misconception) was to mask this and vendor clouds "pass" until they can fix it,
> > > > > > which essentially hides the issue. Especially given there are a lot of clouds and
> > > > > > products that don't have any issue here.
> > > > >
> > > > > The opposite is the intention of this proposal. It’s a compromise that admits
> > > > > that since the introduction of the OpenStack Powered program, and the release
> > > > > of this strict checking on additional properties, vendors that once passed
> > > > > now fail, and the incentives to force that change didn’t start being felt until
> > > > > they hit their product renewal cycle.
> > > > >
> > > > > It’s not trying to mask anything, to the contrary by bringing it up here and
> > > > > stating their public test results would indicate which APIs send additional
> > > > > properties back, it’s shining a light on the issue and publicly stating that it’s
> > > > > not an acceptable long-term solution.
> > > > >
> > > > > > But, if we add another possible state on the defcore side like conditional pass,
> > > > > > warning, yellow, etc. (the name doesn't matter) which is used to indicate that
> > > > > > things on product X could only pass when strict validation was disabled (and
> > > > > > be clear about where and why) then my concerns would be alleviated. I just do
> > > > > > not want this to end up not being visible to end users trying to evaluate
> > > > > > interoperability of different clouds using the test results.
> > > > >
> > > > > The OpenStack Marketplace is where these comparisons would happen,
> > > > > and the APIs with additional response data would be stated.
> > > > >
> > > > > >>
> > > > > >> We should, separately, address the process issues and the limitations
> > > > > >> this situation has exposed. That may mean changing the way DefCore
> > > > > >> defines its policies, or tracks things, or uses Tempest. For
> > > > > > >> example, in the future, we may want to tie versions of Tempest to
> > > > > >> versions of the trademark more closely, so that it's possible for
> > > > > >> someone running the Mitaka version of OpenStack to continue to use
> > > > > >> the Mitaka version of Tempest and not have to upgrade Tempest in
> > > > > >> order to retain their trademark (maybe that's how it already works?).
> > > > > >
> > > > > > Tempest master supports all currently supported stable branches. So right now
> > > > > > any commit to master is tested against a master cloud, a mitaka cloud, and a
> > > > > > liberty cloud in the gate. We tag/push a release whenever we add or drop support
> > > > > > for a release, the most recent being dropping kilo. [1][2] That being said the
> > > > > > openstack apis **should** be backwards compatible so ideally master tempest would
> > > > > > work fine on older clouds. (although this might not be reality) The primary
> > > > > > > wrinkle here is the tests which would depend on feature flags to indicate a feature's
> > > > > > availability on newer versions. We eventually remove flags after all supported
> > > > > > releases have a given feature. But, this can be worked around with test
> > > > > > selection. (ie don't even try to run tests that require a feature juno didn’t
> > > > > > have)
> > > > >
> > > > > The current active guidelines cover icehouse through mitaka. The release
> > > > > of 2016.08 will change that to cover juno through mitaka (with newton
> > > > > as an add-on to 2016.08 when it’s released). There’s overlap between
> > > > > the guidelines, so 2016.01 covers juno through mitaka while 2016.08
> > > > > will cover kilo through newton. Essentially two years of releases.
> > > > >
> > > > > >> We may also need to consider that test implementation details may
> > > > > >> change, and have a review process within DefCore to help expose
> > > > > >> those changes to make them clearer to deployers.
> > > > > >>
> > > > > >> Fixing the process issue may also mean changing the way we implement
> > > > > >> things in Tempest. In this case, adding a flag helps move ahead
> > > > > >> more smoothly. Perhaps we adopt that as a general policy in the
> > > > > >> future when we make underlying behavioral changes like this to
> > > > > >> existing tests. Perhaps instead we have a policy that we do not
> > > > > >> change the behavior of existing tests in such significant ways, at
> > > > > >> least if they're tagged as being used by DefCore. I don't know --
> > > > > >> those are things we need to discuss.
> > > > > >
> > > > > > Sure I agree, this thread raises larger issues which need to be figured out.
> > > > > > But, that is probably an independent discussion.
> > > > >
> > > > > I’m beginning to wonder if we need to make DefCore use release
> > > > > > branches then back-port bug-fixes and relevant feature additions
> > > > > as necessary.
> > > >
> > > > We should definitely have that conversation, to understand what
> > > > effect it would have both on Tempest and on DefCore.
> > > >
> > >
> > > While from a quick glance this would seem like it would solve some of the
> > > problems when you start to dig into it you'll see that it actually wouldn't,
> > > and would just end up causing more issues in the long run. Branchless tempest
> > > was originally started back at the icehouse release and was implemented to
> > > actually enforce the API is the same across release boundaries. We had hit many
> >
> > The guarantees we're trying to make in our CI system and the needs
> > DefCore has are slightly different in this regard. It sounds like
> > they're still needing to test against versions that we're no longer
> > supporting, while also avoiding changing the rules on those older
> > clouds.
>
> Right, the crux of the problem here is defcore is trying to support something we
> stopped supporting in the community. However, the actual thing being checked in
> both use cases is actually the same; the API is the same regardless of the
> cloud run against. (which includes different versions as well as different
> deployment choices) It's just a conflict between our upstream support windows
> and what defcore says they support.
>
> >
> > I don't think it's appropriate to create stable/$series branches
> > in the Tempest repository, for all of the reasons you stated in
> > your email. It might be appropriate to create defcore/$version
> > branches, if we think we need to support backporting changes for
> > some reason. If not, simply creating defcore-$version tags would
> > give them a way to get a consistent version of Tempest that worked
> > with older versions of OpenStack.
>
> This actually doesn't solve the problem, which is what my second paragraph
> addressed (which got lost in the snip) and is where my issue with doing
> branching or custom tagging lies. When we tag a release to mark a stable
> branch's EOL there isn't any infrastructure to run tests against that branch at
> all anymore. It's gone, the stable branches of the projects are deleted, we
> remove the devstack branch, the g-r branch, etc. all the workarounds we had to
> put in place to keep things working over the stable support window go away.
> That's something we're never going to maintain after a branch's EOL. The
> only point to doing a separate branch would be to support running against an EOL
> branch, but you couldn't actually test that, you'd just be merging "backports"
> blindly. That's *not* something we do in openstack. For all the releases we
> support, master tempest as well as past tags support running against those
> clouds.
>
> There also isn't a reason to add additional tags, because we already have the
> support milestones tagged. What defcore should be doing is specifying a version
> range (well really just a min version) to match up with what they say is ok
> to be running.
>
> So if they want the LCD for kilo, liberty, and mitaka it would be:
>
> http://git.openstack.org/cgit/openstack/tempest/tag/?h=12.0.0
>
> for juno, kilo, and liberty it would be:
>
> http://git.openstack.org/cgit/openstack/tempest/tag/?h=8
>
> But, as I said in an earlier email the API shouldn't really be changing under
> this model (and even if it did things would not diverge very quickly) So:
>
> http://git.openstack.org/cgit/openstack/tempest/tag/?h=11.0.0
>
> will likely work against juno, kilo, liberty, and mitaka.[1] The only thing that
> would potentially be missing are feature flags in the tempest config to skip
> tests for features that didn't exist in juno.[2] However, we just can't test it
> against juno because that branch was EOL when 11.0.0 was pushed and the
> infrastructure for running against juno was gone.
>
> The reverse also should be true, and old versions of tempest should work fine
> against newer clouds, we just can't and don't test that. What we outline and I
> try to make very clear in the release notes is that when we say supports a
> version that means testing against it in the gate. If the API is truly a stable
> interface then it should work against any cloud, aside from the new features
> thing I mentioned before. (which by the way is why microversions are awesome,
> because it solves that problem)
>
> [1] It's also worth noting that the strict API validation which prompted this
> thread was included in all of these releases. It was verified working on
> kilo, juno, and **icehouse** before it could land:
>
> https://review.openstack.org/#/c/156130/
>
> [2] But, that wouldn't actually matter for the defcore use case because they
> specify running a subset of tests that by definition can't include those.
> (otherwise they wouldn't actually support juno)
>
> >
> > There shouldn't ever be a need to run those older versions of Tempest
> > with newer clouds, and we should ensure there is a policy that
> > validation must happen using a version of Tempest no older than the
> > version of OpenStack to ensure that as we move ahead with new
> > capabilities, compatibility checks, etc. new deployments are validated
> > properly.
>
> As someone running defcore on their product trying to get the certification this
> is probably true. So they should be setting a min version for passing the
> certification. Which they do:
>
> https://git.openstack.org/cgit/openstack/defcore/tree/2016.01.json#n111
>
> It's just shown as the sha1 not tempest 4:
>
> http://git.openstack.org/cgit/openstack/tempest/tag/?h=4
>
> But for developing tempest (even for a hypothetical defcore branch of tempest)
> it is not. You need to be able to use old clients with new versions of the
> projects, otherwise you've failed in your goal of maintaining API stability and
> interoperability. The code should be verified against all the versions you're
> supporting.
>
>
> -Matt Treinish
I think all of that is saying something like what I was proposing,
except that the tags they need already exist. Is that right?
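
As a rough sketch of what a "min version" rule could look like, assuming
plain numeric tag names like the 8 and 12.0.0 tags linked above. The helper
names parse_tag and meets_minimum are made up for illustration; this is not
how DefCore or Tempest actually compare versions:

```python
def parse_tag(tag):
    """Turn a tag such as "8" or "11.0.0" into a tuple of ints."""
    parts = [int(p) for p in tag.split(".")]
    # Pad short tags so that "8" compares as (8, 0, 0).
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def meets_minimum(tempest_tag, minimum_tag):
    """True if tempest_tag is at least minimum_tag (hypothetical rule)."""
    return parse_tag(tempest_tag) >= parse_tag(minimum_tag)
```

With a guideline minimum of "8", a run from the 12.0.0 tag would qualify and
a run from the 4 tag would not.
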
I don't think DefCore actually needs to change old versions of Tempest,
but maybe Chris or Mark can verify that?
Doug
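
For readers following the thread, the strict response checking at issue can
be pictured with a minimal sketch. This is not Tempest's actual code:
check_response, ALLOWED, and the sample response body are hypothetical, and
the strict argument only stands in for the kind of option being proposed.
Extra attributes are always reported, so a relaxed run still shows where a
cloud deviates:

```python
def check_response(body, allowed_keys, strict=True):
    """Return (passed, extras) for a response body dict.

    extras lists the keys that are not in allowed_keys. In strict mode
    any extra key fails the check; with strict=False the extras are
    reported but do not cause a failure.
    """
    extras = sorted(set(body) - set(allowed_keys))
    passed = not extras or not strict
    return passed, extras


# Hypothetical response carrying one vendor-added attribute.
ALLOWED = {"id", "name", "status"}
response = {"id": "abc", "name": "vm1", "status": "ACTIVE",
            "vendor:rack": "r12"}

print(check_response(response, ALLOWED))                # (False, ['vendor:rack'])
print(check_response(response, ALLOWED, strict=False))  # (True, ['vendor:rack'])
```

Returning the extras in both modes is one way to keep the deviation visible
in published results, along the lines of the "conditional pass" idea raised
earlier in the thread.
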