[openstack-dev] [tempest][nova][defcore] Add option to disable some strict response checking for interop testing
Doug Hellmann
doug at doughellmann.com
Wed Jun 15 13:10:30 UTC 2016
Excerpts from Chris Hoge's message of 2016-06-14 16:37:06 -0700:
> Top posting one note and direct comments inline, I’m proposing
> this as a member of the DefCore working group, but this
> proposal itself has not been accepted as the forward course of
> action by the working group. These are my own views as the
> administrator of the program and not those of the working group
> itself, which may independently reject the idea regardless of the
> response from the upstream devs.
>
> I posted a link to this thread to the DefCore mailing list to make
> that working group aware of the outstanding issues.
>
> > On Jun 14, 2016, at 3:50 PM, Matthew Treinish <mtreinish at kortar.org> wrote:
> >
> > On Tue, Jun 14, 2016 at 05:42:16PM -0400, Doug Hellmann wrote:
> >> Excerpts from Matthew Treinish's message of 2016-06-14 15:12:45 -0400:
> >>> On Tue, Jun 14, 2016 at 02:41:10PM -0400, Doug Hellmann wrote:
> >>>> Excerpts from Matthew Treinish's message of 2016-06-14 14:21:27 -0400:
> >>>>> On Tue, Jun 14, 2016 at 10:57:05AM -0700, Chris Hoge wrote:
> >>>>>> Last year, in response to Nova micro-versioning and extension updates[1],
> >>>>>> the QA team added strict API schema checking to Tempest to ensure that
> >>>>>> no additional properties were added to Nova API responses[2][3]. In the
> >>>>>> last year, at least three vendors participating in the OpenStack Powered
> >>>>>> Trademark program have been impacted by this change, two of which
> >>>>>> reported this to the DefCore Working Group mailing list earlier this year[4].
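
For concreteness, the strict checking described above amounts to validating each response body against a JSON schema with additionalProperties set to False, so any field the schema does not list causes the test to fail. A minimal sketch (the schema and the vendor field below are illustrative, not the actual Nova response schema, which lives in Tempest's in-tree API schema modules):

    # Minimal illustration of strict response checking with jsonschema.
    # The schema and field names are made up for the example.
    import jsonschema

    server_schema = {
        'type': 'object',
        'properties': {
            'id': {'type': 'string'},
            'name': {'type': 'string'},
            'status': {'type': 'string'},
        },
        'required': ['id', 'name', 'status'],
        # The strict part: any property not listed above fails validation.
        'additionalProperties': False,
    }

    response_body = {
        'id': 'abc123',
        'name': 'test-server',
        'status': 'ACTIVE',
        'vendor:extra': 'out-of-tree data',  # hypothetical vendor addition
    }

    # Raises jsonschema.ValidationError because of 'vendor:extra'.
    jsonschema.validate(response_body, server_schema)
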
> >>>>>>
> >>>>>> The DefCore Working Group determines guidelines for the OpenStack Powered
> >>>>>> program, which includes capabilities with associated functional tests
> >>>>>> from Tempest that must be passed, and designated sections with associated
> >>>>>> upstream code [5][6]. In determining these guidelines, the working group
> >>>>>> attempts to balance the future direction of development with lagging
> >>>>>> indicators of deployments and user adoption.
> >>>>>>
> >>>>>> After a tremendous amount of consideration, I believe that the DefCore
> >>>>>> Working Group needs to implement a temporary waiver for the strict API
> >>>>>> checking requirements that were introduced last year, to give downstream
> >>>>>> deployers more time to catch up with the strict micro-versioning
> >>>>>> requirements determined by the Nova/Compute team and enforced by the
> >>>>>> Tempest/QA team.
> >>>>>
> >>>>> I'm very much opposed to this being done. If we're actually concerned with
> >>>>> interoperability and with verifying that things behave in the same manner
> >>>>> across multiple clouds, then doing this would be a big step backwards. The fundamental disconnect
> >>>>> here is that the vendors who have implemented out of band extensions or were
> >>>>> taking advantage of previously available places to inject extra attributes
> >>>>> believe that doing so means they're interoperable, which is quite far from
> >>>>> reality. **The API is not a place for vendor differentiation.**
> >>>>
> >>>> This is a temporary measure to address the fact that a large number
> >>>> of existing tests changed their behavior, rather than having new
> >>>> tests added to enforce this new requirement. The result is that deployments
> >>>> that previously passed these tests may no longer pass, and in fact
> >>>> we have several cases where that's true with deployers who are
> >>>> trying to maintain their own standard of backwards-compatibility
> >>>> for their end users.
> >>>
> >>> That's not what happened though. The API hasn't changed and the tests haven't
> >>> really changed either. We made our enforcement on Nova's APIs a bit stricter to
> >>> ensure nothing unexpected appeared. For the most part, these tests work on any version
> >>> of OpenStack. (we only test it in the gate on supported stable releases, but I
> >>> don't expect things to have drastically shifted on older releases) It also
> >>> doesn't matter which version of the API you run, v2.0 or v2.1. Literally, the
> >>> only case where it ever fails is when you run something extra, not from the community,
> >>> either as an extension (which themselves are going away [1]) or another service
> >>> that wraps nova or imitates nova. I'm personally not comfortable saying those
> >>> extras are ever part of the OpenStack APIs.
> >>>
> >>>> We have basically three options.
> >>>>
> >>>> 1. Tell deployers who are trying to do the right thing for their immediate
> >>>> users that they can't use the trademark.
> >>>>
> >>>> 2. Flag the related tests or remove them from the DefCore enforcement
> >>>> suite entirely.
> >>>>
> >>>> 3. Be flexible about giving consumers of Tempest time to meet the
> >>>> new requirement by providing a way to disable the checks.
> >>>>
> >>>> Option 1 goes against our own backwards compatibility policies.
> >>>
> >>> I don't think backwards compatibility policies really apply to what we define
> >>> as the set of tests that, as a community, we are saying a vendor has to pass to
> >>> say they're OpenStack. From my perspective, as a community we should take a hard
> >>> stance on this and say that to be considered an interoperable cloud (and to get the
> >>> trademark) you have to actually have an interoperable product. We slowly ratchet
> >>> up the requirements every 6 months; there isn't any implied backwards
> >>> compatibility in doing that. You passed in the past but not under the newer, stricter
> >>> guidelines.
> >>>
> >>> Also, even if I did think it applied, we're not talking about a change that
> >>> would break it. The change was introduced a year and a half ago
> >>> during kilo and landed a year ago during liberty:
> >>>
> >>> https://review.openstack.org/#/c/156130/
> >>>
> >>> That's way longer than our normal deprecation period of 3 months and a release
> >>> boundary.
> >>>
> >>>>
> >>>> Option 2 gives us no winners and actually reduces the interoperability
> >>>> guarantees we already have in place.
> >>>>
> >>>> Option 3 applies our usual community standard of slowly rolling
> >>>> forward while maintaining compatibility as broadly as possible.
> >>>
> >>> Except in this case there isn't actually any compatibility being maintained.
> >>> We're saying that we can't make the requirements for interoperability testing
> >>> stricter until all the vendors who were passing in the past are able to pass
> >>> the stricter version.
> >>>
> >>>>
> >>>> No one is suggesting that a permanent, or even open-ended, exception
> >>>> be granted.
> >>>
> >>> Sure, I agree a permanent or open-ended exception would be even worse. But I
> >>> still think as a community we need to draw a hard line in the sand here. Just
> >>> because this measure is temporary doesn't make it any more palatable.
> >>>
> >>> By doing this, even as a temporary measure, we're saying it's ok to call things
> >>> an OpenStack API when you add random gorp to the responses, which is something we've
> >>> very clearly said as a community is the exact opposite of the case, and which the
> >>> testing reflects. I still contend that just because some vendors were running old
> >>> versions of tempest and old versions of openstack, where their incompatible API
> >>> changes weren't caught, doesn't mean they should be given a pass now.
> >>
> >> Nobody is saying random gorp is OK, and I'm not sure "line in the
> >> sand" rhetoric is really constructive. The issue is not with the
> >> nature of the API policies, it's with the implementation of those
> >> policies and how they were rolled out.
> >>
> >> DefCore defines its rules using named tests in Tempest. If these
> >> new enforcement policies had been applied by adding new tests to
> >> Tempest, then DefCore could have added them using its processes
> >> over a period of time and we wouldn't have had any issues. That's
> >> not what happened. Instead, the behavior of a bunch of *existing*
> >> tests changed. As a result, deployments that have not changed fail
> >> tests that they used to pass, without any action being taken on the
> >> deployer's part. We've moved the goal posts on our users in a way
> >> that was not easily discoverable, because it couldn't be tracked
> >> through the (admittedly limited) process we have in place for doing
> >> that tracking.
> >>
> >> So, we want a way to get the test results back to their existing
> >> status, which will then let us roll adoption forward smoothly instead
> >> of lurching from "pass" to "fail" to "pass".
> >
> > It doesn't have to be a bright line pass or fail. My primary concern here is
> > that making this change is basically saying we're going to let things "pass"
> > when running out of tree stuff that's adding arbitrary fields to the response. This
> > isn't really interoperable and isn't being honest about what the vendor clouds are
> > actually doing. It would hide the truth from the people who rely on these results
> > to determine interoperability. The proposal as I read it (and maybe it's my
> > misconception) was to mask this and let vendor clouds "pass" until they can fix it,
> > which essentially hides the issue. Especially given there are a lot of clouds and
> > products that don't have any issue here.
>
> The intention of this proposal is the opposite. It’s a compromise that
> acknowledges that, since the introduction of the OpenStack Powered program and
> the release of this strict checking on additional properties, vendors that once
> passed now fail, and that the incentives to make that change didn’t start being
> felt until they hit their product renewal cycle.
>
> It’s not trying to mask anything. To the contrary, by bringing it up here, and
> by having the public test results indicate which APIs send additional
> properties back, it’s shining a light on the issue and publicly stating that this
> is not an acceptable long-term solution.
>
> > But, if we add another possible state on the defcore side like conditional pass,
> > warning, yellow, etc. (the name doesn't matter) which is used to indicate that
> > things on product X could only pass when strict validation was disabled (and
> > be clear about where and why) then my concerns would be alleviated. I just do
> > not want this to end up not being visible to end users trying to evaluate
> > interoperability of different clouds using the test results.
>
> The OpenStack Marketplace is where these comparisons would happen,
> and the APIs with additional response data would be stated.
>
> >>
> >> We should, separately, address the process issues and the limitations
> >> this situation has exposed. That may mean changing the way DefCore
> >> defines its policies, or tracks things, or uses Tempest. For
> >> example, in the future, we may want to tie versions of Tempest to
> >> versions of the trademark more closely, so that it's possible for
> >> someone running the Mitaka version of OpenStack to continue to use
> >> the Mitaka version of Tempest and not have to upgrade Tempest in
> >> order to retain their trademark (maybe that's how it already works?).
> >
> > Tempest master supports all currently supported stable branches. So right now
> > any commit to master is tested against a master cloud, a mitaka cloud, and a
> > liberty cloud in the gate. We tag/push a release whenever we add or drop support
> > for a release, the most recent being dropping kilo. [1][2] That being said, the
> > openstack apis **should** be backwards compatible, so ideally master tempest would
> > work fine on older clouds (although this might not be reality). The primary
> > wrinkle here is the tests which depend on feature flags to indicate a feature's
> > availability on newer versions. We eventually remove flags after all supported
> > releases have a given feature. But this can be worked around with test
> > selection (i.e. don't even try to run tests that require a feature juno didn't
> > have).
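
For reference, the feature-flag pattern mentioned here is a tempest.conf option that individual tests consult via a skip decorator, so clouds without the feature skip the test rather than fail it. A rough sketch under that assumption (the option name sample_feature is made up for illustration; real flags live under the compute-feature-enabled config group):

    # Sketch of the feature-flag pattern; 'sample_feature' is hypothetical.
    #
    # tempest.conf:
    #     [compute-feature-enabled]
    #     sample_feature = False
    import testtools

    from tempest.api.compute import base
    from tempest import config

    CONF = config.CONF


    class SampleFeatureTest(base.BaseV2ComputeTest):

        @testtools.skipUnless(CONF.compute_feature_enabled.sample_feature,
                              'sample_feature is not available in this cloud')
        def test_sample_feature(self):
            # Runs only where the deployer enabled the flag, so older
            # releases without the feature skip instead of failing.
            pass
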
>
> The current active guidelines cover icehouse through mitaka. The release
> of 2016.08 will change that to cover juno through mitaka (with newton
> as an add-on to 2016.08 when it’s released). There’s overlap between
> the guidelines, so 2016.01 covers juno through mitaka while 2016.08
> will cover kilo through newton. Essentially two years of releases.
>
> >> We may also need to consider that test implementation details may
> >> change, and have a review process within DefCore to help expose
> >> those changes to make them clearer to deployers.
> >>
> >> Fixing the process issue may also mean changing the way we implement
> >> things in Tempest. In this case, adding a flag helps move ahead
> >> more smoothly. Perhaps we adopt that as a general policy in the
> >> future when we make underlying behavioral changes like this to
> >> existing tests. Perhaps instead we have a policy that we do not
> >> change the behavior of existing tests in such significant ways, at
> >> least if they're tagged as being used by DefCore. I don't know --
> >> those are things we need to discuss.
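
As a sketch of what such a flag could look like (the option name and helper below are hypothetical, not an existing Tempest setting), it would essentially be an oslo.config boolean that the response-validation code consults before enforcing additionalProperties:

    # Hypothetical option and helper; not an existing Tempest flag.
    # In Tempest this option would be registered with the other config
    # groups and read from CONF rather than passed in as a parameter.
    import copy

    import jsonschema
    from oslo_config import cfg

    opts = [
        cfg.BoolOpt('strict_response_checking',
                    default=True,
                    help='When False, properties not listed in the response '
                         'schema are tolerated instead of failing the test.'),
    ]


    def validate_response(body, schema, strict=True):
        # With strict checking disabled, relax additionalProperties so
        # extra top-level fields no longer raise ValidationError. (A real
        # implementation would also need to relax nested schemas.)
        if not strict:
            schema = copy.deepcopy(schema)
            schema.pop('additionalProperties', None)
        jsonschema.validate(body, schema)
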
> >
> > Sure, I agree; this thread raises larger issues which need to be figured out.
> > But that is probably an independent discussion.
>
> I’m beginning to wonder if we need to make DefCore use release
> branches and then back-port bug fixes and relevant feature additions
> as necessary.
We should definitely have that conversation, to understand what
effect it would have both on Tempest and on DefCore.
Doug