[openstack-dev] [tempest][nova][defcore] Add option to disable some strict response checking for interop testing

Ken'ichi Ohmichi ken1ohmichi at gmail.com
Mon Jun 20 01:52:20 UTC 2016


2016-06-16 2:26 GMT-07:00 Morgan Fainberg <morgan.fainberg at gmail.com>:
> On Wed, Jun 15, 2016 at 11:54 PM, Ken'ichi Ohmichi <ken1ohmichi at gmail.com>
> wrote:
>>
>> This discussion was expected when we implemented the Tempest patch,
>> and I then sent a mail to the DefCore committee[1].
>> According to that mail, "A DefCore Guideline typically covers three OpenStack
>> releases".
>> That means the latest guideline needs to cover Mitaka, Liberty and Kilo,
>> right?
>>
>> During Kilo development, we (the Nova team) had already concluded that
>> additional properties are bad for interoperability.
>> And the stable_api.rst of [2], which is included in Kilo, says we need
>> to implement new features without extensions.
>> However, there are Kilo+ clouds which are extended with vendors' own
>> extensions, right?
>>
>> My concern about allowing additional properties in interoperability tests is
>> that
>>  - users can move from pure OpenStack clouds to non-pure OpenStack
>> clouds which implement vendor-specific properties
>>  - but users cannot move away from non-pure OpenStack clouds if they
>> depend on those properties
>> even if these clouds are certified by the same interoperability tests.
>>
>
> The end goal is 100% to get everyone consistent with no "extra" data being
> passed out of the APIs and certified on the same tests.

Yeah, I appreciate that everyone agrees on no extra data as
the final goal.

> However, right now we have an issue where vendors/operators are lagging on
> getting this cleaned up. Since this is the first round of certifications
> (among other things), the proposal is to support/manage this in a way that
> gives a bit more of a grace period while the deployers/operators finish
> moving away from custom properties (as i understand it the ones affected
> have communicated that they are working on meeting this goal; Chris, please
> correct me if I am wrong).
>
> Your concerns are spot on, and at the end of this "greylist" window (at the
> "2017.01" defcore guideline), this grace period will expire and everyone
> will be expected to be compatible without the "Extra" data. Part of the
> process of doing these programs is working to refine the process (and
> sometimes make exceptions in the early stages) until the workflow is
> established and understood. It is not expected to continue nor extend the
> period beyond the firm end point Chris highlighted. I would not support this
> proposal if it was open ended.

The greylist seems like a good idea, and I am not strongly against it.
However, I still have some questions about this direction.

I think the most important Nova API for interoperability is the
"create a server" API, because most users want to use servers on
OpenStack clouds.
However, I am guessing that most vendors which cannot pass the
current strict Tempest are customizing this API.
So if this API is on the greylist in most vendors' tests, the
interoperability seems a little meaningless.
Is that expected now?
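
To make the concern concrete, here is a minimal sketch of how a greylist
would change the result for one API. It is not Tempest's actual code: the
function names, the "create_server" key, the simplified set of allowed
properties and the "vendor:rack" property are all hypothetical, chosen only
for illustration.

```python
# Hypothetical sketch; none of these names come from Tempest itself.

# Simplified, assumed set of properties that the upstream schema allows
# in a "create a server" response body.
CREATE_SERVER_ALLOWED = {"id", "links", "adminPass", "security_groups"}


def extra_properties(body, allowed):
    """Return the sorted property names in body that are not allowed."""
    return sorted(set(body) - allowed)


def check_response(api_name, body, allowed, greylist=frozenset()):
    """Strictly check body unless api_name is on the greylist.

    Returns (passed, extras). A greylisted API passes even when it
    returns extra properties, but the extras are still reported so
    that they could be published with the test results.
    """
    extras = extra_properties(body, allowed)
    passed = not extras or api_name in greylist
    return passed, extras


# A response from an imaginary cloud that adds its own property.
vendor_body = {
    "id": "abc",
    "links": [],
    "adminPass": "secret",
    "vendor:rack": "r42",
}

# Strict checking: the extra property makes the check fail.
print(check_response("create_server", vendor_body, CREATE_SERVER_ALLOWED))

# Same response with the API greylisted: the check passes.
print(check_response("create_server", vendor_body, CREATE_SERVER_ALLOWED,
                     greylist={"create_server"}))
```

In this sketch the same response fails under strict checking and passes once
"create_server" is greylisted, which is why the list of greylisted APIs
matters as much as the pass/fail result.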

One more question: how many vendors cannot pass the current Tempest?
100%? Or 20%?
If only 5% of vendors cannot pass, I guess we can say "the certification
has failed" to those vendors.
I'd like to know the current situation so that we can plan ahead: we
will need to mark this "greylist" as deprecated soon, and we need to track
vendors' progress at each step/cycle.

Thanks
Ken Omichi

---
>> ---
>> [1]: http://lists.openstack.org/pipermail/defcore-committee/2015-June/000849.html
>> [2]: https://review.openstack.org/#/c/162912
>>
>> 2016-06-14 16:37 GMT-07:00 Chris Hoge <chris at openstack.org>:
>> > Top posting one note and direct comments inline, I’m proposing
>> > this as a member of the DefCore working group, but this
>> > proposal itself has not been accepted as the forward course of
>> > action by the working group. These are my own views as the
>> > administrator of the program and not that of the working group
>> > itself, which may independently reject the idea outside of the
>> > response from the upstream devs.
>> >
>> > I posted a link to this thread to the DefCore mailing list to make
>> > that working group aware of the outstanding issues.
>> >
>> > On Jun 14, 2016, at 3:50 PM, Matthew Treinish <mtreinish at kortar.org>
>> > wrote:
>> >
>> > On Tue, Jun 14, 2016 at 05:42:16PM -0400, Doug Hellmann wrote:
>> >
>> > Excerpts from Matthew Treinish's message of 2016-06-14 15:12:45 -0400:
>> >
>> > On Tue, Jun 14, 2016 at 02:41:10PM -0400, Doug Hellmann wrote:
>> >
>> > Excerpts from Matthew Treinish's message of 2016-06-14 14:21:27 -0400:
>> >
>> > On Tue, Jun 14, 2016 at 10:57:05AM -0700, Chris Hoge wrote:
>> >
>> > Last year, in response to Nova micro-versioning and extension updates[1],
>> > the QA team added strict API schema checking to Tempest to ensure that
>> > no additional properties were added to Nova API responses[2][3]. In the
>> > last year, at least three vendors participating in the OpenStack Powered
>> > Trademark program have been impacted by this change, two of which
>> > reported this to the DefCore Working Group mailing list earlier this
>> > year[4].
>> >
>> > The DefCore Working Group determines guidelines for the OpenStack
>> > Powered
>> > program, which includes capabilities with associated functional tests
>> > from Tempest that must be passed, and designated sections with
>> > associated
>> > upstream code [5][6]. In determining these guidelines, the working group
>> > attempts to balance the future direction of development with lagging
>> > indicators of deployments and user adoption.
>> >
>> > After a tremendous amount of consideration, I believe that the DefCore
>> > Working Group needs to implement a temporary waiver for the strict API
>> > checking requirements that were introduced last year, to give downstream
>> > deployers more time to catch up with the strict micro-versioning
>> > requirements determined by the Nova/Compute team and enforced by the
>> > Tempest/QA team.
>> >
>> >
>> > I'm very much opposed to this being done. If we're actually concerned with
>> > interoperability and verifying that things behave in the same manner between
>> > multiple clouds then doing this would be a big step backwards. The fundamental
>> > disconnect here is that the vendors who have implemented out of band
>> > extensions or were taking advantage of previously available places to inject
>> > extra attributes believe that doing so means they're interoperable, which is
>> > quite far from reality. **The API is not a place for vendor differentiation.**
>> >
>> >
>> > This is a temporary measure to address the fact that a large number
>> > of existing tests changed their behavior, rather than having new
>> > tests added to enforce this new requirement. The result is deployments
>> > that previously passed these tests may no longer pass, and in fact
>> > we have several cases where that's true with deployers who are
>> > trying to maintain their own standard of backwards-compatibility
>> > for their end users.
>> >
>> >
>> > That's not what happened though. The API hasn't changed and the tests haven't
>> > really changed either. We made our enforcement on Nova's APIs a bit stricter
>> > to ensure nothing unexpected appeared. For the most part these tests work on
>> > any version of OpenStack. (we only test it in the gate on supported stable
>> > releases, but I don't expect things to have drastically shifted on older
>> > releases) It also doesn't matter which version of the API you run, v2.0 or
>> > v2.1. Literally, the only case it ever fails is when you run something extra,
>> > not from the community, either as an extension (which themselves are going
>> > away [1]) or another service that wraps nova or imitates nova. I'm personally
>> > not comfortable saying those extras are ever part of the OpenStack APIs.
>> >
>> > We have basically three options.
>> >
>> > 1. Tell deployers who are trying to do the right thing for their immediate
>> >   users that they can't use the trademark.
>> >
>> > 2. Flag the related tests or remove them from the DefCore enforcement
>> >   suite entirely.
>> >
>> > 3. Be flexible about giving consumers of Tempest time to meet the
>> >   new requirement by providing a way to disable the checks.
>> >
>> > Option 1 goes against our own backwards compatibility policies.
>> >
>> >
>> > I don't think backwards compatibility policies really apply to what we
>> > define as the set of tests that as a community we are saying a vendor has to
>> > pass to say they're OpenStack. From my perspective as a community we either
>> > take a hard stance on this and say to be considered an interoperable cloud
>> > (and to get the trademark) you have to actually have an interoperable
>> > product. We slowly ratchet up the requirements every 6 months, there isn't
>> > any implied backwards compatibility in doing that. You passed in the past but
>> > not in the newer stricter guidelines.
>> >
>> > Also, even if I did think it applied, we're not talking about a change which
>> > would fall into breaking that. The change was introduced a year and a half
>> > ago during kilo and landed a year ago during liberty:
>> >
>> > https://review.openstack.org/#/c/156130/
>> >
>> > That's way longer than our normal deprecation period of 3 months and a
>> > release boundary.
>> >
>> >
>> > Option 2 gives us no winners and actually reduces the interoperability
>> > guarantees we already have in place.
>> >
>> > Option 3 applies our usual community standard of slowly rolling
>> > forward while maintaining compatibility as broadly as possible.
>> >
>> >
>> > Except in this case there isn't actually any compatibility being maintained.
>> > We're saying that we can't make the requirements for interoperability testing
>> > stricter until all the vendors who were passing in the past are able to pass
>> > the stricter version.
>> >
>> >
>> > No one is suggesting that a permanent, or even open-ended, exception
>> > be granted.
>> >
>> >
>> > Sure, I agree a permanent or open-ended exception would be even worse. But, I
>> > still think as a community we need to draw a hard line in the sand here. Just
>> > because this measure is temporary doesn't make it any more palatable.
>> >
>> > By doing this, even as a temporary measure, we're saying it's ok to call
>> > things an OpenStack API when you add random gorp to the responses. Which is
>> > something we've very clearly said as a community is the exact opposite of the
>> > case, which the testing reflects. I still contend just because some vendors
>> > were running old versions of tempest and old versions of openstack where
>> > their incompatible API changes weren't caught doesn't mean they should be
>> > given a pass now.
>> >
>> >
>> > Nobody is saying random gorp is OK, and I'm not sure "line in the
>> > sand" rhetoric is really constructive. The issue is not with the
>> > nature of the API policies, it's with the implementation of those
>> > policies and how they were rolled out.
>> >
>> > DefCore defines its rules using named tests in Tempest.  If these
>> > new enforcement policies had been applied by adding new tests to
>> > Tempest, then DefCore could have added them using its processes
>> > over a period of time and we wouldn't have had any issues. That's
>> > not what happened. Instead, the behavior of a bunch of *existing*
>> > tests changed. As a result, deployments that have not changed fail
>> > tests that they used to pass, without any action being taken on the
>> > deployer's part. We've moved the goal posts on our users in a way
>> > that was not easily discoverable, because it couldn't be tracked
>> > through the (admittedly limited) process we have in place for doing
>> > that tracking.
>> >
>> > So, we want a way to get the test results back to their existing
>> > status, which will then let us roll adoption forward smoothly instead
>> > of lurching from "pass" to "fail" to "pass".
>> >
>> >
>> > It doesn't have to be a bright line pass or fail. My primary concern here is
>> > that making this change is basically saying we're going to let things "pass"
>> > when running out of tree stuff that's adding arbitrary fields to the
>> > response. This isn't really interoperable and isn't being honest with what
>> > the vendor clouds are actually doing. It would hide the truth from the people
>> > who rely on these results to determine interoperability. The proposal as I
>> > read it (and maybe it's my misconception) was to mask this and vendor clouds
>> > "pass" until they can fix it, which essentially hides the issue. Especially
>> > given there are a lot of clouds and products that don't have any issue here.
>> >
>> >
>> > The opposite is the intention of this proposal. It’s a compromise that admits
>> > that since the introduction of the OpenStack Powered program, and the release
>> > of this strict checking on additional properties, vendors that once passed
>> > now fail, and the incentives to force that change didn’t start being felt
>> > until they hit their product renewal cycle.
>> >
>> > It’s not trying to mask anything, to the contrary by bringing it up here and
>> > stating their public test results would indicate which APIs send additional
>> > properties back, it’s shining a light on the issue and publicly stating that
>> > it’s not an acceptable long-term solution.
>> >
>> > But, if we add another possible state on the defcore side like conditional
>> > pass, warning, yellow, etc. (the name doesn't matter) which is used to
>> > indicate that things on product X could only pass when strict validation was
>> > disabled (and be clear about where and why) then my concerns would be
>> > alleviated. I just do not want this to end up not being visible to end users
>> > trying to evaluate interoperability of different clouds using the test
>> > results.
>> >
>> >
>> > The OpenStack Marketplace is where these comparisons would happen,
>> > and the APIs with additional response data would be stated.
>> >
>> >
>> > We should, separately, address the process issues and the limitations
>> > this situation has exposed.  That may mean changing the way DefCore
>> > defines its policies, or tracks things, or uses Tempest.  For
>> > example, in the future, we may want to tie versions of Tempest to
>> > versions of the trademark more closely, so that it's possible for
>> > someone running the Mitaka version of OpenStack to continue to use
>> > the Mitaka version of Tempest and not have to upgrade Tempest in
>> > order to retain their trademark (maybe that's how it already works?).
>> >
>> >
>> > Tempest master supports all currently supported stable branches. So right now
>> > any commit to master is tested against a master cloud, a mitaka cloud, and a
>> > liberty cloud in the gate. We tag/push a release whenever we add or drop
>> > support for a release, the most recent being dropping kilo. [1][2] That being
>> > said the openstack apis **should** be backwards compatible so ideally master
>> > tempest would work fine on older clouds. (although this might not be reality)
>> > The primary wrinkle here is the tests which would depend on feature flags to
>> > indicate its availability on newer versions. We eventually remove flags after
>> > all supported releases have a given feature. But, this can be worked around
>> > with test selection. (ie don't even try to run tests that require a feature
>> > juno didn’t have)
>> >
>> >
>> > The current active guidelines cover icehouse through mitaka. The release
>> > of 2016.08 will change that to cover juno through mitaka (with newton
>> > as an add-on to 2016.08 when it’s released). There’s overlap between
>> > the guidelines, so 2016.01 covers juno through mitaka while 2016.08
>> > will cover kilo through newton. Essentially two years of releases.
>> >
>> >
>> > We may also need to consider that test implementation details may
>> > change, and have a review process within DefCore to help expose
>> > those changes to make them clearer to deployers.
>> >
>> > Fixing the process issue may also mean changing the way we implement
>> > things in Tempest. In this case, adding a flag helps move ahead
>> > more smoothly. Perhaps we adopt that as a general policy in the
>> > future when we make underlying behavioral changes like this to
>> > existing tests.  Perhaps instead we have a policy that we do not
>> > change the behavior of existing tests in such significant ways, at
>> > least if they're tagged as being used by DefCore. I don't know --
>> > those are things we need to discuss.
>> >
>> >
>> > Sure I agree, this thread raises larger issues which need to be figured
>> > out.
>> > But, that is probably an independent discussion.
>> >
>> >
>> > I’m beginning to wonder if we need to make DefCore use release
>> > branches then back-port bug-fixes and relevant feature additions
>> > as necessary.
>> >
>> > -Matt Treinish
>> >
>> > [1] http://docs.openstack.org/releasenotes/tempest/v12.0.0.html
>> > [2] http://git.openstack.org/cgit/openstack/tempest/tag/?h=12.0.0
>> >
>> >
>> > Doug
>> >
>> >
>> > -Matt Treinish
>> >
>> > [1] http://lists.openstack.org/pipermail/openstack-dev/2016-June/097285.html
>> >
>> >
>> > Doug
>> >
>> >
>> > As a user of several clouds myself I can say that having random gorp in a
>> > response makes it much more difficult to use my code against multiple clouds.
>> > I have to determine which properties being returned are specific to that
>> > vendor's cloud and if I actually need to depend on them for anything it makes
>> > whatever code I'm writing incompatible for using against any other cloud.
>> > (unless I special case that block for each cloud) Sean Dague wrote a good
>> > post where a lot of this was covered a year ago when microversions was
>> > starting to pick up steam:
>> >
>> > https://dague.net/2015/06/05/the-nova-api-in-kilo-and-beyond-2
>> >
>> > I'd recommend giving it a read, he explains the user first perspective more
>> > clearly there.
>> >
>> > I believe Tempest in this case is doing the right thing from an
>> > interoperability perspective and ensuring that the API is actually the API.
>> > Not an API with extra bits a vendor decided to add. I don't think a cloud or
>> > product that does this to the api should be considered an interoperable
>> > OpenStack cloud and failing the tests is the correct behavior.
>> >
>> > -Matt Treinish
>> >
>> >
>> > My reasoning behind this is that while the change that enabled strict
>> > checking was discussed publicly in the developer community and took
>> > some time to be implemented, it still landed quickly and broke several
>> > existing deployments overnight. As Tempest has moved forward with
>> > bug and UX fixes (some in part to support the interoperability testing
>> > efforts of the DefCore Working Group), using an older version of Tempest
>> > where this strict checking is not enforced is no longer a viable solution
>> > for downstream deployers. The TC has passed a resolution to advise
>> > DefCore to use Tempest as the single source of capability testing[7],
>> > but this naturally introduces tension between the competing goals of
>> > maintaining upstream functional testing and also tracking lagging
>> > indicators.
>> >
>> > My proposal for addressing this problem approaches it at two levels:
>> >
>> > * For the short term, I will submit a blueprint and patch to tempest that
>> >  allows configuration of a grey-list of Nova APIs where strict response
>> >  checking on additional properties will be disabled. So, for example,
>> >  if the 'create servers' API call returned extra properties on that call,
>> >  the strict checking on this line[8] would be disabled at runtime.
>> >  Use of this code path will emit a deprecation warning, and the
>> >  code will be scheduled for removal in 2017 directly after the release
>> >  of the 2017.01 guideline. Vendors would be required to submit the
>> >  grey-list of APIs with additional response data that would be
>> >  published to their marketplace entry.
>> >
>> > * Longer term, vendors will be expected to work with upstream to update
>> >  the API for returning additional data that is compatible with
>> >  API micro-versioning as defined by the Nova team, and the waiver would
>> >  no longer be allowed after the release of the 2017.01 guideline.
>> >
>> > For the next half-year, I feel that this approach strengthens interoperability
>> > by accurately capturing the current state of OpenStack deployments and
>> > client tools. Before this change, additional properties on responses
>> > weren't explicitly disallowed, and vendors and deployers took advantage
>> > of this in production. While this is behavior that the Nova and QA teams
>> > want to stop, it will take a bit more time to reach downstream. Also, as
>> > of right now, as far as I know the only client that does strict response
>> > checking for Nova responses is the Tempest client. Currently, additional
>> > properties in responses are ignored and do not break existing client
>> > functionality. There is currently little to no harm done to downstream
>> > users by temporarily allowing additional data to be returned in
>> > responses.
>> >
>> > Thanks,
>> >
>> > Chris Hoge
>> > Interop Engineer
>> > OpenStack Foundation
>> >
>> > [1] https://specs.openstack.org/openstack/nova-specs/specs/kilo/implemented/api-microversions.html
>> > [2] http://lists.openstack.org/pipermail/openstack-dev/2015-February/057613.html
>> > [3] https://review.openstack.org/#/c/156130
>> > [4] http://lists.openstack.org/pipermail/defcore-committee/2016-January/000986.html
>> > [5] http://git.openstack.org/cgit/openstack/defcore/tree/2015.07.json
>> > [6] http://git.openstack.org/cgit/openstack/defcore/tree/2016.01.json
>> > [7] http://git.openstack.org/cgit/openstack/governance/tree/resolutions/20160504-defcore-test-location.rst
>> > [8] http://git.openstack.org/cgit/openstack/tempest-lib/tree/tempest_lib/api_schema/response/compute/v2_1/servers.py#n39
>> >
>> >
>> >
>> > __________________________________________________________________________
>> > OpenStack Development Mailing List (not for usage questions)
>> > Unsubscribe:
>> > OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev