[openstack-dev] [tempest][nova][defcore] Add option to disable some strict response checking for interop testing
Mark Voelker
mvoelker at vmware.com
Wed Jun 15 04:14:23 UTC 2016
On Jun 14, 2016, at 7:28 PM, Monty Taylor <mordred at inaugust.com> wrote:
>
> On 06/14/2016 05:42 PM, Doug Hellmann wrote:
>> Excerpts from Matthew Treinish's message of 2016-06-14 15:12:45 -0400:
>>> On Tue, Jun 14, 2016 at 02:41:10PM -0400, Doug Hellmann wrote:
>>>> Excerpts from Matthew Treinish's message of 2016-06-14 14:21:27 -0400:
>>>>> On Tue, Jun 14, 2016 at 10:57:05AM -0700, Chris Hoge wrote:
>>>>>> Last year, in response to Nova micro-versioning and extension updates[1],
>>>>>> the QA team added strict API schema checking to Tempest to ensure that
>>>>>> no additional properties were added to Nova API responses[2][3]. In the
>>>>>> last year, at least three vendors participating in the OpenStack Powered
>>>>>> Trademark program have been impacted by this change, two of which
>>>>>> reported this to the DefCore Working Group mailing list earlier this year[4].
>>>>>>
>>>>>> The DefCore Working Group determines guidelines for the OpenStack Powered
>>>>>> program, which includes capabilities with associated functional tests
>>>>>> from Tempest that must be passed, and designated sections with associated
>>>>>> upstream code [5][6]. In determining these guidelines, the working group
>>>>>> attempts to balance the future direction of development with lagging
>>>>>> indicators of deployments and user adoption.
>>>>>>
>>>>>> After a tremendous amount of consideration, I believe that the DefCore
>>>>>> Working Group needs to implement a temporary waiver for the strict API
>>>>>> checking requirements that were introduced last year, to give downstream
>>>>>> deployers more time to catch up with the strict micro-versioning
>>>>>> requirements determined by the Nova/Compute team and enforced by the
>>>>>> Tempest/QA team.
>>>>>
>>>>> I'm very much opposed to this being done. If we're actually concerned with
>>>>> interoperability and verifying that things behave in the same manner between multiple
>>>>> clouds then doing this would be a big step backwards. The fundamental disconnect
>>>>> here is that the vendors who have implemented out of band extensions or were
>>>>> taking advantage of previously available places to inject extra attributes
>>>>> believe that doing so means they're interoperable, which is quite far from
>>>>> reality. **The API is not a place for vendor differentiation.**
>>>>
>>>> This is a temporary measure to address the fact that a large number
>>>> of existing tests changed their behavior, rather than having new
>>>> tests added to enforce this new requirement. The result is deployments
>>>> that previously passed these tests may no longer pass, and in fact
>>>> we have several cases where that's true with deployers who are
>>>> trying to maintain their own standard of backwards-compatibility
>>>> for their end users.
>>>
>>> That's not what happened though. The API hasn't changed and the tests haven't
>>> really changed either. We made our enforcement on Nova's APIs a bit stricter to
>>> ensure nothing unexpected appeared. For the most part these tests work on any version
>>> of OpenStack. (we only test it in the gate on supported stable releases, but I
>>> don't expect things to have drastically shifted on older releases) It also
>>> doesn't matter which version of the API you run, v2.0 or v2.1. Literally, the
>>> only case it ever fails is when you run something extra, not from the community,
>>> either as an extension (which themselves are going away [1]) or another service
>>> that wraps nova or imitates nova. I'm personally not comfortable saying those
>>> extras are ever part of the OpenStack APIs.
>>>
>>>> We have basically three options.
>>>>
>>>> 1. Tell deployers who are trying to do the right thing for their immediate
>>>> users that they can't use the trademark.
>>>>
>>>> 2. Flag the related tests or remove them from the DefCore enforcement
>>>> suite entirely.
>>>>
>>>> 3. Be flexible about giving consumers of Tempest time to meet the
>>>> new requirement by providing a way to disable the checks.
>>>>
>>>> Option 1 goes against our own backwards compatibility policies.
>>>
>>> I don't think backwards compatibility policies really apply to what we define
>>> as the set of tests that as a community we are saying a vendor has to pass to
>>> say they're OpenStack. From my perspective as a community we either take a hard
>>> stance on this and say to be considered an interoperable cloud (and to get the
>>> trademark) you have to actually have an interoperable product. We slowly ratchet
>>> up the requirements every 6 months, there isn't any implied backwards
>>> compatibility in doing that. You passed in the past but not in the newer stricter
>>> guidelines.
>>>
>>> Also, even if I did think it applied, we're not talking about a change which
>>> would fall into breaking that. The change was introduced a year and a half ago
>>> during kilo and landed a year ago during liberty:
>>>
>>> https://review.openstack.org/#/c/156130/
>>>
>>> That's way longer than our normal deprecation period of 3 months and a release
>>> boundary.
It is perhaps important to note here that DefCore seems to have two meanings to a lot of people I talk to today: it’s a mark of interoperability (the OpenStack Powered badge that says certain capabilities of this cloud behave like other clouds bearing the mark) and it gives a cloud the ability to call itself OpenStack (e.g. you can get a trademark/logo license agreement from the Foundation).

The OpenStack Powered program currently covers Icehouse through Mitaka. Right now, that includes releases that were still on the Nova 2.0 API. API extensions were a supported thing [1] back in 2.0 and it was even explicitly documented that they allowed for additional attributes in the responses and “vendor specific niche functionality” [1]. The change to the Tempest tests [2] applied to the 2.0 API as well as 2.1 with the intent of preventing further changes from getting into the 2.0 API at the gate, which totally makes sense as a gate test. If those same tests are used for DefCore purposes, it does change what vendors need to do to be compliant with the Guidelines rather immediately--even on older releases of OpenStack using 2.0, which could be problematic (as noted elsewhere already [3]).

So, through the interoperability lens: I think many folks acknowledge that supporting extensions led to a lot of variance between clouds, and that was Not So Awesome for interoperability. IIRC part of the rationale for switching to microversions with a single monotonic counter and deprecating extensions [4] was to set a course for eliminating a lot of that behavioral variance.

From the “ability to call yourself OpenStack” lens: it feels sort of wrong to tell a cloud that it can’t claim to be OpenStack because it’s running a version that falls within the bounds of the Powered program with the 2.0 API (when extensions weren't deprecated) and using the extension mechanism that 2.0 supported for years.

I think that’s part of what makes this issue tricky for a lot of folks.
[1] http://docs.openstack.org/developer/nova/v2/extensions.html
[2] https://review.openstack.org/#/c/156130
[3] http://lists.openstack.org/pipermail/defcore-committee/2015-June/000852.html
[4] http://developer.openstack.org/api-ref/compute/?expanded=list-extensions-detail,show-extension-details-detail#extensions-extensions-deprecated
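
To make the mechanics concrete: the Tempest change in [2] amounts to validating responses against schemas that do not allow additional properties. The sketch below is a simplified stand-in, not Tempest code. The schema, the property names, the `validate` helper, and the `vendor:rack` attribute are all hypothetical, and Tempest itself relies on JSON Schema rather than this hand-rolled checker.

```python
# Hypothetical, simplified schema: NOT the real Nova/Tempest response schema.
SERVER_SCHEMA = {
    "properties": {"id": str, "name": str, "status": str},
    "required": ["id", "name"],
    "additionalProperties": False,
}


def validate(body, schema):
    """Return a list of problems found in body (an empty list means it passes)."""
    problems = []
    for key in schema["required"]:
        if key not in body:
            problems.append("missing required property: " + key)
    for key, value in body.items():
        expected = schema["properties"].get(key)
        if expected is None:
            # Unknown key: only a failure when the schema is strict.
            if not schema.get("additionalProperties", True):
                problems.append("unexpected property: " + key)
        elif not isinstance(value, expected):
            problems.append("wrong type for property: " + key)
    return problems


# A response as upstream would return it, and the same response with one
# vendor-specific attribute injected (as a 2.0-style extension might do).
upstream = {"id": "abc", "name": "vm1", "status": "ACTIVE"}
extended = {**upstream, "vendor:rack": "r42"}

strict_result = validate(extended, SERVER_SCHEMA)
lenient_result = validate(extended, {**SERVER_SCHEMA, "additionalProperties": True})
```

Under the strict schema only the extended response is rejected, and only because of the extra key. That is the situation a cloud using 2.0-era extensions finds itself in when the gate test doubles as a DefCore test.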
>>>
>>>>
>>>> Option 2 gives us no winners and actually reduces the interoperability
>>>> guarantees we already have in place.
>>>>
>>>> Option 3 applies our usual community standard of slowly rolling
>>>> forward while maintaining compatibility as broadly as possible.
>>>
>>> Except in this case there isn't actually any compatibility being maintained.
>>> We're saying that we can't make the requirements for interoperability testing
>>> stricter until all the vendors who were passing in the past are able to pass
>>> the stricter version.
>>>
>>>>
>>>> No one is suggesting that a permanent, or even open-ended, exception
>>>> be granted.
>>>
>>> Sure, I agree a permanent or open-ended exception would be even worse. But, I
>>> still think as a community we need to draw a hard line in the sand here. Just
>>> because this measure is temporary doesn't make it any more palatable.
>>>
>>> By doing this, even as a temporary measure, we're saying it's ok to call things
>>> an OpenStack API when you add random gorp to the responses. Which is something we've
>>> very clearly said as a community is the exact opposite of the case, which the
>>> testing reflects. I still contend just because some vendors were running old
>>> versions of tempest and old versions of openstack where their incompatible API
>>> changes weren't caught doesn't mean they should be given a pass now.
>>
>> Nobody is saying random gorp is OK, and I'm not sure "line in the
>> sand" rhetoric is really constructive. The issue is not with the
>> nature of the API policies, it's with the implementation of those
>> policies and how they were rolled out.
>>
>> DefCore defines its rules using named tests in Tempest. If these
>> new enforcement policies had been applied by adding new tests to
>> Tempest, then DefCore could have added them using its processes
>> over a period of time and we wouldn't have had any issues. That's
>> not what happened. Instead, the behavior of a bunch of *existing*
>> tests changed. As a result, deployments that have not changed fail
>> tests that they used to pass, without any action being taken on the
>> deployer's part. We've moved the goal posts on our users in a way
>> that was not easily discoverable, because it couldn't be tracked
>> through the (admittedly limited) process we have in place for doing
>> that tracking.
>>
>> So, we want a way to get the test results back to their existing
>> status, which will then let us roll adoption forward smoothly instead
>> of lurching from "pass" to "fail" to "pass".
>
> I think this is the most important thing to me as it relates to this.
> I'm obviously a huge proponent of clouds behaving more samely. But I
> also think that, as Doug nicely describes above, we've sort of backed into
> to removing something without a deprecation window ... largely because
> of the complexities involved with the system here - and I'd like to make
> sure that when we are being clear about behavior changes that we give
> the warning period so that people can adapt.
+1. The transition from an extensible API to microversions could involve pretty heavy changes for end users of those clouds that took advantage of extensibility, and might conceivably require a longish runway to implement to minimize the impact. As much as I want to see more interoperability among clouds, this transition isn’t trivial and I think providing some additional runway to help our ecosystem (both cloud owners and users of those clouds) through the transition is a reasonable compromise to consider.
Purely out of curiosity, was there any guidance written up about how cloud owners using extensions should make the transition? E.g. was there something that said “you need to introduce upstream microversion changes for everything you used to do in an extension”, or “you should split out vendor specific stuff that used to be extensions into a separate API endpoint/server that you run alongside Nova”, or…? I’d be interested in looking that over as part of thinking through this issue if there was. I’m curious partly because Ken’ichi brought the change up last year [5] (thanks again!) and it has surfaced on the DefCore ML/channel a couple of times since then [6][7] but discussion always seemed to die down pretty quickly. I (perhaps mistakenly?) assumed this meant that folks impacted were aware of the issue and were working toward a solution, and I suspect some of them may simply need more runway to work through the complexities.
[5] http://lists.openstack.org/pipermail/defcore-committee/2015-June/000849.html
[6] http://lists.openstack.org/pipermail/defcore-committee/2016-January/000986.html
[7] http://eavesdrop.openstack.org/irclogs/%23openstack-defcore/%23openstack-defcore.2015-06-24.log.html#t2015-06-24T23:16:06
>
>> We should, separately, address the process issues and the limitations
>> this situation has exposed. That may mean changing the way DefCore
>> defines its policies, or tracks things, or uses Tempest. For
>> example, in the future, we may want to tie versions of Tempest to
>> versions of the trademark more closely, so that it's possible for
>> someone running the Mitaka version of OpenStack to continue to use
>> the Mitaka version of Tempest and not have to upgrade Tempest in
>> order to retain their trademark (maybe that's how it already works?).
It’s a bit of a hash today, frankly (pun maybe a little bit intended). DefCore doesn’t prescribe a specific version of Tempest that one must use when testing to get a license agreement. Refstack client does pin a default version of Tempest (literally a git SHA), which gets updated from time to time to something that’s been tested by the Refstack folks. You can override it quite easily with a command line argument or by changing one line in your setup_env file.
In theory that means you could use whatever version of Tempest you want: in this specific scenario, you could theoretically use a version of Tempest from before the change disallowing additionalProperties went in. In reality though, that doesn’t always work because other stuff has happened. For example: some flags were removed on required tests in later Guidelines due to bugs in Tempest tests getting fixed (yay!), which happened after the additionalProperties change. If you use an older version of Tempest, you won’t get that fix and will fail those required tests. If you use a newer Tempest, you get the additionalProperties change and will fail if you’re using extensions to the Nova 2.0 API.
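
For illustration, the grey-list waiver proposed at the start of this thread could look roughly like the following. This is a sketch under assumptions, not the proposed patch: the `GREY_LIST` contents, the API names, and both helper functions are hypothetical, and no such option is claimed to exist in Tempest. It also only handles top-level properties, whereas real response schemas are nested.

```python
import warnings

# Hypothetical operator-supplied grey-list of API names whose responses are
# allowed to carry extra properties while the waiver is in effect.
GREY_LIST = {"create_server"}

# Hypothetical strict schema, reduced to the part that matters here.
STRICT = {"properties": {"id": str, "name": str}, "additionalProperties": False}


def effective_schema(api_name, schema, grey_list=GREY_LIST):
    """Return the schema to validate against, relaxed for grey-listed APIs."""
    if api_name not in grey_list or schema.get("additionalProperties", True):
        return schema
    # Make the waiver visible every time it is used.
    warnings.warn(
        "strict response checking disabled for %s; this waiver is temporary"
        % api_name,
        DeprecationWarning,
        stacklevel=2,
    )
    relaxed = dict(schema)  # shallow copy so the strict schema is untouched
    relaxed["additionalProperties"] = True
    return relaxed


def unexpected_properties(body, schema):
    """List top-level keys that the schema does not permit."""
    if schema.get("additionalProperties", True):
        return []
    return sorted(set(body) - set(schema["properties"]))
```

The point of the design is that the default stays strict: an API that is not explicitly grey-listed is checked exactly as it is today, and every use of the waiver leaves a warning behind that can be collected and published.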
>> We may also need to consider that test implementation details may
>> change, and have a review process within DefCore to help expose
>> those changes to make them clearer to deployers.
>>
>> Fixing the process issue may also mean changing the way we implement
>> things in Tempest. In this case, adding a flag helps move ahead
>> more smoothly. Perhaps we adopt that as a general policy in the
>> future when we make underlying behavioral changes like this to
>> existing tests. Perhaps instead we have a policy that we do not
>> change the behavior of existing tests in such significant ways, at
>> least if they're tagged as being used by DefCore. I don't know --
>> those are things we need to discuss.
>
> ++
Agreed. I also just want to acknowledge that some of that discussion already seems to be happening and I think a lot of the involved parties are trying to figure out how to deal with the testing piece. We’ve seen that from recent TC resolutions [8][9], interactions with Tempest folks [10][11], and even discussions with the Board. So, thanks and let’s keep working on it. =)
[8] https://governance.openstack.org/resolutions/20160504-defcore-proxy-tests.html
[9] https://governance.openstack.org/resolutions/20160504-defcore-test-location.html
[10] https://etherpad.openstack.org/p/newton-qa-defcore-and-interoperability
[11] https://review.openstack.org/#/c/301879/
At Your Service,
Mark T. Voelker
>
>>>
>>> -Matt Treinish
>>>
>>> [1] http://lists.openstack.org/pipermail/openstack-dev/2016-June/097285.html
>>>>
>>>> Doug
>>>>
>>>>>
>>>>> As a user of several clouds myself I can say that having random gorp in a
>>>>> response makes it much more difficult to use my code against multiple clouds. I
>>>>> have to determine which properties being returned are specific to that vendor's
>>>>> cloud and if I actually need to depend on them for anything it makes whatever
>>>>> code I'm writing incompatible for using against any other cloud. (unless I
>>>>> special case that block for each cloud) Sean Dague wrote a good post where a lot
>>>>> of this was covered a year ago when microversions was starting to pick up steam:
>>>>>
>>>>> https://dague.net/2015/06/05/the-nova-api-in-kilo-and-beyond-2
>>>>>
>>>>> I'd recommend giving it a read, he explains the user first perspective more
>>>>> clearly there.
>>>>>
>>>>> I believe Tempest in this case is doing the right thing from an interoperability
>>>>> perspective and ensuring that the API is actually the API. Not an API with extra
>>>>> bits a vendor decided to add. I don't think a cloud or product that does this
>>>>> to the api should be considered an interoperable OpenStack cloud and failing the
>>>>> tests is the correct behavior.
>>>>>
>>>>> -Matt Treinish
>>>>>
>>>>>>
>>>>>> My reasoning behind this is that while the change that enabled strict
>>>>>> checking was discussed publicly in the developer community and took
>>>>>> some time to be implemented, it still landed quickly and broke several
>>>>>> existing deployments overnight. As Tempest has moved forward with
>>>>>> bug and UX fixes (some in part to support the interoperability testing
>>>>>> efforts of the DefCore Working Group), using an older version of Tempest
>>>>>> where this strict checking is not enforced is no longer a viable solution
>>>>>> for downstream deployers. The TC has passed a resolution to advise
>>>>>> DefCore to use Tempest as the single source of capability testing[7],
>>>>>> but this naturally introduces tension between the competing goals of
>>>>>> maintaining upstream functional testing and also tracking lagging
>>>>>> indicators.
>>>>>>
>>>>>> My proposal for addressing this problem approaches it at two levels:
>>>>>>
>>>>>> * For the short term, I will submit a blueprint and patch to tempest that
>>>>>> allows configuration of a grey-list of Nova APIs where strict response
>>>>>> checking on additional properties will be disabled. So, for example,
>>>>>> if the 'create servers' API call returned extra properties on that call,
>>>>>> the strict checking on this line[8] would be disabled at runtime.
>>>>>> Use of this code path will emit a deprecation warning, and the
>>>>>> code will be scheduled for removal in 2017 directly after the release
>>>>>> of the 2017.01 guideline. Vendors would be required to submit the
>>>>>> grey-list of APIs with additional response data that would be
>>>>>> published to their marketplace entry.
>>>>>>
>>>>>> * Longer term, vendors will be expected to work with upstream to update
>>>>>> the API for returning additional data that is compatible with
>>>>>> API micro-versioning as defined by the Nova team, and the waiver would
>>>>>> no longer be allowed after the release of the 2017.01 guideline.
>>>>>>
>>>>>> For the next half-year, I feel that this approach strengthens interoperability
>>>>>> by accurately capturing the current state of OpenStack deployments and
>>>>>> client tools. Before this change, additional properties on responses
>>>>>> weren't explicitly disallowed, and vendors and deployers took advantage
>>>>>> of this in production. While this is behavior that the Nova and QA teams
>>>>>> want to stop, it will take a bit more time to reach downstream. Also, as
>>>>>> of right now, as far as I know the only client that does strict response
>>>>>> checking for Nova responses is the Tempest client. Currently, additional
>>>>>> properties in responses are ignored and do not break existing client
>>>>>> functionality. There is currently little to no harm done to downstream
>>>>>> users by temporarily allowing additional data to be returned in responses.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Chris Hoge
>>>>>> Interop Engineer
>>>>>> OpenStack Foundation
>>>>>>
>>>>>> [1] https://specs.openstack.org/openstack/nova-specs/specs/kilo/implemented/api-microversions.html
>>>>>> [2] http://lists.openstack.org/pipermail/openstack-dev/2015-February/057613.html
>>>>>> [3] https://review.openstack.org/#/c/156130
>>>>>> [4] http://lists.openstack.org/pipermail/defcore-committee/2016-January/000986.html
>>>>>> [5] http://git.openstack.org/cgit/openstack/defcore/tree/2015.07.json
>>>>>> [6] http://git.openstack.org/cgit/openstack/defcore/tree/2016.01.json
>>>>>> [7] http://git.openstack.org/cgit/openstack/governance/tree/resolutions/20160504-defcore-test-location.rst
>>>>>> [8] http://git.openstack.org/cgit/openstack/tempest-lib/tree/tempest_lib/api_schema/response/compute/v2_1/servers.py#n39
>>
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
>