[openstack-dev] [tempest][nova][defcore] Add option to disable some strict response checking for interop testing
mordred at inaugust.com
Tue Jun 14 23:28:00 UTC 2016
On 06/14/2016 05:42 PM, Doug Hellmann wrote:
> Excerpts from Matthew Treinish's message of 2016-06-14 15:12:45 -0400:
>> On Tue, Jun 14, 2016 at 02:41:10PM -0400, Doug Hellmann wrote:
>>> Excerpts from Matthew Treinish's message of 2016-06-14 14:21:27 -0400:
>>>> On Tue, Jun 14, 2016 at 10:57:05AM -0700, Chris Hoge wrote:
>>>>> Last year, in response to Nova micro-versioning and extension updates,
>>>>> the QA team added strict API schema checking to Tempest to ensure that
>>>>> no additional properties were added to Nova API responses. In the
>>>>> last year, at least three vendors participating in the OpenStack Powered
>>>>> Trademark program have been impacted by this change, two of which
>>>>> reported this to the DefCore Working Group mailing list earlier this year.
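To illustrate what "strict" checking means here: the Nova response schemas Tempest uses (see the servers.py link in the references) set `additionalProperties` to `False`, so any key the schema does not name fails validation. The sketch below is a simplified, stdlib-only stand-in for that behaviour, not Tempest's actual validator. The trimmed `SERVER_SCHEMA` and the `vendor:extra` key are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical, heavily trimmed schema in JSON Schema style; the real
# Tempest schemas are far larger and nested.
SERVER_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "name": {"type": "string"},
        "status": {"type": "string"},
    },
    "required": ["id", "name", "status"],
    "additionalProperties": False,
}


def find_violations(body: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return violations for a flat JSON object.

    Only 'required' and 'additionalProperties' are checked; this is an
    illustration, not a full JSON Schema implementation.
    """
    errors: List[str] = []
    for key in schema.get("required", []):
        if key not in body:
            errors.append(f"missing required property: {key}")
    if schema.get("additionalProperties", True) is False:
        allowed = set(schema.get("properties", {}))
        for key in sorted(set(body) - allowed):
            errors.append(f"unexpected additional property: {key}")
    return errors


upstream = {"id": "abc", "name": "vm1", "status": "ACTIVE"}
vendor = {**upstream, "vendor:extra": "gold"}  # hypothetical vendor addition

print(find_violations(upstream, SERVER_SCHEMA))  # []
print(find_violations(vendor, SERVER_SCHEMA))
# ['unexpected additional property: vendor:extra']
```

The upstream-shaped response passes; the same response with one extra key fails, which is exactly the change in behaviour the vendors ran into.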
>>>>> The DefCore Working Group determines guidelines for the OpenStack Powered
>>>>> program, which includes capabilities with associated functional tests
>>>>> from Tempest that must be passed, and designated sections with associated
>>>>> upstream code. In determining these guidelines, the working group
>>>>> attempts to balance the future direction of development with lagging
>>>>> indicators of deployments and user adoption.
>>>>> After a tremendous amount of consideration, I believe that the DefCore
>>>>> Working Group needs to implement a temporary waiver for the strict API
>>>>> checking requirements that were introduced last year, to give downstream
>>>>> deployers more time to catch up with the strict micro-versioning
>>>>> requirements determined by the Nova/Compute team and enforced by the
>>>>> Tempest/QA team.
>>>> I'm very much opposed to this being done. If we're actually concerned with
>>>> interoperability and verifying that things behave in the same manner between multiple
>>>> clouds then doing this would be a big step backwards. The fundamental disconnect
>>>> here is that the vendors who have implemented out of band extensions or were
>>>> taking advantage of previously available places to inject extra attributes
>>>> believe that doing so means they're interoperable, which is quite far from
>>>> reality. **The API is not a place for vendor differentiation.**
>>> This is a temporary measure to address the fact that a large number
>>> of existing tests changed their behavior, rather than having new
>>> tests added to enforce this new requirement. The result is deployments
>>> that previously passed these tests may no longer pass, and in fact
>>> we have several cases where that's true with deployers who are
>>> trying to maintain their own standard of backwards-compatibility
>>> for their end users.
>> That's not what happened though. The API hasn't changed and the tests haven't
>> really changed either. We made our enforcement on Nova's APIs a bit stricter to
>> ensure nothing unexpected appeared. For the most part these tests work on any version
>> of OpenStack. (we only test it in the gate on supported stable releases, but I
>> don't expect things to have drastically shifted on older releases) It also
>> doesn't matter which version of the API you run, v2.0 or v2.1. Literally, the
>> only case it ever fails is when you run something extra, not from the community,
>> either as an extension (which themselves are going away) or another service
>> that wraps nova or imitates nova. I'm personally not comfortable saying those
>> extras are ever part of the OpenStack APIs.
>>> We have basically three options.
>>> 1. Tell deployers who are trying to do the right thing for their immediate
>>> users that they can't use the trademark.
>>> 2. Flag the related tests or remove them from the DefCore enforcement
>>> suite entirely.
>>> 3. Be flexible about giving consumers of Tempest time to meet the
>>> new requirement by providing a way to disable the checks.
>>> Option 1 goes against our own backwards compatibility policies.
>> I don't think backwards compatibility policies really apply to what we define
>> as the set of tests that as a community we are saying a vendor has to pass to
>> say they're OpenStack. From my perspective as a community we either take a hard
>> stance on this and say to be considered an interoperable cloud (and to get the
>> trademark) you have to actually have an interoperable product. We slowly ratchet
>> up the requirements every 6 months; there isn't any implied backwards
>> compatibility in doing that. You passed in the past but not in the newer, stricter
>> guidelines.
>> Also, even if I did think it applied, we're not talking about a change which
>> would fall into breaking that. The change was introduced a year and a half ago
>> during kilo and landed a year ago during liberty:
>> That's way longer than our normal deprecation period of 3 months and a release.
>>> Option 2 gives us no winners and actually reduces the interoperability
>>> guarantees we already have in place.
>>> Option 3 applies our usual community standard of slowly rolling
>>> forward while maintaining compatibility as broadly as possible.
>> Except in this case there isn't actually any compatibility being maintained.
>> We're saying that we can't make the requirements for interoperability testing
>> stricter until all the vendors who were passing in the past are able to pass
>> the stricter version.
>>> No one is suggesting that a permanent, or even open-ended, exception
>>> be granted.
>> Sure, I agree a permanent or open-ended exception would be even worse. But, I
>> still think as a community we need to draw a hard line in the sand here. Just
>> because this measure is temporary doesn't make it any more palatable.
>> By doing this, even as a temporary measure, we're saying it's ok to call things
>> an OpenStack API when you add random gorp to the responses. As a community we've
>> very clearly said the exact opposite, and the testing reflects that. I still
>> contend that just because some vendors were running old
>> versions of tempest and old versions of openstack where their incompatible API
>> changes weren't caught doesn't mean they should be given a pass now.
> Nobody is saying random gorp is OK, and I'm not sure "line in the
> sand" rhetoric is really constructive. The issue is not with the
> nature of the API policies, it's with the implementation of those
> policies and how they were rolled out.
> DefCore defines its rules using named tests in Tempest. If these
> new enforcement policies had been applied by adding new tests to
> Tempest, then DefCore could have added them using its processes
> over a period of time and we wouldn't have had any issues. That's
> not what happened. Instead, the behavior of a bunch of *existing*
> tests changed. As a result, deployments that have not changed fail
> tests that they used to pass, without any action being taken on the
> deployer's part. We've moved the goal posts on our users in a way
> that was not easily discoverable, because it couldn't be tracked
> through the (admittedly limited) process we have in place for doing
> that tracking.
> So, we want a way to get the test results back to their existing
> status, which will then let us roll adoption forward smoothly instead
> of lurching from "pass" to "fail" to "pass".
I think this is the most important thing to me as it relates to this.
I'm obviously a huge proponent of clouds behaving more samely. But I
also think that, as Doug nicely describes above, we've sort of backed into
removing something without a deprecation window ... largely because
of the complexities involved with the system here - and I'd like to make
sure that when we are being clear about behavior changes, we give
the warning period so that people can adapt.
> We should, separately, address the process issues and the limitations
> this situation has exposed. That may mean changing the way DefCore
> defines its policies, or tracks things, or uses Tempest. For
> example, in the future, we may want to tie versions of Tempest to
> versions of the trademark more closely, so that it's possible for
> someone running the Mitaka version of OpenStack to continue to use
> the Mitaka version of Tempest and not have to upgrade Tempest in
> order to retain their trademark (maybe that's how it already works?).
> We may also need to consider that test implementation details may
> change, and have a review process within DefCore to help expose
> those changes to make them clearer to deployers.
> Fixing the process issue may also mean changing the way we implement
> things in Tempest. In this case, adding a flag helps move ahead
> more smoothly. Perhaps we adopt that as a general policy in the
> future when we make underlying behavioral changes like this to
> existing tests. Perhaps instead we have a policy that we do not
> change the behavior of existing tests in such significant ways, at
> least if they're tagged as being used by DefCore. I don't know --
> those are things we need to discuss.
>> -Matt Treinish
>>  http://lists.openstack.org/pipermail/openstack-dev/2016-June/097285.html
>>>> As a user of several clouds myself I can say that having random gorp in a
>>>> response makes it much more difficult to use my code against multiple clouds. I
>>>> have to determine which properties being returned are specific to that vendor's
>>>> cloud and if I actually need to depend on them for anything it makes whatever
>>>> code I'm writing incompatible for using against any other cloud. (unless I
>>>> special case that block for each cloud) Sean Dague wrote a good post where a lot
>>>> of this was covered a year ago when microversions was starting to pick up steam:
>>>> I'd recommend giving it a read; he explains the user-first perspective more
>>>> clearly there.
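The user-first idea behind microversions is that a client opts in to new response fields by asking for a newer version, so an addition never surprises a client that did not request it. A minimal sketch, assuming a hypothetical `hypothetical_field` added at microversion 2.10; real Nova fields and version numbers differ. It also shows why versions must be compared as integer pairs: 2.10 is newer than 2.9.

```python
from typing import Dict, Set, Tuple

Version = Tuple[int, int]

# Hypothetical mapping of response fields to the microversion that
# introduced them.
BASE_FIELDS: Set[str] = {"id", "name", "status"}
FIELDS_ADDED_IN: Dict[str, Version] = {
    "hypothetical_field": (2, 10),
}


def parse_version(text: str) -> Version:
    """Parse 'major.minor' into integers so that 2.10 sorts after 2.9."""
    major, minor = text.split(".")
    return int(major), int(minor)


def allowed_fields(requested: str) -> Set[str]:
    """Fields a client may expect when it requests this microversion."""
    version = parse_version(requested)
    extra = {name for name, added in FIELDS_ADDED_IN.items() if version >= added}
    return BASE_FIELDS | extra


def request_headers(requested: str) -> Dict[str, str]:
    # Nova accepts the requested microversion in this header (older
    # releases used X-OpenStack-Nova-API-Version instead).
    return {"OpenStack-API-Version": f"compute {requested}"}


print(sorted(allowed_fields("2.9")))   # ['id', 'name', 'status']
print(sorted(allowed_fields("2.10")))  # ['hypothetical_field', 'id', 'name', 'status']
```

Under this model an extra field is only legitimate when it is tied to a version the client asked for, which is the property strict schema checking protects.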
>>>> I believe Tempest in this case is doing the right thing from an interoperability
>>>> perspective and ensuring that the API is actually the API. Not an API with extra
>>>> bits a vendor decided to add. I don't think a cloud or product that does this
>>>> to the api should be considered an interoperable OpenStack cloud and failing the
>>>> tests is the correct behavior.
>>>> -Matt Treinish
>>>>> My reasoning behind this is that while the change that enabled strict
>>>>> checking was discussed publicly in the developer community and took
>>>>> some time to be implemented, it still landed quickly and broke several
>>>>> existing deployments overnight. As Tempest has moved forward with
>>>>> bug and UX fixes (some in part to support the interoperability testing
>>>>> efforts of the DefCore Working Group), using older versions of Tempest
>>>>> where this strict checking is not enforced is no longer a viable solution
>>>>> for downstream deployers. The TC has passed a resolution to advise
>>>>> DefCore to use Tempest as the single source of capability testing,
>>>>> but this naturally introduces tension between the competing goals of
>>>>> maintaining upstream functional testing and also tracking lagging indicators.
>>>>> My proposal for addressing this problem approaches it at two levels:
>>>>> * For the short term, I will submit a blueprint and patch to tempest that
>>>>> allows configuration of a grey-list of Nova APIs where strict response
>>>>> checking on additional properties will be disabled. So, for example,
>>>>> if the 'create servers' API call returned extra properties on that call,
>>>>> the strict checking on this line would be disabled at runtime.
>>>>> Use of this code path will emit a deprecation warning, and the
>>>>> code will be scheduled for removal in 2017 directly after the release
>>>>> of the 2017.01 guideline. Vendors would be required to submit the
>>>>> grey-list of APIs with additional response data that would be
>>>>> published to their marketplace entry.
>>>>> * Longer term, vendors will be expected to work with upstream to update
>>>>> the API for returning additional data that is compatible with
>>>>> API micro-versioning as defined by the Nova team, and the waiver would
>>>>> no longer be allowed after the release of the 2017.01 guideline.
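The short-term grey-list could work roughly as follows. This is a hypothetical illustration only: `GREY_LIST`, `schema_for` and `_allow_extras` are invented names, not the interface of the actual blueprint or patch. For a grey-listed API it returns a relaxed deep copy of the response schema with `additionalProperties` allowed at every level, and emits a `DeprecationWarning`, as the proposal requires; every other API keeps strict checking.

```python
import copy
import warnings
from typing import Any, Dict, FrozenSet

# Hypothetical operator setting: APIs whose responses may carry extra
# properties while the temporary waiver is in force.
GREY_LIST: FrozenSet[str] = frozenset({"create_server"})


def _allow_extras(node: Any) -> None:
    """Recursively flip additionalProperties from False to True in place."""
    if isinstance(node, dict):
        if node.get("additionalProperties") is False:
            node["additionalProperties"] = True
        for value in node.values():
            _allow_extras(value)
    elif isinstance(node, list):
        for item in node:
            _allow_extras(item)


def schema_for(api_name: str, schema: Dict[str, Any],
               grey_list: FrozenSet[str] = GREY_LIST) -> Dict[str, Any]:
    """Return the schema to validate a response against.

    Grey-listed APIs get a relaxed deep copy plus a DeprecationWarning;
    every other API gets the strict schema unchanged.
    """
    if api_name not in grey_list:
        return schema
    warnings.warn(
        f"strict response checking disabled for {api_name!r}; "
        "this waiver is temporary and scheduled for removal",
        DeprecationWarning,
        stacklevel=2,
    )
    relaxed = copy.deepcopy(schema)  # never mutate the shared strict schema
    _allow_extras(relaxed)
    return relaxed
```

The deep copy matters because schema dicts like these are usually shared module-level constants; relaxing one in place would silently weaken checking for every other caller.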
>>>>> For the next half-year, I feel that this approach strengthens interoperability
>>>>> by accurately capturing the current state of OpenStack deployments and
>>>>> client tools. Before this change, additional properties on responses
>>>>> weren't explicitly disallowed, and vendors and deployers took advantage
>>>>> of this in production. While this is behavior that the Nova and QA teams
>>>>> want to stop, it will take a bit more time to reach downstream. Also, as
>>>>> of right now, as far as I know the only client that does strict response
>>>>> checking for Nova responses is the Tempest client. Currently, additional
>>>>> properties in responses are ignored and do not break existing client
>>>>> functionality. There is currently little to no harm done to downstream
>>>>> users by temporarily allowing additional data to be returned in responses.
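The claim that ordinary clients ignore unknown keys can be shown with a small client-side model that reads only the fields it knows and drops the rest. The `Server` class and its fields are assumptions for illustration, not python-novaclient's API.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass(frozen=True)
class Server:
    """Hypothetical client-side view of a server; documented fields only."""
    id: str
    name: str
    status: str

    @classmethod
    def from_response(cls, body: Dict[str, Any]) -> "Server":
        known = {f.name for f in fields(cls)}
        # Unknown keys (e.g. vendor additions) are silently dropped, which is
        # why lenient clients keep working when extra properties appear.
        return cls(**{k: v for k, v in body.items() if k in known})


body = {"id": "abc", "name": "vm1", "status": "ACTIVE", "vendor:extra": "gold"}
print(Server.from_response(body))
# Server(id='abc', name='vm1', status='ACTIVE')
```

This is the other side of Matt's objection: a lenient client is not broken by the extra key, but any code that starts reading it is no longer portable to other clouds.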
>>>>> Chris Hoge
>>>>> Interop Engineer
>>>>> OpenStack Foundation
>>>>>  https://specs.openstack.org/openstack/nova-specs/specs/kilo/implemented/api-microversions.html
>>>>>  http://lists.openstack.org/pipermail/openstack-dev/2015-February/057613.html
>>>>>  https://review.openstack.org/#/c/156130
>>>>>  http://lists.openstack.org/pipermail/defcore-committee/2016-January/000986.html
>>>>>  http://git.openstack.org/cgit/openstack/defcore/tree/2015.07.json
>>>>>  http://git.openstack.org/cgit/openstack/defcore/tree/2016.01.json
>>>>>  http://git.openstack.org/cgit/openstack/governance/tree/resolutions/20160504-defcore-test-location.rst
>>>>>  http://git.openstack.org/cgit/openstack/tempest-lib/tree/tempest_lib/api_schema/response/compute/v2_1/servers.py#n39
> OpenStack Development Mailing List (not for usage questions)