[openstack-dev] [tempest][nova][defcore] Add option to disable some strict response checking for interop testing

GHANSHYAM MANN ghanshyammann at gmail.com
Thu Jun 16 02:59:13 UTC 2016

On Wed, Jun 15, 2016 at 6:12 AM, Matthew Treinish <mtreinish at kortar.org> wrote:
> On Tue, Jun 14, 2016 at 12:19:54PM -0700, Chris Hoge wrote:
>> > On Jun 14, 2016, at 11:21 AM, Matthew Treinish <mtreinish at kortar.org> wrote:
>> >
>> > On Tue, Jun 14, 2016 at 10:57:05AM -0700, Chris Hoge wrote:
>> >> Last year, in response to Nova micro-versioning and extension updates[1],
>> >> the QA team added strict API schema checking to Tempest to ensure that
>> >> no additional properties were added to Nova API responses[2][3]. In the
>> >> last year, at least three vendors participating the the OpenStack Powered
>> >> Trademark program have been impacted by this change, two of which
>> >> reported this to the DefCore Working Group mailing list earlier this year[4].
>> >>
>> >> The DefCore Working Group determines guidelines for the OpenStack Powered
>> >> program, which includes capabilities with associated functional tests
>> >> from Tempest that must be passed, and designated sections with associated
>> >> upstream code [5][6]. In determining these guidelines, the working group
>> >> attempts to balance the future direction of development with lagging
>> >> indicators of deployments and user adoption.
>> >>
>> >> After a tremendous amount of consideration, I believe that the DefCore
>> >> Working Group needs to implement a temporary waiver for the strict API
>> >> checking requirements that were introduced last year, to give downstream
>> >> deployers more time to catch up with the strict micro-versioning
>> >> requirements determined by the Nova/Compute team and enforced by the
>> >> Tempest/QA team.
>> >
>> > I'm very much opposed to this being done. If we're actually concerned with
>> > interoperability and verify that things behave in the same manner between multiple
>> > clouds then doing this would be a big step backwards. The fundamental disconnect
>> > here is that the vendors who have implemented out of band extensions or were
>> > taking advantage of previously available places to inject extra attributes
>> > believe that doing so means they're interoperable, which is quite far from
>> > reality. **The API is not a place for vendor differentiation.**
>> Yes, it’s bad practice, but it’s also a reality, and I honestly believe that
>> vendors have received the message and are working on changing.
> They might be working on this, but this change was coming for quite some
> time it shouldn't be a surprise to anyone at this point. I mean seriously, it's
> been in tempest for 1 year, and it took 6months to land. Also, lets say we set
> a hard deadline on this new option to disable the enforcement and enforce it.
> Then we implement a similar change on keystone are we gonna have to do the same
> thing again when vendors who have custom things running there fail.
>> > As a user of several clouds myself I can say that having random gorp in a
>> > response makes it much more difficult to use my code against multiple clouds. I
>> > have to determine which properties being returned are specific to that vendor's
>> > cloud and if I actually need to depend on them for anything it makes whatever
>> > code I'm writing incompatible for using against any other cloud. (unless I
>> > special case that block for each cloud) Sean Dague wrote a good post where a lot
>> > of this was covered a year ago when microversions was starting to pick up steam:
>> >
>> > https://dague.net/2015/06/05/the-nova-api-in-kilo-and-beyond-2 <https://dague.net/2015/06/05/the-nova-api-in-kilo-and-beyond-2>
>> >
>> > I'd recommend giving it a read, he explains the user first perspective more
>> > clearly there.
>> >
>> > I believe Tempest in this case is doing the right thing from an interoperability
>> > perspective and ensuring that the API is actually the API. Not an API with extra
>> > bits a vendor decided to add.
>> A few points on this, though. Right now, Nova is the only API that is
>> enforcing this, and the clients. While this may change in the
>> future, I don’t think it accurately represents the reality of what’s
>> happening in the ecosystem.
> This in itself doesn't make a difference. There is a disparity in the level of
> testing across all the projects. Nova happens to be further along in regards
> to api stability and testing things compared to a lot of projects, it's not
> really a surprise that they're the first for this to come up on. It's only a
> matter of time for other projects to follow nova's example and implement similar
> enforcement.
>> As mentioned before, we also need to balance the lagging nature of
>> DefCore as an interoperability guideline with the needs of testing
>> upstream changes. I’m not asking for a permanent change that
>> undermines the goals of Tempest for QA, rather a temporary
>> upstream modification that recognizes the challenges faced by
>> vendors in the market right now, and gives them room to continue
>> to align themselves with upstream. Without this, the two other
>> alternatives are to:
>> * Have some vendors leave the Powered program unnecessarily,
>>   weakening it.
>> * Force DefCore to adopt non-upstream testing, either as a fork
>>   or an independent test suite.
>> Neither seem ideal to me.
> It might not be ideal for a vendor to leave the program, but I think it's a
> necessary consequence of evolving the guidlines to become stricter over time.
> What we define as the minimum requirements for interoperability and by extension
> use of the trademark will continue to evolve. Every time we add additional tests,
> more stringent checking, or change something inevitably someone is going to fail
> no matter how slowly we ramp it out.
> There's a limit to how accommodating we should be here. This change has been in
> the wild for a year, and also took 6 months to land. The issue in question
> literally only ever will cause and issue if you add something extra, not
> OpenStack, to the API. All versions of the nova API (maybe not really old
> releases like <= folsom) should get passed this check without any issue. I still
> fail to see how a vendor failing the guidelines here is a bad thing. Isn't this
> what we're supposed to be doing.
> Also, defcore already has a mechanism for slowly rolling out changes like this. The
> guidelines contain a tempest sha1 (for better or worse):
> https://git.openstack.org/cgit/openstack/defcore/tree/2016.01.json#n113
> If the defcore committee still feels there needs to be a more gradual roll out of
>  > 1yr (which I strongly disagree with) then the minimum sha1 should be set more
> conservatively to a point before the change in question. Yes that means old bugs
> will still be present in tempest, but I don't think we can have it both ways here.
> Either we say you have to pass stricter requirements or we don't. We added
> idempotent ids to tempest exactly for this reason so you can keep track of tests
> as things change.
>> One of my goals is to transparently strengthen the ties between
>> upstream and downstream development. There is a deadline
>> built into this proposal, and my intention is to enforce it.
> My argument is that the deadline has already passed. We've been enforcing this
> in tempest for 1 year already. It's only coming up now because some vendors didn't
> pay attention to anything happening in the community or with changes in the
> testing guidelines were incoming and now are stuck. From my perspective this
> will always happen no matter how gradually we make changes and how much we
> advertise it.
>> > I don't think a cloud or product that does this
>> > to the api should be considered an interoperable OpenStack cloud and failing the
>> > tests is the correct behavior.
>> I think it’s more nuanced than this, especially right now.
>> Only additions to responses will be considered, not changes.
>> These additions will be clearly labelled as variations,
>> signaling the differences to users. Existing clients in use
>> will not break. Correct behavior will eventually be enforced,
>> and this would be clearly signaled by both the test tool and
>> through the administrative program.
> You're making large assumptions about how the APIs are actually consumed here.
> You can't assume that only one of the clients you know about is being used to
> talk to APIs. For example, I have a bunch of code I wrote a while ago that uses
> the tempest clients with [1] to interact with clouds. That code would fail the
> second I talked to a cloud with these extra bits enabled. Granted that's a bit
> of a contrived example, but if I'm dealing with the api at a lower level (using
> my hypothetical hand built fortran client) it's perfectly reasonable to assume
> that I start on vendor A's "openstack" cloud see the extra params in the
> response and assume they're everywhere and make my code depend on that. Then
> when I use a cloud deployed on the 3 spare machines in my basement from the
> latest release tarballs everything starts failing without any indication where
> that extra parameter went. That's the kind of experience we're trying to avoid.
> Also, there is also no guarantee that the extra fields are clearly marked. If we
> disable this checking literally anything can be added to the responses from nova
> and still pass for example if we're not explicitly checking for it. For example,
> I could add a top level field to the server response "useful: True" for things
> that use my proprietary hypervisor and "useful: False" for libvirt guests. There
> is nothing stopping me from writing an extension that does that and adding it to
> the API and then passing all the tests. Nothing would catch this if we disable
> the strict validation.
> My fundamental concern here is that we're optimizing for the wrong set of
> priorities. As a community do we want to prioritize enforcing interoperability
> with guidelines we define and develop in the open and that things that we
> say are openstack behave in a manner for a user as we've developed in the
> community. Or do we want to optimize for ensuring that vendors who are
> continually slow to adapt don't ever fail guidelines when they've passed things
> in the past. I'm all for doing a slow roll out of changes, to give people a
> chance to adopt as new constraints are added, doing otherwise would be reckless.
> But, I feel in this case the time for that has past. I also don't think we should
> add workarounds to avoid adding constraints as things move forward, we should set
> reasonable min version of tempest to use.

I too agree with that. Not allowing the addition attributes in Nova
APIs has been since 1 year.
If we make those configurable in Tempest and give more time frame to
vendors to fix those then, it can give false definition of
inter-operatable to users for that time frame.
I like the idea of pass* or some light red color in
inter-operatibility certification. This clearly convey to users that
this Cloud is now not completely inter-operatable so be careful or get
clarification from Cloud providers.

DefCore should mark those Cloud as non inter-operatable or with *
whenever any Cloud failing inter-operatability testing and that can
happen due to
1. non inter-operatable changes in Cloud
2. Change in testing/guidelines of inter-operatable in tempest/def-core etc.

2nd case can happen anytime like additionalProperty case  and it is
good that we keep improving the testing.
Not giving a green flag to Clouds in such cases, can make
users/application developers to strongly trust on inter-opratable
certificate and opposite of that can break/loosen the trust.

Vendor can always provide justification to their user that what all
case they are failing inter-operatibility and based on user use cases
on their Cloud they can keep them happy even till Cloud gets green
flag from defcore(which is same as gray list as proposed before).

Also are we going to give such flexibility/timeframe in case Tempest
start verifying other projects API in such strict manner ?

IMO, inter-operatability certificate should be as strict as possible
which can becomes RED anytime even by enhancement in it definition or
testing by community.

Config Option can effect Tempest reliability (for production env testing etc) :
    Another point I always think from Tempest pov is that Tempest is
being used on production Cloud testing and if we provide option to
disable the additional property(even for short term) it means we are
providing a way to Cloud tester to use Tempest as weak testing.
Tempest should be more trust worthy in their testing and not to
provide any kind of way which can be used wrongly to weak the Tempest
testing guarantee.


> -Matt Treinish
> [1] https://github.com/mtreinish/mesocyclone
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

More information about the OpenStack-dev mailing list