[openstack-dev] [tempest][nova][defcore] Add option to disable some strict response checking for interop testing

Daryl Walleck daryl.walleck at RACKSPACE.COM
Thu Jun 16 05:40:14 UTC 2016

> -----Original Message-----
> From: GHANSHYAM MANN [mailto:ghanshyammann at gmail.com]
> Sent: Wednesday, June 15, 2016 9:59 PM
> To: OpenStack Development Mailing List (not for usage questions)
> <openstack-dev at lists.openstack.org>
> Subject: Re: [openstack-dev] [tempest][nova][defcore] Add option to
> disable some strict response checking for interop testing
> On Wed, Jun 15, 2016 at 6:12 AM, Matthew Treinish <mtreinish at kortar.org>
> wrote:
> > On Tue, Jun 14, 2016 at 12:19:54PM -0700, Chris Hoge wrote:
> >>
> >> > On Jun 14, 2016, at 11:21 AM, Matthew Treinish <mtreinish at kortar.org> wrote:
> >> >
> >> > On Tue, Jun 14, 2016 at 10:57:05AM -0700, Chris Hoge wrote:
> >> >> Last year, in response to Nova micro-versioning and extension
> >> >> updates[1], the QA team added strict API schema checking to
> >> >> Tempest to ensure that no additional properties were added to Nova
> >> >> API responses[2][3]. In the last year, at least three vendors
> >> >> participating in the OpenStack Powered Trademark program have
> >> >> been impacted by this change, two of which reported this to the
> >> >> DefCore Working Group mailing list earlier this year[4].
> >> >>
> >> >> The DefCore Working Group determines guidelines for the OpenStack
> >> >> Powered program, which includes capabilities with associated
> >> >> functional tests from Tempest that must be passed, and designated
> >> >> sections with associated upstream code [5][6]. In determining
> >> >> these guidelines, the working group attempts to balance the future
> >> >> direction of development with lagging indicators of deployments and
> >> >> user adoption.
> >> >>
> >> >> After a tremendous amount of consideration, I believe that the
> >> >> DefCore Working Group needs to implement a temporary waiver for
> >> >> the strict API checking requirements that were introduced last
> >> >> year, to give downstream deployers more time to catch up with the
> >> >> strict micro-versioning requirements determined by the
> >> >> Nova/Compute team and enforced by the Tempest/QA team.
> >> >
> >> > I'm very much opposed to this being done. If we're actually
> >> > concerned with interoperability and verifying that things behave in
> >> > the same manner between multiple clouds then doing this would be a
> >> > big step backwards. The fundamental disconnect here is that the
> >> > vendors who have implemented out of band extensions or were taking
> >> > advantage of previously available places to inject extra attributes
> >> > believe that doing so means they're interoperable, which is quite
> >> > far from reality. **The API is not a place for vendor
> >> > differentiation.**
> >>
> >> Yes, it’s bad practice, but it’s also a reality, and I honestly
> >> believe that vendors have received the message and are working on
> >> changing.
> >
> > They might be working on this, but this change was coming for quite
> > some time; it shouldn't be a surprise to anyone at this point. I mean
> > seriously, it's been in tempest for 1 year, and it took 6 months to
> > land. Also, let's say we set a hard deadline on this new option to
> > disable the enforcement and enforce it. Then we implement a similar
> > change on keystone; are we gonna have to do the same thing again when
> > vendors who have custom things running there fail?
> >
> >>
> >> > As a user of several clouds myself I can say that having random
> >> > gorp in a response makes it much more difficult to use my code
> >> > against multiple clouds. I have to determine which properties being
> >> > returned are specific to that vendor's cloud and if I actually need
> >> > to depend on them for anything it makes whatever code I'm writing
> >> > incompatible for using against any other cloud. (unless I special
> >> > case that block for each cloud) Sean Dague wrote a good post where a
> >> > lot of this was covered a year ago when microversions was starting
> >> > to pick up steam:
> >> >
> >> > https://dague.net/2015/06/05/the-nova-api-in-kilo-and-beyond-2
> >> >
> >> > I'd recommend giving it a read, he explains the user first
> >> > perspective more clearly there.
> >> >
> >> > I believe Tempest in this case is doing the right thing from an
> >> > interoperability perspective and ensuring that the API is actually
> >> > the API. Not an API with extra bits a vendor decided to add.
> >>
> >> A few points on this, though. Right now, Nova is the only API that is
> >> enforcing this, and the clients. While this may change in the future,
> >> I don’t think it accurately represents the reality of what’s
> >> happening in the ecosystem.
> >
> > This in itself doesn't make a difference. There is a disparity in the
> > level of testing across all the projects. Nova happens to be further
> > along in regards to api stability and testing things compared to a lot
> > of projects, it's not really a surprise that they're the first for
> > this to come up on. It's only a matter of time for other projects to
> > follow nova's example and implement similar enforcement.
> >
> >>
> >> As mentioned before, we also need to balance the lagging nature of
> >> DefCore as an interoperability guideline with the needs of testing
> >> upstream changes. I’m not asking for a permanent change that
> >> undermines the goals of Tempest for QA, rather a temporary upstream
> >> modification that recognizes the challenges faced by vendors in the
> >> market right now, and gives them room to continue to align themselves
> >> with upstream. Without this, the two other alternatives are to:
> >>
> >> * Have some vendors leave the Powered program unnecessarily,
> >>   weakening it.
> >> * Force DefCore to adopt non-upstream testing, either as a fork
> >>   or an independent test suite.
> >>
> >> Neither seem ideal to me.
> >
> > It might not be ideal for a vendor to leave the program, but I think
> > it's a necessary consequence of evolving the guidelines to become
> > stricter over time.
> > What we define as the minimum requirements for interoperability and by
> > extension use of the trademark will continue to evolve. Every time we
> > add additional tests, more stringent checking, or change something
> > inevitably someone is going to fail no matter how slowly we ramp it out.
> >
> > There's a limit to how accommodating we should be here. This change
> > has been in the wild for a year, and also took 6 months to land. The
> > issue in question literally only ever will cause an issue if you add
> > something extra, not OpenStack, to the API. All versions of the nova
> > API (maybe not really old releases like <= folsom) should get past
> > this check without any issue. I still fail to see how a vendor failing
> > the guidelines here is a bad thing. Isn't this what we're supposed to
> > be doing?
> >
> > Also, defcore already has a mechanism for slowly rolling out changes
> > like this. The guidelines contain a tempest sha1 (for better or worse):
> >
> > https://git.openstack.org/cgit/openstack/defcore/tree/2016.01.json#n113
> >
> > If the defcore committee still feels there needs to be a more gradual
> > roll out of > 1yr (which I strongly disagree with) then the minimum
> > sha1 should be set more conservatively to a point before the change in
> > question. Yes that means old bugs will still be present in tempest,
> > but I don't think we can have it both ways here.
> > Either we say you have to pass stricter requirements or we don't. We
> > added idempotent ids to tempest exactly for this reason so you can
> > keep track of tests as things change.
> >
> >>
> >> One of my goals is to transparently strengthen the ties between
> >> upstream and downstream development. There is a deadline built into
> >> this proposal, and my intention is to enforce it.
> >
> > My argument is that the deadline has already passed. We've been
> > enforcing this in tempest for 1 year already. It's only coming up now
> > because some vendors didn't pay attention to what was happening in the
> > community or to the incoming changes in the testing guidelines, and
> > now are stuck. From my perspective this will always happen no matter
> > how gradually we make changes and how much we advertise it.
> >
> >>
> >> > I don't think a cloud or product that does this to the api should
> >> > be considered an interoperable OpenStack cloud and failing the
> >> > tests is the correct behavior.
> >>
> >> I think it’s more nuanced than this, especially right now.
> >> Only additions to responses will be considered, not changes.
> >> These additions will be clearly labelled as variations, signaling the
> >> differences to users. Existing clients in use will not break. Correct
> >> behavior will eventually be enforced, and this would be clearly
> >> signaled by both the test tool and through the administrative
> >> program.
> >
> > You're making large assumptions about how the APIs are actually
> > consumed here.
> > You can't assume that only one of the clients you know about is being
> > used to talk to APIs. For example, I have a bunch of code I wrote a
> > while ago that uses the tempest clients with [1] to interact with
> > clouds. That code would fail the second I talked to a cloud with these
> > extra bits enabled. Granted that's a bit of a contrived example, but
> > if I'm dealing with the api at a lower level (using my hypothetical
> > hand built fortran client) it's perfectly reasonable to assume that I
> > start on vendor A's "openstack" cloud see the extra params in the
> > response and assume they're everywhere and make my code depend on
> > that. Then when I use a cloud deployed on the 3 spare machines in my
> > basement from the latest release tarballs everything starts failing
> > without any indication where that extra parameter went. That's the
> > kind of experience we're trying to avoid.
> >
> > Also, there is no guarantee that the extra fields are clearly
> > marked. If we disable this checking, literally anything can be added
> > to the responses from nova and still pass if we're not explicitly
> > checking for it. For example, I could add a top level field
> > to the server response "useful: True" for things that use my
> > proprietary hypervisor and "useful: False" for libvirt guests. There
> > is nothing stopping me from writing an extension that does that and
> > adding it to the API and then passing all the tests. Nothing would
> > catch this if we disable the strict validation.
> >
> > My fundamental concern here is that we're optimizing for the wrong set
> > of priorities. As a community, do we want to prioritize enforcing
> > interoperability with guidelines we define and develop in the open, so
> > that things we say are openstack behave for a user in the manner
> > we've developed in the community? Or do we want to optimize for
> > ensuring that vendors who are continually slow to adapt don't ever
> > fail guidelines when they've passed things in the past? I'm all for
> > doing a slow roll out of changes, to give people a chance to adapt as
> > new constraints are added; doing otherwise would be reckless.
> > But, I feel in this case the time for that has passed. I also don't
> > think we should add workarounds to avoid adding constraints as things
> > move forward; we should set a reasonable min version of tempest to use.
> I too agree with that. Disallowing additional attributes in Nova API
> responses has been enforced for 1 year.
> If we make that configurable in Tempest and give vendors more time to
> fix it, then it can give users a false definition of interoperable
> for that time frame.
> I like the idea of pass* or some light red color in the interoperability
> certification. This clearly conveys to users that this Cloud is not
> completely interoperable, so be careful or get clarification from the
> Cloud provider.

I think part of the challenge is that certification occurs on a yearly basis, so most vendors
are just hitting their first re-certification milestone. It's also been a challenge to help product
managers understand that even though no test code has changed, they are failing the same
tests that they passed a year ago.

> DefCore should mark those Clouds as non-interoperable, or with *,
> whenever a Cloud fails interoperability testing. That can happen
> due to 1. non-interoperable changes in the Cloud, or 2. a change in the
> testing/guidelines for interoperability in tempest/defcore etc.

I agree strongly that any exceptions should be stated very clearly.

> The 2nd case can happen anytime, like the additionalProperty case, and it is
> good that we keep improving the testing.
> Not giving a green flag to Clouds in such cases lets users/application
> developers strongly trust the interoperability certificate; doing the
> opposite can break/loosen that trust.
> Vendors can always explain to their users which cases they are failing
> interoperability on and, based on the users' use cases on their Cloud, they
> can keep them happy until the Cloud gets a green flag from defcore (which is
> the same as the gray list proposed before).
> Also, are we going to give such flexibility/time frame if Tempest starts
> verifying other projects' APIs in such a strict manner?
> IMO, the interoperability certificate should be as strict as possible, and
> can become RED anytime, even through an enhancement in its definition or
> testing by the community.

I agree that certification should be strict, but along with that it should be explicit. I cannot
pick a random Tempest test and know that API response checking is part of the test.
It's not explained in the Tempest documentation or DefCore specification, so it's
something you can either read in email or accidentally discover in the code. As a developer
I understand that this implementation of response checking was the most convenient, but
it makes describing what an individual test validates more complex.
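
A rough sketch of why the check is hard to see from the test side. This is a
simplification using only the Python standard library, not Tempest's actual
code; SERVER_SCHEMA, validate_response and the sample bodies are hypothetical
names chosen for illustration. The assumption is that strictness lives in a
response schema (an "additionalProperties" flag set to false) applied to every
response, rather than in any individually named test:

```python
# Hypothetical, simplified stand-in for a schema-based response check.
# None of these names are taken from the Tempest source tree.

SERVER_SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "name": {"type": "string"},
        "status": {"type": "string"},
    },
    "required": ["id", "name", "status"],
    # This single flag is what makes the check "strict".
    "additionalProperties": False,
}


def validate_response(body, schema):
    """Return a list of problems; an empty list means the body passes."""
    problems = []
    for key in schema.get("required", []):
        if key not in body:
            problems.append("missing required property: %s" % key)
    if schema.get("additionalProperties", True) is False:
        for key in sorted(set(body) - set(schema["properties"])):
            problems.append("unexpected additional property: %s" % key)
    return problems


upstream = {"id": "42", "name": "vm-1", "status": "ACTIVE"}
vendor = dict(upstream, useful=True)  # extra top-level field from an extension

print(validate_response(upstream, SERVER_SCHEMA))  # []
print(validate_response(vendor, SERVER_SCHEMA))
# ['unexpected additional property: useful']
```

Any test whose API call passes through a check like this inherits the
strictness, which is why a test name alone does not tell you that response
checking is part of what it validates.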

> Config option can affect Tempest reliability (for production env testing etc.):
>     Another point I always think about from the Tempest pov is that Tempest is
> used for testing production Clouds, and if we provide an option to disable the
> additional property check (even for the short term), it means we are giving
> Cloud testers a way to use Tempest for weak testing.
> Tempest should be trustworthy in its testing and should not provide any
> way that can be wrongly used to weaken the Tempest testing
> guarantee.

As a tester, I think it is reasonable to want to run a certain subset of tests.
This already happens in most gate jobs with regexes that only pick tests
that work with a specific hypervisor, driver, or other factor that may impact
functionality of the project. Because there are no actual tests or configuration
options for response checking, there is no such choice. Rather than running a subset
of Tempest tests, the alternative is that you can't run any Tempest tests.
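
For comparison, regex-based selection of a test subset can be sketched as
below. The test ids and the select_tests helper are made up for illustration
and are not copied from a real gate job or test runner. Note that nothing in
such a list corresponds to response checking, so there is no id to include or
exclude for it:

```python
import re

# Illustrative test ids only; not taken from a real gate job.
TEST_IDS = [
    "tempest.api.compute.servers.test_servers.ServersTestJSON.test_create_server",
    "tempest.api.compute.admin.test_live_migration.LiveMigrationTest.test_live_migration",
    "tempest.api.volume.test_volumes_get.VolumesGetTest.test_volume_get",
]


def select_tests(test_ids, include, exclude=None):
    """Keep ids matching `include` and, if given, not matching `exclude`."""
    include_re = re.compile(include)
    exclude_re = re.compile(exclude) if exclude else None
    return [
        test_id for test_id in test_ids
        if include_re.search(test_id)
        and not (exclude_re and exclude_re.search(test_id))
    ]


# Compute API tests, minus a feature this hypothetical deployment lacks.
selected = select_tests(TEST_IDS, r"^tempest\.api\.compute", r"live_migration")
print(selected)
```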

To be clear, I'm not arguing against response checking. Interop is an important
issue to address for the reasons others have already stated. However, going
back to Chris's original point, even if there was agreement to allow any exceptions,
the capability to do so does not exist without modifying the Tempest source code.
As someone trying to work through these issues with internal teams, it would be
extremely valuable to have the ability to show the with/without API response
checking diff results to my teams without having to maintain branches to show
various scenarios.
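
The kind of with/without report described above could look roughly like the
following. This is a standalone sketch under stated assumptions: the `strict`
switch, the ALLOWED property set, and the sample responses are all
hypothetical, and no such option exists in Tempest today:

```python
# Hypothetical sketch: compare strict and lenient response checking.
# Nothing here is an existing Tempest option or API.

ALLOWED = {"id", "name", "status"}  # assumed properties of a server response


def check(body, strict):
    """Return True if the body passes; extras only matter when strict."""
    has_required = ALLOWED <= set(body)
    has_extras = bool(set(body) - ALLOWED)
    return has_required and not (strict and has_extras)


def strict_only_failures(responses):
    """Map test name -> extra properties for tests failing only when strict."""
    report = {}
    for test_name, body in responses.items():
        if check(body, strict=False) and not check(body, strict=True):
            report[test_name] = sorted(set(body) - ALLOWED)
    return report


responses = {
    "test_show_server": {"id": "1", "name": "a", "status": "ACTIVE"},
    "test_list_servers": {"id": "2", "name": "b", "status": "ACTIVE",
                          "vendor:tier": "gold"},
    "test_broken": {"id": "3"},  # fails either way, so it is not reported
}

for name, extras in strict_only_failures(responses).items():
    print("%s fails only under strict checking: %s" % (name, extras))
# test_list_servers fails only under strict checking: ['vendor:tier']
```

A report like this separates failures caused purely by additional properties
from genuine functional failures, without maintaining separate branches.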



> Thanks
> gmann
> >
> > -Matt Treinish
> >
> > [1] https://github.com/mtreinish/mesocyclone
> >
> >
> >
> >
> > __________________________________________________________________________
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
