[openstack-dev] [tempest][nova][defcore] Add option to disable some strict response checking for interop testing

Matthew Treinish mtreinish at kortar.org
Sat Jun 18 03:53:52 UTC 2016


On Fri, Jun 17, 2016 at 04:26:49PM -0700, Mike Perez wrote:
> On 15:12 Jun 14, Matthew Treinish wrote:
> > On Tue, Jun 14, 2016 at 02:41:10PM -0400, Doug Hellmann wrote:
> > > Excerpts from Matthew Treinish's message of 2016-06-14 14:21:27 -0400:
> > > > On Tue, Jun 14, 2016 at 10:57:05AM -0700, Chris Hoge wrote:
> 
> <snip>
> 
> > > We have basically three options.
> > > 
> > > 1. Tell deployers who are trying to do the right thing for their immediate
> > >    users that they can't use the trademark.
> > > 
> > > 2. Flag the related tests or remove them from the DefCore enforcement
> > >    suite entirely.
> > > 
> > > 3. Be flexible about giving consumers of Tempest time to meet the
> > >    new requirement by providing a way to disable the checks.
> > > 
> > > Option 1 goes against our own backwards compatibility policies.
> > 
> > I don't think backwards compatibility policies really apply to what we define
> > as the set of tests that, as a community, we are saying a vendor has to pass to
> > say they're OpenStack. From my perspective, as a community we should take a hard
> > stance on this and say that to be considered an interoperable cloud (and to get
> > the trademark) you have to actually have an interoperable product. We slowly
> > ratchet up the requirements every 6 months; there isn't any implied backwards
> > compatibility in doing that. You passed in the past, but not under the newer,
> > stricter guidelines.
> > 
> > Also, even if I did think it applied, we're not talking about a change which
> > would fall into breaking that. The change was introduced a year and a half ago
> > during kilo and landed a year ago during liberty:
> > 
> > https://review.openstack.org/#/c/156130/
> > 
> > That's way longer than our normal deprecation period of 3 months and a release
> > boundary.
> 
> <snip>
> 
> What kind of communication happens today for these changes? There are so many
> channels/high-volume mailing lists a downstream deployer is expected by the
> community to be listening to. A disruptive change introduced a year or more
> ago can still have been communicated poorly.

Sure, I agree with that, but I don't think this was necessarily communicated
poorly. This has already been mentioned a few times in this thread, but:

It was talked about on openstack-dev:

http://lists.openstack.org/pipermail/openstack-dev/2015-February/057613.html

On the defcore list (which is definitely not a high-volume/high-traffic ML):

http://lists.openstack.org/pipermail/defcore-committee/2015-June/000849.html

This was also raised as an issue for one vendor ~6 months ago (the same duration
as the hard deadline being discussed in this thread):

http://lists.openstack.org/pipermail/defcore-committee/2016-January/000986.html

IMHO, this was more than enough time to introduce a fix or workaround on their
end. The easiest is likely just adding an extra nova-api endpoint with the
extensions disabled.

I don't have any links or other evidence to point to, but I know that this
exact topic has been discussed with people from the vendors having
difficulties during sessions at one or more of the 2 summits and/or 2 QA
midcycle meetups since this change landed. I really don't think this is a
communication problem or an unfair surprise for anyone.

There might be more too, but I don't remember every conversation that I've had
in the community over the past year. (or where to find the links to point to)

> 
> Just like we've done with Reno in communicating better about disruptive changes
> in release notes, what tells teams like DefCore about changes in Tempest?
> (I looked in release.o.o for tempest release notes, although maybe I missed
> it?)

Yes, tempest has release notes; they are here:

http://docs.openstack.org/releasenotes/tempest/

But, the change in question predates the existence of reno and centralized
release notes for everything in openstack.

If this change were pushed today it would definitely be included in the release
notes. We would also do the same things as before: put it on the dev list and
put it on the defcore list (although probably as a standalone thread this time).
I also think we'd ping hogepodge on IRC about it so he could raise it on the
defcore side (which we might have done back then too). Defcore and tempest are
tightly coupled, so we do have pretty constant communication around changes
being made. But I do admit we have better mechanisms in place today to
communicate this kind of change, and hopefully this would be handled better
now.

> 
> Since some members of DefCore have an interest in making the marketplace healthy,
> what is DefCore doing today to communicate these disruptive changes early to
> deployers? Did it not happen in this particular case because:
> 
> * DefCore has no one working closely in the Tempest project to flag things?
> * Defcore does work closely with Tempest, but somehow the communication for
>   this was missed?
> * Not having clear deprecation notices because release notes in Tempest
>   don't exist (see above)?
> 
> This all just sounds like a communication problem, and it makes me sad to
> interpret this thread as people being angry with deployers as a result. How
> about we not think the worst of people who are trying to prove our project
> successful and start working with them?

I actually don't think that's the fundamental issue here. Chris and the
other defcore members interact quite regularly with the tempest and QA teams,
and this exact change had been talked about in both circles before this thread
started. I also don't think looking at things that happened a year or more ago
(which is ages in terms of openstack) is a particularly fair assessment. The
openstack powered program, or whatever it's officially called, was very young
back then. IIRC, it was only officially done for the first time around
Vancouver. I don't think it's right to look at things from back then and
declare there is a communication problem; it seems unfair to everyone working
in this space. The interactions between defcore and QA have only improved over
time as both teams have grown.

Also, I wouldn't say I'm angry with deployers, more like frustrated that this
discussion is still going on. It's not a new topic; it's been discussed
multiple times in the past year. This is just the first time it's been raised as
a huge problem on the dev list (likely because the certifications from a year
ago are expiring).

The crux of the issue here is that we're saying we want to give the openstack
trademark to the ~3 vendors [1] that are failing the certification tests because
of proprietary, non-openstack code they're running in their products. TBH, if
that's what the foundation and the defcore committee want to do, that's perfectly
fine. I don't necessarily agree with it, but I understand there are larger
politics involved and I probably don't have a complete picture. If we give these
vendors another 6 months to fix the problem, that seems totally fair, just as
long as we clearly mark how these clouds are not interoperable so that users
can actually see what the vendors are changing.

But I'm still having a hard time understanding why a workaround has to be added
in tempest to move forward here. We all seem to be in agreement that these
products don't actually pass the tests, and that tempest is doing the correct
thing and failing because the api is not actually the nova api.
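To make the failure mode concrete, here is a minimal Python sketch. It is an illustration only, not tempest's code: tempest validates responses against jsonschema schemas, and this assumes those schemas reject unknown keys (the "additionalProperties": False behavior at issue here). The helper check_strict, the trimmed-down server properties, and the vendor:zone_tag attribute are all hypothetical.

```python
# Hypothetical stand-in for strict response-schema checking.
# Tempest's real checks use jsonschema; this only models two rules:
# required keys must be present, and no unknown keys are allowed
# (the effect of "additionalProperties": False).


def check_strict(body: dict, properties: set, required: set) -> list:
    """Return a list of problems; an empty list means the body is valid."""
    problems = []
    missing = required - set(body)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    # Any key the upstream API does not define is treated as an error.
    extra = set(body) - properties
    if extra:
        problems.append(f"unexpected keys: {sorted(extra)}")
    return problems


# Hypothetical, trimmed-down server resource definition.
SERVER_PROPERTIES = {"id", "name", "status"}
SERVER_REQUIRED = {"id", "name", "status"}

upstream = {"id": "abc", "name": "vm1", "status": "ACTIVE"}
# Same resource with a hypothetical out-of-tree extension attribute added.
vendor = {**upstream, "vendor:zone_tag": "gold"}

print(check_strict(upstream, SERVER_PROPERTIES, SERVER_REQUIRED))
print(check_strict(vendor, SERVER_PROPERTIES, SERVER_REQUIRED))
```

The vendor response is a superset of the upstream one, and a strict check reports the extra attribute instead of silently accepting it; a lenient check would let that drift go unnoticed.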

It feels to me like this would normally be handled on the defcore side. But
the only mechanism they currently have for this is flagging a test, and using
it here would basically mean invalidating most of the tests in defcore
(especially if the extensions are modifying a resource like servers), which is
a bad idea.

I think maybe we should be discussing adding a different mechanism to the defcore
schema to special-case these failures. Instead of flagging a test, add a new tag,
something like 'conditional_failures_allowed: True'. If a product fails such a
test (with the specific jsonschema exception?), it could be counted as a pass,
but only if it gets an asterisk on the marketplace and the incompatibilities
are documented there too.
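A rough Python sketch of how scoring could treat such a tag. Everything here is hypothetical: the guideline dict shape, the Outcome record, the score helper, the test names, and the rule that only failures of type "ValidationError" (the jsonschema exception name) on tagged tests are forgiven are assumptions for illustration, not the real defcore schema or tooling.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Outcome:
    """Result of one test run (hypothetical record shape)."""
    name: str
    passed: bool
    failure_type: Optional[str] = None  # e.g. "ValidationError"


def score(results: List[Outcome],
          guideline_tests: Dict[str, dict]) -> Tuple[bool, bool, List[str]]:
    """Return (passes, needs_asterisk, conditionally_passed_tests)."""
    conditional = []
    for r in results:
        if r.passed:
            continue
        meta = guideline_tests.get(r.name, {})
        # Only schema-validation failures on tagged tests are forgiven.
        if (meta.get("conditional_failures_allowed")
                and r.failure_type == "ValidationError"):
            conditional.append(r.name)
            continue
        # Any other failure is still a plain fail.
        return False, False, []
    return True, bool(conditional), conditional


# Hypothetical guideline entries and test run.
guideline_tests = {
    "test_list_servers": {"conditional_failures_allowed": True},
    "test_create_image": {},
}
run = [
    Outcome("test_list_servers", False, "ValidationError"),
    Outcome("test_create_image", True),
]
print(score(run, guideline_tests))
```

A product that only hits conditional failures passes but is flagged for an asterisk, and the returned list names the tests whose incompatibilities would need documenting on the marketplace.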

> 
> With that said, I agree with this strict checking in tests. Deployments need to
> stop redefining the community-defined APIs.

++

Also, as an aside, this is a super basic problem for a vendor to fix or work
around (if it is just a case of using proprietary api extensions). Literally all
they have to do is:

 1. Copy the nova.conf from the current api server
 2. Modify the copy to not include the out of tree extensions
 3. Spin up a second nova api somewhere with that config (the same host is fine,
    just with a different port or path), and add a separate keystone catalog
    entry for it.
 4. Then change 1 line in your tempest.conf to use the new catalog_type for nova
 5. Rerun the tests

Everything should pass then. I'm not saying that's a good way to handle this
issue, but it's a quick workaround to pass while you work on solving the larger
problem.
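For step 4, the tempest side really is a one-line change: the catalog_type option in the [compute] section of tempest.conf. A small Python sketch, assuming the second nova-api has already been registered in the keystone catalog under a hypothetical service type "compute_strict"; the helper name use_catalog_type is made up for illustration.

```python
import configparser
import os
import shutil


def use_catalog_type(tempest_conf: str, catalog_type: str) -> None:
    """Set [compute] catalog_type in tempest.conf (hypothetical helper).

    configparser drops comments on write, so a .bak copy is kept first.
    """
    if os.path.exists(tempest_conf):
        shutil.copy2(tempest_conf, tempest_conf + ".bak")
    # interpolation=None so '%' characters (e.g. in passwords) are left alone.
    config = configparser.ConfigParser(interpolation=None)
    config.read(tempest_conf)
    if not config.has_section("compute"):
        config.add_section("compute")
    config.set("compute", "catalog_type", catalog_type)
    with open(tempest_conf, "w") as fh:
        config.write(fh)


# Example call ("compute_strict" is a hypothetical service type):
# use_catalog_type("etc/tempest.conf", "compute_strict")
```

Editing the line by hand works just as well; the script only matters if the same tempest.conf is regenerated often.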

I also do want to apologize if my words sound harsh or antagonistic to anyone; I
really don't mean them that way.

-Matt Treinish

[1] Out of the ~30-40 products that have participated in the program, based on my
rough count from the marketplace: https://www.openstack.org/marketplace/