Re: [all][interop][cinder][qa] API changes with/without microversion and Tempest verification of API interoperability

17 Sep 2019

      ---- On Tue, 17 Sep 2019 06:22:09 -0700 Eric Harney <eharney@redhat.com> wrote ----
...
On 9/16/19 8:01 PM, Ghanshyam Mann wrote:
...
---- On Tue, 17 Sep 2019 02:40:36 +0900 Eric Harney <eharney@redhat.com> wrote ----
...
On 9/16/19 6:02 AM, Ghanshyam Mann wrote:
...
---- On Mon, 16 Sep 2019 18:53:58 +0900 Ghanshyam Mann <gmann@ghanshyammann.com> wrote ----
...
Hello Everyone,
As per discussion over ML, Tempest started the JSON schema strict validation for Volume  APIs response [1].
Because it may affect the interop certification, it was explained to the Interop team as well as in the Board of Director meeting[2].
In Train, Tempest started implementing the validation and found an API change where the new field was added in API response without versioning[3] (Cinder has API microversion mechanism). IMO, that was not the correct way to change the API and as per API-WG guidelines[4] any field added/modified/removed in API should be with microverison(means old versions/user should not be affected by that change) and must for API interoperability.
With JSON schema validation, Tempest verifies the API interoperability recommended behaviour by API-WG.  But as per IRC conversion with cinder team, we have different opinion on API interoperability and how API should be changed under microversion mechanism.  I would like to have a conclusion on this so that Tempest can verify or leave the Volume API for strict validation.
I found the same flow chart what Sean created in Nova about "when to bump microverison" in Cinder also which clearly say any addition to response need new microversion.
- https://docs.openstack.org/cinder/latest/contributor/api_microversion_dev.ht...
-gmann
I don't believe that it is clear that a microversion bump was required
for the "groups" response showing up in a GET quota-sets response, and
here's why:
This API has, since at least Havana, returned dynamic fields based on
quotas that are assigned to volume types.  i.e.:
$ cinder --debug quota-show b73b1b7e82a247038cd01a441ec5a806
DEBUG:keystoneauth:RESP BODY: {"quota_set": {"per_volume_gigabytes": -1,
"volumes_ceph": -1, "groups": 10, "gigabytes": 1000, "backup_gigabytes":
1000, "snapshots": 10, "volumes_enc": -1, "snapshots_enc": -1,
"snapshots_ceph": -1, "gigabytes_ceph": -1, "volumes": 10,
"gigabytes_enc": -1, "backups": 10, "id":
"b73b1b7e82a247038cd01a441ec5a806"}}
"gigabytes_ceph" is in that response because there's a "ceph" volume
type defined, same for "gigabytes_enc", etc.
This puts this API alongside something more like listing volume types --
you get a list of what's defined on the deployment, not a pre-baked list
of defined fields.
Complaints about the fact that "groups" being added without a
microversion imply that these other dynamic fields shouldn't be in this
response either -- but this is how this API works.
There's a lot of talk here about interoperability problems... what are
those problems, exactly?  If we ignore Ocata and just look at Train --
why is this API not problematic for interoperability there, when
requests on different clouds would return different data, depending on
how types are configured?
It's not clear to me that rectifying the microversion concerns around
the "groups" field is helpful without also understanding this piece,
because if the concern is that different clouds return different fields
for this API -- that will still happen.  We need more detail to
understand how to address this, and what the problem is that we are
trying to solve exactly.
There are two things here.
1. API behaviour depends on backend. This has been discussed two years back also and Tempest team along with cinder team decided not to test the backend-specific behaviour in Tempest[1].
This is wrong.  Nothing about what is happening in this API is 
backend-specific.
I agree and I am just giving the background information about backend specific features/API behaviour testing.
...
...
2. API is changed without versioning.
The second one is the issue here. If any API is changed without versioning cause the interoperability issue here. New field is being added for older microversion also for same backend.
If the concern is that different fields can be returned as part of quota 
info, it's worth understanding that fixing the Ocata tempest failures 
won't fix your concern, because this API still returns dynamic fields 
when the deployment is using per-type quotas, even on master.
Are those considered "changes"?  Need concrete details here.
+1. ocata fix is separate from this discussion. That has to be done at some time whenever Tempest starts failing on Ocata for any reason.
...
...
*Why this is interoperability:
CloudA with same configuration and same backend is upgraded and have API return new field. I deploy my app on that cloud and use that field. Now CloudB with same configuration and same backend is not upgraded yet so does not have API return the new field added. Now I want to move my app from CloudA to CloudB and it will fail because CloudB API does not have that new field. And I cannot check what version it got added or there is no mechanism for app to discover that field as expected in which Cloud.
So this is a very clear case of interoperability.
There is no way for end-user to discover the API change which is a real pain point for them. Note: same backend and same configuration cloud have different behaviour of API.
We should consider the addition of new field same as delete or modify (name or type) any field in API.
This seems to imply that the whole Cinder per-type quota feature is 
invalid, or implemented in an invalid way.  Is the concern about how 
things are expressed in the API, or the broader features?
I am also concern about the general rule for Cinder API which is important to consider for Tempest BP of strict volume validation[1]

let me ask clearly: 
1. does Cinder consider adding any new field to any API does not need microversion bump as opposit to this doc[2] ?

2. Or it is only quota API where additional to the new field is considered as no microversion bump?  if so how many such dynamic  APIs Cinder has ?

[1] https://blueprints.launchpad.net/tempest/+spec/volume-response-schema-valida...
[2] https://docs.openstack.org/cinder/latest/contributor/api_microversion_dev.ht...   https://specs.openstack.org/openstack/api-wg/guidelines/api_interoperability...

-gmann
...
...
...
(Other than the problem that Tempest currently fails on Ocata.  My
inclination is still that the Tempest tests could just be wrong.)
Ocata gate is going to be solved by https://review.opendev.org/#/c/681950/
Fixing Ocata is great, but I'd like to settle the bigger questions about 
this API that you are raising.
I'd prefer to not end up worrying about these same problems the next 
time someone writes tests for this API, or makes a change to it.
What would be a valid way to design it that meet the concerns around 
interop?
...
-gmann
[1] http://lists.openstack.org/pipermail/openstack-dev/2017-May/116172.html
...
...
...
[1] http://lists.openstack.org/pipermail/openstack-discuss/2018-November/000358....
[2]
- http://lists.openstack.org/pipermail/openstack-discuss/2019-March/003652.htm...
- http://lists.openstack.org/pipermail/openstack-discuss/2019-March/003655.htm...
[3] https://bugs.launchpad.net/tempest/+bug/1843762   https://review.opendev.org/#/c/439461/
[4] https://specs.openstack.org/openstack/api-wg/guidelines/api_interoperability...
-gmann