[openstack-dev] [all] [tc] [api] refreshing and revalidating api compatibility guidelines

Monty Taylor mordred at inaugust.com
Wed Jan 25 14:16:44 UTC 2017


On 01/24/2017 12:39 PM, Chris Dent wrote:
> On Mon, 23 Jan 2017, Sean Dague wrote:
> 
>> We all inherited a bunch of odd and poorly defined behaviors in the
>> system we're using. They were made because at the time they seemed like
>> reasonable tradeoffs, and a couple of years later we learned more, or
>> needed to address a different use case that people didn't consider
>> before.
> 
> Thanks, as usual, for providing some well considered input Sean. I
> think it captures well what we could describe as the "nova
> aspirational model for managing change" which essentially means:
> 
> * don't change stuff unless you have to
> * when you do change stuff, anything, use microversions to signal
> 
> This is a common position and I suspect if we were to use the
> voices that have spoken up so far to form the new document[1] then
> it would codify that, including specifying microversions as the
> technology for managing boundaries.

I have a quibble with the current microversions construct. It's mostly
semantic in nature, and I _think_ it's not valid/useful - but I'm going
to describe it here just so that I've said it and we can all acknowledge
it and move on.

My concern is with the prefix "micro". What gets presented to the user
now is a "major" api version that is essentially useless, and a
monotonically increasing single version number that does not indicate
whether a given version introduced a breaking change or not.

I LIKE the mechanism. It works well - I do not think using it is
burdensome or bad for the user so far. But it's not "micro". It's
_essentially_ "every 'microversion' bump must be treated as a major
version bump, we just moved it to a construct that doesn't involve
deploying 40 different rest endpoints.

There are ways in which we could use the mechanism while still using
structured content to convey some amount of meaning to a user so that
client consumers don't have to write matrices of "if this cloud has max
microversion of 27, then do this, otherwise do this other thing" for all
of the microversions.
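
To make that concrete, here is a rough sketch of the kind of branching
I mean. The microversion number and the field names below are made up
for illustration, not taken from real nova:

    def describe_server(server, max_microversion):
        # Hypothetical: pretend microversion 27 added a 'description'
        # field to the server record.
        if max_microversion >= 27:
            return server.get('description')
        # Older clouds never return that field, so fall back.
        return server.get('name')

Multiply that by every behavior difference a client cares about and
you have the matrix.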

That said - it's WAY better than the other thing - at least so far in
the way I'm seeing nova use it.

So I imagine it's just me quibbling over the word 'micro' and wanting
something more like libtool's current:revision:age construct, which
calculates for a given library and consumer whether or not a library can
be expected to be usable in a dynamic linking context. (This is a
different construct from semver, but it turns out to be handy when you
have a single client that may need to consume multiple different API
providers.)
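
For anyone who hasn't run into it, the libtool rule is roughly: a
library advertising current:revision:age supports every interface
number from current - age up through current, so a consumer built
against interface X can expect to link if current - age <= X <=
current. A toy sketch of that check:

    def libtool_compatible(current, age, consumer_interface):
        # The library supports interfaces [current - age, current].
        return (current - age) <= consumer_interface <= current

The attraction for our case is that a single advertised range tells a
client which servers it can talk to without per-version special cases.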

> That could very well be fine, but we have evidence that:
> 
> * some projects don't yet use microversions in their APIs
> * some projects have no intention of using microversions or at least
>   have internal conflict about doing so
> * some projects would like to change things (irrespective of
>   microversions)
> 
> What do we do about that? That's what I think we could be working
> out here, and why I'm persisting in dragging this out. There's no
> point making rules that a significant portion of the populace have
> no interest in following.
> 
> So the options seem to be:
> 
> * codify the two rules above as the backbone for the
>   api-compatibility assertion tag and allow several projects to not
>   assert that, despite an overall OpenStack goal

I like the two rules above. They serve end users in the way Sean is
talking about better than any of the alternatives I've heard.

> * keep hashing things out for a bit longer until either we have
>   different rules so we have more projects liking the rules or we
>   justify the rules until we have more projects accepting them
> 
> More in response to Sean below, not to contradict what he's saying
> but in the ever-optimistic hope of continuing and expanding the
> conversation to get real rather than enforced consensus.
> 
>> If you don't guarantee that existing applications will work in the
>> future (for some reasonable window of time), it's a massive turn off to
>> anyone deciding to use this interface at all. You suppress your user
>> base.
> 
> I think "reasonable window of time" is a key phrase here that
> perhaps we can build into the guidelines somewhat. The problems of
> course are that some clouds will move forward in time at different
> rates and as Sean has frequently pointed out, time's arrow is not
> unidirectional in the universe of many OpenStack clouds.
> 
> To what extent is the HEAD of OpenStack responsible to OpenStack two
> or three years back?

I personally believe the answer to this is "forever". I know that's not
popular - but if we don't, someone _else_ has to deal with making sure
code that wants to consume new apis and also has to talk to older
OpenStack installations can do that.

But it turns out OpenStack works way better than our detractors in the
"success is defined by the size of your VC intake" tech press like to
admit - and we have clouds _today_ that are happily running in
production with Juno pre-nova-microversions. (I'm consuming at least one
of them in production) That means that as long ago as Juno, OpenStack
worked well enough for at least some people that they have not upgraded.
Each release after Juno has gotten WAY better, so the ability cloud
providers will have to install once and just kind of chill will only
increase over time. The world's hipsters tell us that EVERYONE wants to
MOVE FAST ALL THE TIME but that's simply not true. Some people do - some
people don't - and if we've done a good job, we will have empowered
people to make the choice to move slowly as well as to move fastly.

That is to say - we have already lost the ability to assert that
everyone should upgrade all the time.

> Also, when suppressing or not suppressing which user base is more
> important? The users that exist now or the users to come? This may
> sound like a snarky or idle question, but it's a real one: Is it
> true that we do, as a general rule, base our development on existing
> users and not people who have chosen not to use "the product" for
> some reason?

We have a GIANT install base - but the set of tools that can work
consistently across that install base is small. If we continue to chase
phantom maybe-users at the expense of the users we have currently, I'm
pretty sure we'll end up where Linux on the desktop did. I believe we
stopped being able to legitimately make backwards-incompatible changes
around Havana.

>> This is a real issue. A real issue raised by users and other project
>> teams. I do understand that in other contexts / projects that people
>> have been involved in, this may not have been considered an issue. But I
>> would assert it is one here.
> 
> I don't think anyone disagrees with it being a real issue. Perhaps
> it would be more correct to say "I agree with your assertion". I
> also, however, assert that we can learn from other approaches. Not
> so that we can use different approaches, but so that we can clarify
> and evolve the approaches we do use so that people more fully
> understand the reasons, edge cases, etc. For some the problem (and
> solutions) are very well understood and accepted, for others not so
> much. The compare and contrast technique is a time honored and
> tested way of expanding the mind.
> 
>> So before reopening the exploration of approaches (or the need to do
>> anything at all), we should probably narrow the focus of whether
>> guarantees to the user that their existing code will continue to work is
>> something that we need / want. I don't see any new data coming into our
>> community that this is less important than it was 4 years ago.
> 
> But we do have some data (recent glance visibility situation) that
> sometimes changing stuff that violates the letter of the law (but
> not really the spirit?) causes indecision and confusion when
> evaluating changes. Are we going to declare this okay because glance
> doesn't (and can't (yet) if we assert microversions) assert api
> stability support?

I declare glance ok because the end result of that incompatible change
was a better user experience, it doesn't fail open in a way that is
super hard to deal with - and more importantly because glance users at
the moment still have to deal with the v1/v2 split and the PUT/tasks
split and the v2 removal of import_from_url and the upcoming improved
image upload. While those may have been coded years ago, they are still
VERY active issues for glance consumers currently.

In other words, "It's the fall that'll kill you"

Or, in other words - because v1 is still very much extant, code dealing
with image visibility pretty much looks like:

        # glance v2 images carry a 'visibility' field; v1 images only
        # have a boolean 'is_public'. Normalize so both are present.
        visibility = image.pop('visibility', None)
        if visibility:
            is_public = (visibility == 'public')
        else:
            is_public = image.pop('is_public', False)
            visibility = 'public' if is_public else 'private'

It has to look like that because of how v1 and v2 are different. Guess
what? That code is not broken by the glance change. :)

So I do think it's important to not break backwards compat - but
understanding backwards compat not in a vacuum but in the context of the
actual current existing api compatibility landscape is super important.

> We also have endless data that changing APIs is part and parcel of
> what we do and that change of any kind is part and parcel of living
> in the real world. I think even if we are maintaining that backwards
> stability is critical we need to think about the cognitive cost of
> multiple microversions to users. They are under no obligation to use
> the new features or the bug fixes, but they do represent fairly
> constant change that you only get access to if you choose to be
> aware of microversions or use particular versions of supplied
> clients.

Finding this:

http://docs.openstack.org/developer/nova/api_microversion_history.html

is hard. I saw it for the first time 3 days ago. Know why? It's in the
nova developer docs, not in the API docs. It's a great doc.

But in terms of cognitive cost, so far I've been using 'latest' for
everything and putting in defensive coding checks and traps (doing gets
with default values, trying a call and failing gracefully). As I move
code I'm working on from novaclient to REST it's possible I may find a
situation where it's less cognitive load for me to actively assert a
microversion number - but I have not yet found one of those.
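
For what it's worth, that defensive style looks roughly like the
sketch below. The endpoint and token are placeholders, and the
'description' field is just an example of something only newer clouds
return:

    import requests

    token = 'placeholder-token'  # stand-in for a real auth token

    resp = requests.get(
        'https://compute.example.com/v2.1/servers/detail',
        headers={
            'X-Auth-Token': token,
            # Ask the cloud for the newest behavior it knows about.
            'X-OpenStack-Nova-API-Version': 'latest',
        })
    for server in resp.json().get('servers', []):
        # Defensive get: fall back rather than KeyError on older clouds.
        description = server.get('description') or server.get('name')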

I guess in summary - whatever we decide here, I still personally have to
support nova v2.0 and glance v2 and keystone v2 anyway. I know we'd all
like those things to go away, but from a consumer perspective, I can't
ever drop support for them. As we deprecate that code from OpenStack
it'll be harder and harder for me to do direct integration testing, but
thankfully I have requests_mock now.
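
As a sketch of what I mean - requests_mock lets me keep exercising,
say, a glance v1 code path even when I can no longer stand up a real
v1 endpoint. The URL and payload here are illustrative:

    import requests
    import requests_mock

    with requests_mock.Mocker() as m:
        m.get('https://image.example.com/v1/images/detail',
              json={'images': [{'name': 'cirros', 'is_public': True}]})
        resp = requests.get('https://image.example.com/v1/images/detail')
        assert resp.json()['images'][0]['is_public'] is True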

> A rough calculation suggests the compute API went through 25
> microversions in 2016. A part of this is because each individual
> feature or fix is a new version (a good thing!) instead of anything
> resembling a composed release, but it is still a lot of change. Your
> old school HTTP API person would say "make your resources, make your
> representations, stop".  We're not like that. And presumably can't
> be like that, but it is something to think about.
> 
> Without further input, I'll make a pass by the end of the week at
> codifying what's described at the start of this message in the
> review but I would prefer to see more input first. Not because I
> think we'll end up with something different (though we may) but
> because more input from more than the usual suspects is a good
> thing.

Thank you for tackling this topic - and for potentially reading my
rambling and probably contradictory thoughts.



