[openstack-dev] [nova] Future of the Nova API

Christopher Yeoh cbkyeoh at gmail.com
Tue Feb 25 12:00:44 UTC 2014


On Tue, 25 Feb 2014 10:31:42 +0000
John Garbutt <john at johngarbutt.com> wrote:

> On 25 February 2014 06:11, Christopher Yeoh <cbkyeoh at gmail.com> wrote:
> > On Mon, 24 Feb 2014 17:37:04 -0800
> > Dan Smith <dms at danplanet.com> wrote:
> >
> >> > onSharedStorage = True
> >> > on_shared_storage = False
> >>
> >> This is a good example. I'm not sure it's worth breaking users _or_
> >> introducing a new microversion for something like this. This is
> >> definitely what I would call a "purity" concern as opposed to
> >> "usability".
> 
> I thought microversioning was so we could make backwards compatible
> changes. If we make breaking changes we need to support the old and
> the new for a little while.

Isn't the period that we have to support the old and the new for these
sorts of breaking changes exactly the same period of time that we'd
have to keep V2 around if we released V3? Either way we're forcing
people off the old behaviour.
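For what it's worth, the onSharedStorage / on_shared_storage case above
could in principle be handled by accepting both spellings during a
deprecation window. A minimal Python sketch - the helper name and
default value are made up for illustration, not actual Nova code:

```python
# Hypothetical helper accepting both the legacy camelCase and the new
# snake_case spelling of a request parameter during a deprecation
# window. The function and field names are illustrative, not Nova code.

def get_shared_storage_flag(body):
    """Return the on_shared_storage value, accepting either spelling."""
    if "on_shared_storage" in body:
        return body["on_shared_storage"]
    # Fall back to the legacy v2 spelling; default False if absent.
    return body.get("onSharedStorage", False)
```

The catch, of course, is that this kind of aliasing has to live in the
code for the whole deprecation window, which is exactly the maintenance
question under discussion.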

> I am tempted to say the breaking changes just create a new extension,
> but there are other ways...

Oh, please no :-) Essentially that is no different from creating a new
extension in the v3 namespace, except it makes the v2 namespace even
more confusing.

> For return values:
> * get new clients to send Accepts headers, to version the response
> * this amounts to the "major" version
> * for those request the new format, they get the new format
> * for those getting the old format, they get the old format
> 
> For this case, on requests:
> * we can accept both formats, or maybe that also depends on the
> Accepts headers (which is a bit funky, granted).
> * only document the new one
> * maybe in two years remove the old format? maybe never?
> 

So the idea of accept headers seems to me like just an alternative to
using a different namespace, except a new namespace is much cleaner.
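To make the Accept-header idea concrete, here's a rough sketch of what
such content negotiation might look like. The media type strings and
field names are made up purely for illustration:

```python
# Illustrative sketch of Accept-header versioning: one endpoint returns
# the old or new response shape depending on the media type requested.
# The media type strings and field names are made up for illustration.

V2_OLD = "application/vnd.openstack.compute+json;version=2"
V2_NEW = "application/vnd.openstack.compute+json;version=3"

def show_server(accept_header):
    # Canonical internal representation uses the new snake_case keys.
    server = {"id": "abc123", "host_id": "node1"}
    if V2_NEW in accept_header:
        return server
    # Older clients get the legacy camelCase key names.
    return {"id": server["id"], "hostId": server["host_id"]}
```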

> Same for URLs, we could have the old and new names, with the new URL
> always returning the new format (think instance_actions ->
> server_actions).
> 
> If the code only differs in presentation, that implies much less
> double testing than two full versions of the API. It seems like we
> could make some of these clean-ups, and keep the old version, with
> relatively few changes.

As I've said before, the API layer is very thin. Essentially most of it
is just about parsing the input, calling something, then formatting the
output. But we still do double testing even though the difference
between them most of the time is just "presentation". Theoretically, if
the unittests were good enough in terms of checking the API, we'd only
have to tempest test a single API, but I think experience has shown
that we're not that good at doing exhaustive unittests. So we use the
fallback of throwing tempest at both APIs.

> We could port the V2 classes over to the V3 code, to get the code
> benefits.

I'm not exactly sure what you mean here. If you mean backporting, say,
the V3 infrastructure so V2 can use it, I don't want people
underestimating the difficulty of that. When we developed the new
architecture we had the benefit of being able to bootstrap it without
it having to work for a while, e.g. getting core bits like servers and
images up and running without yet having the additional parts which
depend on them working. With V2 we can't do that, so operating on an
"active" system is going to be more difficult. The CD people will not
be happy with breakage :-)

But even then it took a considerable amount of effort - both coding and
review - to get the changes merged, and that was back in Havana when it
was easier to get review bandwidth. And we also discovered that,
especially with that sort of infrastructure work, it's very difficult
to get many people working in parallel - or even one person working on
too many things at one time - because you end up in merge
conflict/rebase hell. I've been there a lot in Havana and Icehouse.

> Return codes are a bit harder, it seems odd to change those based on
> Accepts headers, but maybe I could live with that.
> 
> 
> Maybe this is the code mess we were trying to avoid, but I feel we
> should at least see how bad this kind of approach would look?

So to me this approach really doesn't look a whole lot different from
just having a separate v2/v3 codebase in terms of maintenance. LOC
would be lower, but the testing load is similar if we make the same
sorts of changes. Some things like input validation are a bit harder to
implement (because you need quite lax input validation for v2-old and
strict for v2-new).
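As a tiny illustration of that lax-vs-strict problem, hand-rolled here
to stay self-contained (the field names are made up; Nova itself uses
jsonschema for this kind of check):

```python
# Tiny illustration of lax vs strict input validation: v2-old must
# tolerate unknown keys, while v2-new rejects them. Hand-rolled to stay
# dependency-free; Nova itself uses jsonschema for this kind of check.

ALLOWED_FIELDS = {"name", "flavor_ref", "image_ref"}

def validate_request(body, strict):
    unknown = set(body) - ALLOWED_FIELDS
    if strict and unknown:
        raise ValueError("unexpected fields: %s" % sorted(unknown))
    return True
```

Supporting both behaviours on one code path means every validation rule
needs to know which mode it is running in.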

Also how long are we going to spend on this sort of exploratory work?
The longer we take on it, the more we risk V3 slipping in Juno if we
take that route.

If we really need a super long deprecation period for V2, I'm going to
suggest again the idea of a V2 proxy which translates to V3 speak and
does the necessary proxying. From a testing point of view we'd only
need to test the input and output of the proxy (i.e. that correct V3
code requests are emitted and correct V2 output is returned). And we
already have tempest tests for V2 which we could use for more general
correctness (at least at first for sanity checking). It's not ideal,
and there's probably some compromise we'd have to make on the V3 input
validation around names of things, but otherwise it should work. And it
does allow us to pull the V2 code out of the tree earlier (say just two
cycles after V3 is released, which gives us enough ramp-up time to get
the proxy working).
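A rough sketch of what such a proxy's translation layer might look
like - the field spellings and structure are illustrative only:

```python
# Rough sketch of the proxy idea: rewrite an incoming v2 request into
# v3 form, call the v3 API, and translate the response back. The field
# spellings are illustrative only, not actual Nova names.

def v2_proxy(v2_request, call_v3):
    # Translate legacy v2 spellings to their v3 equivalents on the way in.
    v3_request = {
        ("on_shared_storage" if k == "onSharedStorage" else k): v
        for k, v in v2_request.items()
    }
    v3_response = call_v3(v3_request)
    # Translate back to the legacy v2 spellings on the way out.
    return {
        ("hostId" if k == "host_id" else k): v
        for k, v in v3_response.items()
    }
```

The appeal is that the proxy's correctness can be tested purely at its
boundaries, without re-testing the logic underneath.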

> I agree its a mess.
> 
> But rather than fork the code, can we find a better way of supporting
> the old and new versions on a single (ideally cleaner) code base?

So I guess I keep coming back to repeating that the API layer is really
thin. Its main purpose is just to parse the incoming data and format
the outgoing. In most extensions there is actually very little actual
logic inside it - it's an abstraction layer which allows us to fiddle
with the internals without exposing them through the API. So the gain
you get from supporting the two versions of the API in one codebase is
small, if not negative, because the code itself is more complex (and
you risk accidental interaction between the parsing for v2 and v3).
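To illustrate how thin that layer is, a minimal sketch of the
parse / call / format pattern - the names are made up, not actual Nova
code:

```python
# Minimal sketch of the thin API layer pattern: parse the request, call
# into the compute layer, format the result. Names are made up, not
# actual Nova code.

def reboot_action(body, compute_api, instance):
    # 1. Parse and validate the input.
    reboot_type = body.get("reboot", {}).get("type", "SOFT")
    if reboot_type not in ("SOFT", "HARD"):
        raise ValueError("invalid reboot type: %s" % reboot_type)
    # 2. Call the underlying compute logic (where the real work lives).
    compute_api.reboot(instance, reboot_type)
    # 3. Format the output.
    return {"status": "accepted", "type": reboot_type}
```

Steps 1 and 3 are all that differ between API versions; step 2 is
shared, which is why duplicating the layer costs less than it appears.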

Also, in terms of testing I don't think you save a lot - perhaps a
little bit on unittests - but not much, since most of the API unit
testing is about testing the API parsing and output rather than testing
what is underneath, so you need to test all the possible code paths
anyway.

For tempest testing perhaps you could say most of it is the same and we
don't need to test both, but that's pretty much true for v2 and v3 as
it is anyway, as fundamentally both APIs still call the underlying code
the same way. Tempest is a sanity check we'd still want in both cases
regardless.

> So users can migrate in their own timeframe, and we don't get a
> complete maintenance nightmare in the process.

So I'd like to tease out a bit more what the maintenance concerns are.
As I see them, the overheads are:

- Tempest tests. Worst case we double the amount of testing Nova
  requires (in practice it's less than this because the v3 API is quite
  a bit smaller than the v2 API, since we can drop the cruft we don't
  want to support in the future).

  Personally I think this is the worst part. There's also the
  duplicated tests, though I think if we really tried we could probably
  share more test code between the two. I didn't think it was worth it
  if we're only keeping the v2 API for 2-3 cycles after the v3 release
  (and, being resource constrained, getting some sanity checking for
  the V3 API was more important), but if we're doing it for longer then
  it may well be. The recent decision to remove XML will also make this
  much easier.

- Internal changes needing corresponding changes to the v2, v3 and ec2
  APIs. Doing objects and the v3 API at the same time definitely hurt -
  we merge-conflicted a lot. But with objects in place I think this
  overhead is actually now quite small. We have oh so many layers of
  abstraction now that the vast majority of changes to Nova won't need
  to fiddle with the API. And when changes do need to be made, it's
  normally just a trivial change, albeit one needing to be done in 2-3
  places instead of 1-2.

- Unit tests. This is non-trivial, but there's likely to be a lot of
  code duplication we can collapse (even within the v2 and v3 API
  unittests there's a lot of duplicated code that could be removed, and
  I suspect if we tried we could share it between v2 and v3). There'd
  be a bunch of refactoring work required, mostly around tests being
  able to more generically take input and more generically test output.
  So it's not easy, but we could cut down on the overhead there.
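As a sketch of what that sharing might look like - the same assertions
run against each version's controller instead of duplicating the test
body (the controllers here are stand-in stubs, not real Nova classes):

```python
# Sketch of sharing test code between the two APIs: run the same
# assertions against each version's controller instead of duplicating
# the test body. The controllers here are stand-in stubs.

class V2Controller:
    def show(self, server_id):
        return {"server": {"id": server_id, "hostId": "node1"}}

class V3Controller:
    def show(self, server_id):
        return {"server": {"id": server_id, "host_id": "node1"}}

def check_show(controller, host_key):
    # Shared test body: only the expected key spelling differs.
    resp = controller.show("abc")["server"]
    assert resp["id"] == "abc"
    assert host_key in resp
```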

So I think this is all hard to quantify, but I don't think it's as big
as people fear. I think the tempest overhead is the main concern,
because it maps to extra check/gate resources, but if we want backwards
incompatible changes we pay that regardless. I really don't see the
in-tree Nova overhead as that significant - some of it comes down just
to reviewers asking, if a change is made to v2, does it need to be done
to ec2/v3 as well?

So I think we come back to our choices:

- V2 stays forever. Never any backwards incompatible changes. For lots
  of reasons I've mentioned before, I don't like it. Also the longer we
  delay, the harder it gets.

- V2 with V3 backport, incorporating changes into V2. Probably less
  tempest testing load, depending on how much is backported. But a
  *lot* of backporting work. It took us 2 cycles to get V3 to this
  stage - it'd be 3 in the end if we release V3 in Juno. How many
  cycles would it take us to implement the V3 changes in the V2 code?
  And in many cases it's not a matter of just backporting patches, it's
  starting from scratch. And we don't have a clear idea of when we can
  deprecate the V2 part of the code (the cleanup of which will be
  harder than just removing everything in the contrib directory ;-)

- Release V3. But we don't know how long we have to maintain V2 for.
  If it's just two years after the V3 release, I think it's a
  no-brainer that we just go the V3 route. If it's 7 or 10 years then I
  think we'll probably find it hard to justify any backwards
  incompatible change, and that will make me very sad given the state
  of the V2 API. (And as an aside, if we suspect that "never deprecate"
  is the answer, I think we should defer all the pending new API
  extensions in the queue for V2 - because we haven't done a sufficient
  evaluation of them and we'll have to live with what they do forever.)

Whatever we decide, I think it's clear we need to be much, much
stricter about what new APIs we allow in, and really about any changes
at all to the API, because we're stuck with the consequences for a very
long time. There's a totally different trade-off between speed of
development and long-term consequences if you make a mistake, compared
to the rest of Nova.

Chris


