Open Stack

Tue Feb 25 21:47:08 UTC 2014

On 2/25/2014 6:00 AM, Christopher Yeoh wrote:
> On Tue, 25 Feb 2014 10:31:42 +0000
> John Garbutt <john at johngarbutt.com> wrote:
>
>> On 25 February 2014 06:11, Christopher Yeoh <cbkyeoh at gmail.com> wrote:
>>> On Mon, 24 Feb 2014 17:37:04 -0800
>>> Dan Smith <dms at danplanet.com> wrote:
>>>
>>>>> onSharedStorage = True
>>>>> on_shared_storage = False
>>>>
>>>> This is a good example. I'm not sure it's worth breaking users _or_
>>>> introducing a new microversion for something like this. This is
>>>> definitely what I would call a "purity" concern as opposed to
>>>> "usability".
>>
>> I thought micro versioning was so we could make backwards compatible
>> changes. If we make breaking changes we need to support the old and
>> the new for a little while.
>
> Isn't the period that we have to support the old and the new for these
> sorts of breaking changes exactly the same period of time that we'd
> have to keep V2 around if we released V3? Either way we're forcing
> people off the old behaviour.
>
>> I am tempted to say the breaking changes just create a new extension,
>> but there are other ways...
>
> Oh, please no :-) Essentially that is no different to creating a new
> extension in the v3 namespace except it makes the v2 namespace even
> more confusing?
>
>> For return values:
>> * get new clients to send Accept headers, to version the response
>> * this amounts to the "major" version
>> * for those requesting the new format, they get the new format
>> * for those getting the old format, they get the old format
>>
>> For this case, on requests:
>> * we can accept both formats, or maybe that also depends on the
>> Accept headers (which is a bit funky, granted).
>> * only document the new one
>> * maybe in two years remove the old format? maybe never?
>>
>
> So the idea of accept headers seems to me like just an alternative to
> using a different namespace except a new namespace is much cleaner.
>
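To make the Accept-header idea quoted above concrete, here is a minimal sketch of one handler rendering the same internal result in either key style. The media type string and the function names are hypothetical (the thread names none), and this is not how Nova implements anything; it only illustrates the "presentation differs, logic is shared" argument.

```python
# Hypothetical media type used to opt in to the new response format.
# The thread does not name one; this value is an assumption.
NEW_TYPE = "application/vnd.example.compute.v3+json"


def wants_new_format(accept_header):
    """Return True if the client explicitly listed the new media type."""
    offered = [part.split(";")[0].strip().lower()
               for part in (accept_header or "").split(",")]
    return NEW_TYPE in offered


def format_result(result, accept_header):
    """Render one internal result in the old or the new key style."""
    if wants_new_format(accept_header):
        return {"on_shared_storage": result["on_shared_storage"]}
    # Clients that send no header, or any other type, keep the old format.
    return {"onSharedStorage": result["on_shared_storage"]}
```

Old clients keep working because the old format is the default; only clients that ask for the new media type see the renamed key.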
>> Same for URLs, we could have the old and new names, with the new URL
>> always returning the new format (think instance_actions ->
>> server_actions).
>>
>> If the code only differs in presentation, that implies much less
>> double testing than two full versions of the API. It seems like we
>> could make some of these clean ups in, and keep the old version, with
>> relatively few changes.
>
> As I've said before the API layer is very thin. Essentially most of it
> is just about parsing the input, calling something, then formatting the
> output. But we still do double testing even though the difference
> between them most of the time is just "presentation".  Theoretically if
> the unittests were good enough in terms of checking the API we'd only
> have to tempest test a single API but I think experience has shown that
> we're not that good at doing exhaustive unittests. So we use the
> fallback of throwing tempest at both APIs

Not even Tempest in a lot of cases. Take the host_maintenance_mode virt
driver APIs, which are only implemented in the VMware and XenAPI virt
drivers: we have no Tempest coverage there because the libvirt driver
doesn't implement that API.

>
>> We could port the V2 classes over to the V3 code, to get the code
>> benefits.
>
> I'm not exactly sure what you mean here. If you mean backporting say
> the V3 infrastructure so V2 can use it, I don't want people
> underestimating the difficulty of that. When we developed the new
> architecture we had the benefit of being able to bootstrap it without
> it having to work for a while. Eg. getting core bits like servers and
> images up and running without having to have the additional parts which
> depend on it working with it yet. With V2 we can't do that, so
> operating on an "active" system is going to be more difficult. The CD
> people will not be happy with breakage :-)
>
> But even then it took a considerable amount of effort - both coding and
> review to get the changes merged, and that was back in Havana when it
> was easier to get review bandwidth. And we also discovered that especially
> with that sort of infrastructure work it's very difficult to get many
> people working in parallel - or even one person working on too many things
> at one time. Because you end up in merge conflict/rebase hell. I've been
> there a lot in Havana and Icehouse.

+1 to not backporting all of the internal improvements from V3 to V2. 
That'd be a ton of duplicate code and review time and if we aren't 
making backwards incompatible changes to V2 I don't see the point; we're 
just kicking the can down the road on when we do need to make backwards 
incompatible changes and require a new API major version bump for 
<insert shiny new thing here>.

>
>> Return codes are a bit harder, it seems odd to change those based on
>> Accept headers, but maybe I could live with that.
>>
>>
>> Maybe this is the code mess we were trying to avoid, but I feel we
>> should at least see how bad this kind of approach would look?
>
> So to me this approach really doesn't look a whole lot different to
> just having a separate v2/v3 codebase in terms of maintenance. LOC
> would be lower, but testing load is similar if we make the same sorts
> of changes. Some things like input validation are a bit harder to
> implement (because you need quite lax input validation for v2-old and
> strict for v2-new).
>
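The validation point quoted above (lax for v2-old, strict for v2-new) can be sketched as a single validator with a mode switch. The key names reuse the onSharedStorage example from earlier in the thread; the allowed-key set and the function name are assumptions for illustration only, not Nova code.

```python
# Hypothetical mapping from legacy camelCase keys to the new spelling.
LEGACY_KEYS = {"onSharedStorage": "on_shared_storage"}
# Hypothetical set of keys the new format accepts.
ALLOWED_KEYS = {"on_shared_storage", "host"}


def validate_body(body, strict):
    """Validate a request body in strict (new) or lax (old) mode."""
    if strict:
        # New format: reject anything unknown, including legacy spellings.
        unknown = set(body) - ALLOWED_KEYS
        if unknown:
            raise ValueError("unknown keys: %s" % ", ".join(sorted(unknown)))
        return dict(body)
    # Old format: translate legacy spellings and silently drop the rest.
    cleaned = {}
    for key, value in body.items():
        key = LEGACY_KEYS.get(key, key)
        if key in ALLOWED_KEYS:
            cleaned[key] = value
    return cleaned
```

Even in this toy form the two modes need separate test cases for every key, which is the testing-load concern raised in the quoted paragraph.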
> Also how long are we going to spend on this sort of exploratory work?
> The longer we take on it, the more we risk V3 slipping in Juno if we
> take that route.

+1, seems we could explore for another cycle just to find out that 
backporting everything to V2 isn't going to be what we want, and now 
we've just wasted more time.

>
> If we really need a super long deprecation period for V2 I'm going to
> suggest again the idea of V2 proxy which translates to V3 speak and does
> the necessary proxying. From a testing point of view we'd only need to
> test that input and output of the proxy (ie correct V3 code requests are
> emitted and correct V2 output is returned). And we already have tempest
> tests for V2 which we could use for more general correctness (at least
> at first for sanity checking). It's not ideal and there's probably some
> compromise we'd have to make on the V3 input validation around names of
> things, but otherwise should work. And it does allow us to pull the V2
> code out of the tree earlier (say after just 2 cycles after V3 is
> released which gives us enough ramp up time to get the proxy working).
>
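As a rough illustration of the V2 proxy idea quoted above, the sketch below translates a V2-style body into V3-style keys, calls a V3 handler, and translates the response back. It assumes the difference is purely a mechanical camelCase/snake_case rename, which is a simplification; a real proxy would likely need per-extension mappings. All function names here are hypothetical.

```python
import re


def camel_to_snake(name):
    """onSharedStorage -> on_shared_storage."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])",
                  lambda m: "_" + m.group(1).lower(), name)


def snake_to_camel(name):
    """on_shared_storage -> onSharedStorage."""
    head, *rest = name.split("_")
    return head + "".join(word.capitalize() for word in rest)


def rename_keys(value, rename):
    """Apply rename() to every dict key, recursing into dicts and lists."""
    if isinstance(value, dict):
        return {rename(k): rename_keys(v, rename) for k, v in value.items()}
    if isinstance(value, list):
        return [rename_keys(v, rename) for v in value]
    return value


def proxy_v2_request(v2_body, v3_handler):
    """Translate a V2 body to V3, call the handler, translate the reply back."""
    v3_body = rename_keys(v2_body, camel_to_snake)
    v3_response = v3_handler(v3_body)
    return rename_keys(v3_response, snake_to_camel)
```

Testing such a proxy only requires checking the two translations (the V3 request that goes out and the V2 response that comes back), which matches the testing argument made in the quoted paragraph.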
>> I agree it's a mess.
>>
>> But rather than fork the code, can we find a better way of supporting
>> the old and new versions on a single (ideally cleaner) code base?
>
> So I guess I keep coming back to repeating that the API layer is really
> thin. Its main purpose being to just parse the incoming data and format
> the outgoing. In most extensions there is actually very little actual
> logic inside it - it's an abstraction layer which allows us to fiddle
> with the internals without exposing them to the API. So the gain you
> get from having to support two versions of the API is small, if not
> negative because the code itself is more complex (and you risk
> accidental interaction between parsing for v2 and v3).

Agree here - I was trying to tie this back into the discussion on what 
the actual maintenance costs are because it seems that most of the time 
when we have a V2 "api" bug reported, it's really a bug in the compute 
manager, not an actual problem in the V2 API layer itself.

There are times where we've cleaned up the V2 API to provide more 
appropriate error codes or handle more specific exceptions from the 
compute manager, but otherwise I don't remember seeing *that* many bugs 
just for the API layer itself.

If we say it's just deprecated and frozen against new features, then 
its maintenance is just limited to bug fixes, right?  How much are those 
going to crop up over a few years?  Consider how many backports we 
actually make to stable/grizzly right now, it's laughable.  The Nova 
unit tests aren't even working in stable/grizzly right now (that's been 
broken a few weeks) so it goes to show how much effort we're actually 
putting into stable release maintenance.

>
> Also in terms of testing I don't think you save a lot - perhaps a
> little bit on unittests - but not much since much of the API unit tests
> is meant to be about testing the api parsing and output rather than
> testing what is underneath so you need to test all the possible code
> paths anyway.
>
> For tempest testing perhaps you could say well most of it is the same,
> we don't need to test both, but that's pretty much true for v2 and v3
> as it is anyway as fundamentally both apis still call the underlying
> code the same way. Tempest is a sanity check we'd still want in both
> cases regardless.
>
>> So users can migrate in their own timeframe, and we don't get a
>> complete maintenance nightmare in the process.
>
> So I'd like to tease out a bit more what the maintenance concerns are.
> As I see them the overheads are:
>
> - Tempest tests. Worst case we double the amount of testing that Nova
>    requires (in practice it's less than this because the v3 API is quite
>    a bit smaller than the v2 API since we can drop the cruft we don't
>    want to support in the future).
>
>    Personally I think this is the worst part. There's also the
>    duplicated tests, though I think if we really tried we could probably
>    share more test code between the two. I didn't think it was worth it
>    if we're only keeping the v2 API for 2-3 cycles after v3 release
>    (and being resource constrained getting some sanity checking for the
>    V3 API was more important), but if we're doing it for longer then it
>    may well be. The recent decision to remove XML will also make this
>    much easier.
>
> - Internal changes needing corresponding changes to v2, v3 and ec2
>    apis. Doing objects and the v3 API at the same time definitely hurt.
>    We merge conflicted a lot. But with objects in place I think this
>    overhead is actually now quite small. We have oh so many layers of
>    abstraction now that the vast majority of changes to Nova won't need
>    to fiddle with the API. And when changes do need to be made,
>    it's normally just a trivial change, albeit one that needs to be
>    done in 2-3 places instead of 1-2.
>
> - Unit tests. This is non trivial, but there's likely to be a lot of
>    code duplication we can collapse (even within the v2 and v3 APIs
>    unittests there's a lot of duplicated code that could be removed and
>    I suspect if we tried we could share it between v2 and v3). There'd
>    be a bunch of refactoring work required mostly around tests being
>    able to more generically take input and more generically test output.
>    So it's not easy, but we could cut down on the overhead there.
>
> So I think this is all hard to quantify but I don't think it's as big as
> people fear - I think the tempest overhead is the main concern because
> it maps to extra check/gate resources but if we want backwards
> incompatible changes we get that regardless. I really don't see the
> in-tree nova overhead as that significant - some of it comes down just
> to reviewers asking if a change is made to v2 does it need to be done to
> ec2/v3 as well?

+1

>
> So I think we come back to our choices:
>
> - V2 stays forever. Never any backwards incompatible changes. For lots
>    of reasons I've mentioned before I don't like it. Also the longer we
>    delay the harder it gets.
>
> - V2 with V3 backport incorporating changes into V2. Probably less
>    tempest testing load depending on how much is backported. But a *lot*
>    of backporting work. It took us 2 cycles to get V3 to this stage,
>    it'd be 3 in the end if we then release V3 in Juno. How many cycles
>    would it take us to implement V3 changes in the V2 code? And in many
>    cases it's not a matter of just backporting patches, it's starting from
>    scratch. And we don't have a clear idea of when we can deprecate the
>    V2 part of the code (the cleanup of which will be harder than just
>    removing everything in the contrib directory ;-)
>
> - Release V3. But we don't know how long we have to maintain V2 for.
>    But if it's just two years after the V3 release I think it's a
>    no-brainer that we just go the V3 route. If it's 7 or 10 years then I
>    think we'll probably find it hard to justify any backwards
>    incompatible change and that will make me very sad given the state of
>    the V2 API. (And as an aside if we suspect that "never deprecate" is
>    the answer I think we should defer all the pending new API extensions
>    in the queue for V2 - because we haven't done a sufficient evaluation
>    of them and we'll have to live with what they do forever)
>
> Whatever we decide I think it's clear we need to be much, much
> stricter about what new APIs we allow in, and really about any changes at all
> to the API. Because we're stuck with the consequences for a very long
> time. There's a totally different trade off between speed of
> development and long term consequences if you make a mistake compared
> to the rest of Nova.
>
> Chris
>

-- 

Thanks,

Matt Riedemann

[openstack-dev] [nova] Future of the Nova API