[openstack-dev] [tc] [all] TC Report 18-26

Zane Bitter zbitter at redhat.com
Wed Jul 18 01:12:30 UTC 2018


On 17/07/18 10:44, Thierry Carrez wrote:
> Finally found the time to properly read this...

For anybody else who found the wall of text challenging, I distilled the 
longest part into a blog post:

https://www.zerobanana.com/archive/2018/07/17#openstack-layer-model-limitations

> Zane Bitter wrote:
>> [...]
>> We chose to add features to Nova to compete with vCenter/oVirt, and 
>> not to add features that would have enabled OpenStack as a whole to 
>> compete with more than just the compute provisioning subset of 
>> EC2/Azure/GCP.
> 
> Could you give an example of an EC2 action that would be beyond the 
> "compute provisioning subset" that you think we should have built into 
> Nova ?

Automatic provision/rotation of application credentials.
Reliable, user-facing event notifications.
Collection of usage data suitable for autoscaling, billing, and whatever 
it is that Watcher does.

>> Meanwhile, the other projects in OpenStack were working on building 
>> the other parts of an AWS/Azure/GCP competitor. And our vague 
>> one-sentence mission statement allowed us all to maintain the delusion 
>> that we were all working on the same thing and pulling in the same 
>> direction, when in truth we haven't been at all.
> 
> Do you think that organizing (tying) our APIs along [micro]services, 
> rather than building a sanely-organized user API on top of a 
> sanely-organized set of microservices, played a role in that divide ?

TBH, not really. If I were making a list of contributing factors I would 
probably put 'path dependence' at #1, #2 and #3.

At the start of this discussion, Jay posted on IRC a list of things that 
he thought shouldn't have been in the Nova API[1]:

- flavors
- shelve/unshelve
- instance groups
- boot from volume where nova creates the volume during boot
- create me a network on boot
- num_instances > 1 when launching
- evacuate
- host-evacuate-live
- resize where the user 'confirms' the operation
- force/ignore host
- security groups in the compute API
- force delete server
- restore soft deleted server
- lock server
- create backup

Some of those are trivially composable in higher-level services (e.g. 
boot from volume where nova creates the volume, get me a network, 
security groups). I agree with Jay that in retrospect it would have been 
cleaner to delegate those to some higher level than the Nova API (or, 
equivalently, for some lower-level API to exist within what is now 
Nova). And maybe if we'd had a top-level API like that we'd have been 
more aware of the ways that the lower-level ones lacked legibility for 
orchestration tools. (oaktree is effectively an example of a top-level 
API like this; I'm sure Monty can give us a list of complaints. ;)
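
To make 'trivially composable' concrete, here is a minimal sketch of 
boot-from-volume built in a higher layer out of two lower-level 
primitives. Everything in it is hypothetical: VolumeService, 
ComputeService and boot_from_new_volume are in-memory stand-ins invented 
for illustration, not the real Nova or Cinder APIs.

```python
# Hypothetical in-memory stand-ins; NOT the real Nova or Cinder APIs.
import itertools
from dataclasses import dataclass, field

_ids = itertools.count(1)


@dataclass
class VolumeService:
    """Low-level 'plumbing': only knows how to create volumes."""
    volumes: dict = field(default_factory=dict)

    def create_volume(self, size_gb, image):
        vol_id = f"vol-{next(_ids)}"
        self.volumes[vol_id] = {"size_gb": size_gb, "image": image}
        return vol_id


@dataclass
class ComputeService:
    """Low-level 'plumbing': boots a server from an existing volume only."""
    servers: dict = field(default_factory=dict)

    def boot_server(self, name, root_volume_id):
        srv_id = f"srv-{next(_ids)}"
        self.servers[srv_id] = {"name": name, "root_volume": root_volume_id}
        return srv_id


def boot_from_new_volume(compute, volumes, name, image, size_gb):
    """Higher-level convenience composed purely from the two primitives."""
    vol_id = volumes.create_volume(size_gb, image)
    return compute.boot_server(name, vol_id)


volumes = VolumeService()
compute = ComputeService()
server_id = boot_from_new_volume(compute, volumes, "web-1", "cirros", 10)
```

The point of the sketch is only that the compute primitive never needs 
to know how to create a volume; the convenience lives entirely in the 
layer above it.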

But others on the list involve operations at a low level that don't 
appear to me to be composable out of simpler operations. (Maybe Jay has 
a shorter list of low-level APIs that could be combined to implement all 
of these, I don't know.) Once we decided to add those features, it was 
inevitable that they would reach right the way down through the stack to 
the lowest level.

There's nothing _organisational_ stopping Nova from creating an internal 
API (it need not even be a ReST API) for the 'plumbing' parts, with a 
separate layer that does orchestration-y stuff. That they're not doing 
so suggests to me that they don't think this is the silver bullet for 
managing complexity.

What would have been a silver bullet is saying 'no' to a bunch of those 
features, preferably starting with 'restore soft deleted server'(!!) and 
shelve/unshelve(?!). When AWS got feature requests like that they didn't 
say 'we'll have to add that in a higher-level API', they said 'if your 
application needs that then cloud is not for you'. We were never 
prepared to say that.

[1] 
http://eavesdrop.openstack.org/irclogs/%23openstack-tc/%23openstack-tc.2018-06-26.log.html#t2018-06-26T15:30:33

>> We can decide that we want to be one, or the other, or both. But if we 
>> don't all decide together then a lot of us are going to continue 
>> wasting our time working at cross-purposes.
> 
> If you are saying that we should choose between being vCenter or AWS, I 
> would definitely say the latter.

Agreed.

> But I'm still not sure I see this issue 
> in such a binary manner.

I don't know that it's still a viable option to say 'AWS' now. Given our 
installed base of users and our commitment to not breaking them, our 
practical choices may well be between 'vCenter' or 'both'.

It's painful because had we chosen 'AWS' at the beginning then we could 
have avoided the complexity hit of many of those features listed above, 
and spent our complexity budget on cloud features instead. Now we are 
locked in to supporting that legacy complexity forever, and it has 
reportedly maxed out our complexity budget to the point where people are 
reluctant to implement any cloud features, and unable to refactor to 
make them easier.

Astute observers will note that this is a *textbook* case of the 
Innovator's Dilemma.

> Imagine if (as suggested above) we refactored the compute node and gave 
> it a user API, would that be one, the other, both ?

In itself, it would have no effect. But if the refactor made the code 
easier to maintain, it might increase the willingness to move from one 
to both.

> Or just a sane 
> addition to improve what OpenStack really is today: a set of open 
> infrastructure components providing different services, each with its 
> own API, with slight gaps and overlaps between them ?

If nothing else, it would make it possible for somebody (probably Jay ;) 
to write a simpler compute API without any legacy cruft. Then at least 
when the Nova API's lunch gets eaten it might be by something in 
OpenStack rather than something like kubevirt.

> Personally, I'm not very interested in discussing what OpenStack could 
> have been if we started building it today. I'm much more interested in 
> discussing what to add or change in order to make it usable for more use 
> cases while continuing to serve the needs of our existing users.

It feels strange to argue against this, because it's the exact same 
philosophy of bottom-up incremental change that I've pushed for many, 
many years.

However, I'm increasingly of the opinion that in some circumstances - 
particularly when some of your fundamental assumptions have changed, or 
you realise you had the wrong model of the problem - it's more helpful 
to step back and imagine how things would look if you were designing 
from scratch. And only _then_ look for incremental ways to get closer to 
that design. Skipping that step tends to lead to either (a) patchwork 
solutions that lack conceptual integrity, or (b) giving up and sticking 
with what you have. And often both, now that I think about it.

> And I'm 
> not convinced that's an either/or choice...

I said specifically that it's an either/or/and choice.

So it's not a binary choice but it's very much a ternary choice IMHO. 
The middle ground, where each project - or even each individual 
contributor within a project - picks an option independently and 
proceeds on the implicit assumption that everyone else chose the same 
option (although - spoiler alert - they didn't)... that's not a good 
place to be.

cheers,
Zane.


