Open Stack

Tue Nov 28 21:20:10 UTC 2017

Hey everybody!

https://etherpad.openstack.org/p/sydney-forum-multi-cloud-management

I've CC'd everyone who listed interest directly, just in case you're not 
already on the openstack-dev list. If you aren't, and you are in fact 
interested in this topic, please subscribe and make sure to watch for 
[oaktree] subject headings.

We had a great session in Sydney about the needs of managing resources 
across multiple clouds. During the session I pointed out the work that 
had been started in the Oaktree project [0][1] and offered that, if the 
people who were interested in the topic thought we'd make progress best 
by basing the work on oaktree, we should bootstrap a new core team 
and kick off some weekly meetings. This is, therefore, the kickoff email 
to get that off the ground.

All of the below is thoughts from me and a description of where we're at 
right now. It should all be considered up for debate, except for two things:

- gRPC API
- backend implementation based on shade

Those are the two defining characteristics of the project. For those 
who weren't in the room, the justifications for them are:

gRPC API
--------

There are several reasons for choosing gRPC.

* Make it clear this is not a competing REST API.

OpenStack has a REST API already. This is more like a 'federation' API 
that knows how to talk to one or more clouds (similar to the kubernetes 
federation API).

* Streaming and async built in

One of the most costly things in using the OpenStack API is polling. 
gRPC is based on HTTP/2 and thus supports streaming and other exciting 
things. This means an oaktree running in or on a cloud can do its 
polling loops over the local network and the client can just either wait 
on a streaming call until the resource is ready, or can fire an async 
call and deal with it later on a notification channel.

* Network efficiency

Protobuf over HTTP/2 is a super-streamlined binary protocol, which 
should actually be really nice for our friends in Telco land who are 
using OpenStack for Edge-related tasks in 1000s of sites. All those 
roundtrips add up at scale.

* Multi-language out of the box

gRPC allows us to directly generate consistent consumption libs for a 
bunch of languages - or people can grab the proto files and integrate 
those into their own build if they prefer.

* The cool kids are doing it

To be fair, Jay Pipes and I tried to push OpenStack to use Protobuf 
instead of JSON for service-to-service communication back in 2010 - so 
it's not ACTUALLY a new idea... but with Google pushing it and support 
from the CNCF, gRPC is actually catching on broadly. If we're writing a 
new thing, let's lean forward into it.
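
To make the streaming point above a bit more concrete, here is a minimal 
sketch of the idea in plain Python. It deliberately does not use gRPC 
itself, and stream_until_active and the fake status source are made-up 
names for illustration, not part of oaktree or shade. In Python gRPC a 
server-streaming handler is a generator that yields responses, so the 
real thing could plausibly have a similar shape:

```python
def stream_until_active(poll_status, max_polls=10):
    """Poll close to the cloud and yield only the status changes.

    poll_status is any zero-argument callable returning a status string.
    A real service would sleep between polls; that is omitted here.
    """
    last_status = None
    for _ in range(max_polls):
        status = poll_status()
        if status != last_status:
            # Only changes travel to the client, not every poll result.
            yield status
            last_status = status
        if status == "ACTIVE":
            return


# Hypothetical stand-in for a cloud that takes a while to build a server.
fake_statuses = iter(["BUILD", "BUILD", "BUILD", "ACTIVE"])

# The client just iterates; four polls happened, two messages were sent.
updates = list(stream_until_active(lambda: next(fake_statuses)))
```

The polling loop still exists, but it runs next to the cloud, and the 
remote client only sees the state transitions.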

Backend implementation in shade
-------------------------------

If the service is defined by gRPC protos, why not implement the service 
itself in Go or C++?

* Business logic to deal with cloud differences

Adding a federation API isn't going to magically make all of those 
clouds work the same. We've got that fairly well sorted out in shade and 
would need to reimplement basically all of shade in another language.

* shade is battle tested at scale

shade is what Infra's nodepool uses. In terms of high-scale API 
consumption, we've learned a TON of lessons. Much of the design inside 
of shade is the result of real-world scaling issues. It's Open Source, 
so we could obviously copy all of that elsewhere - but why? It exists 
and it works, and oaktree itself should be a scale-out shared-nothing 
kind of service anyway.

The hard bits here aren't making API calls to 3 different clouds, the 
hard bits are doing that against 3 *different* clouds and presenting the 
results sanely and consistently to the original user.

Proposed Structure
==================

PTL
---

As the originator of the project, I'll take on the initial PTL role. 
When the next PTL elections roll around, we should do a real election.

Initial Core Team
-----------------

oaktree is still small enough that I don't think we need to be super 
protective - so I think if you're interested in working on it and you 
think you'll have the bandwidth to pay attention, let me know and I'll 
add you to the team.

General rules of thumb I try to follow on top of normal OpenStack 
reviewing guidelines:

* Review should mostly be about suitability of design/approach. Style 
issues should be handled by pep8/hacking (with one exception, see 
below). Functional issues should be handled with tests. Let the machines 
be machines and humans be humans.

* Use followup patches to fix minor things rather than causing an 
existing patch to get re-spun and need to be re-reviewed.

The one style exception ... I'm a big believer in not using visual 
indentation - but I can't seem to get pep8 or hacking to complain about 
its use. This isn't just about style - visual indentation causes more 
lines to be touched during a refactor than are necessary, making the 
impact of a change harder to see.

good:

   x = some_function(
       with_some, arguments)

bad:

   x = some_function(with_some,
                     arguments)

If anyone can figure out how to write a hacking rule that enforces that 
I'll buy you a herd of chickens.
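
As a possible starting point, here is a rough sketch of the detection 
logic using only the stdlib tokenize module. It is not wired into hacking 
or pycodestyle, find_visual_indentation is a made-up name, and the 
definition it uses is an assumption: an opening bracket counts as visual 
indentation if its contents wrap onto another line while the first 
argument sits on the same line as the bracket.

```python
import io
import tokenize


def find_visual_indentation(source):
    """Return sorted (row, col) positions of brackets using visual indentation."""
    tokens = [
        tok for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type != tokenize.COMMENT
    ]
    flagged = []
    stack = []  # one [position, is_hanging, has_line_break] entry per open bracket
    for index, tok in enumerate(tokens):
        if tok.type == tokenize.NL and stack:
            # A line break directly inside the innermost open bracket.
            stack[-1][2] = True
        elif tok.type == tokenize.OP and tok.string in "([{":
            # Hanging indent: nothing but a newline follows the bracket.
            is_hanging = tokens[index + 1].type == tokenize.NL
            stack.append([tok.start, is_hanging, False])
        elif tok.type == tokenize.OP and tok.string in ")]}" and stack:
            position, is_hanging, has_line_break = stack.pop()
            if has_line_break and not is_hanging:
                flagged.append(position)
    return sorted(flagged)


good = "x = some_function(\n    with_some, arguments)\n"
bad = "x = some_function(with_some,\n                  arguments)\n"
good_result = find_visual_indentation(good)  # nothing flagged
bad_result = find_visual_indentation(bad)    # the "(" on row 1 is flagged
```

Turning this into an actual hacking check would still need the plugin 
registration and an error code, which are not shown here.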

Weekly Meeting
--------------

Let's give it a week or so to see who is interested in being in the 
initial core team so that we can figure out what timezones folks are in 
and pick a time that works for the maximum number of people.

IRC Channel
-----------

oaktree development is closely related to shade development which is now 
in #openstack-sdks, so let's stay there until we get kicked out.

Bugs/Blueprints
---------------

oaktree uses storyboard [2][3]

   https://storyboard.openstack.org/#!/project/855

oaktree tech overview
=====================

oaktree is a service that presents a gRPC API and that uses shade as its 
OpenStack connectivity layer. It's organized into two repos at the 
moment, oaktreemodel and oaktree. The intent is that anyone should be 
able to use the gRPC/protobuf definitions to create a client 
implementation. It is explicitly not the intent that there be more than 
one server implementation, since that would require reimplementing all 
of the hairy business logic that's in shade already.

oaktreemodel contains the protobuf definitions, as well as the generated 
golang code. It is intended to provide a library that is easily pip 
installable by anyone who wants to build a client without them needing 
to add protoc steps. It contains build tooling to produce python, golang 
and C++. The C++ and python files are generated and included in the 
source sdist artifacts. Since golang uses git for consumption, the 
generated golang files are committed to the oaktreemodel repo.

oaktreemodel has a more complex build toolchain, but is organized so 
that only oaktreemodel devs need to deal with it. People consuming 
oaktreemodel should not need to know anything about how it's built - pip 
install oaktreemodel or go get 
git.openstack.org/openstack/oaktreemodel should Just Work with 
no additional effort on the part of the programmer.

oaktree contains the server implementation and depends on oaktreemodel. 
It's mostly a thin shim layer mapping gRPC stubs to shade calls. Much of 
the logic that needs to exist for oaktree to work wants to live in 
shade, but I'm sure we'll find places where that's not true.

Ultra Short-Term TODO/in-progress
=================================

Fix Gate jobs for Zuul v3 (mordred)
-----------------------------------

We have devstack functional gate jobs - but they haven't been updated 
since the Zuul v3 migration. Duong Ha-Quang submitted a patch [4] to 
migrate the legacy jobs to in-tree. We need to get that fixed up, then 
migrate the job to use the new fancy devstack base job.

I'm working on this one, and I'll get it all fixed ASAP so that it's 
easy for folks to start hacking on patches.

A patch for oaktreemodel is in flight [5]. We still need a patch to 
oaktree to follow up, which I have half-finished. I'll get it up once 
the oaktreemodel patch is green.

Short-Term TODOs
================

Expose more things
------------------

shade has *way* more capabilities than oaktree, which is mostly a matter 
of writing some proto definitions for resources that match the 'strict' 
version of shade's data model. In some cases it might mean that we need 
to define a data model contract in shade too... but by and large picking 
things and adding them is a great way to get familiar with all the 
pieces and how things flow together.

We should also consider whether or not we can do any meta-programming to 
map shade calls into oaktree calls automatically. For now I think we 
should be fine with just having copy-pasta boilerplate until we 
understand enough about the patterns to abstract them - but we SHOULD be 
able to do some work to reduce the boilerplate.
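
As one possible direction for that, here is a small sketch of 
table-driven handler generation. Everything in it is hypothetical: 
FakeCloud stands in for a shade cloud object, plain dicts stand in for 
protobuf messages, and make_list_handler and EXPOSED_CALLS are invented 
names. Only the method names list_flavors and list_images mirror calls 
that exist in shade.

```python
class FakeCloud:
    """Hypothetical stand-in for a shade cloud object."""

    def list_flavors(self):
        return [{"id": "1", "name": "m1.small", "ram": 2048}]

    def list_images(self):
        return [{"id": "abc", "name": "ubuntu", "status": "active"}]


def make_list_handler(shade_method, fields):
    """Build a handler that calls cloud.<shade_method>() and keeps `fields`."""
    def handler(cloud):
        return [
            {field: resource[field] for field in fields}
            for resource in getattr(cloud, shade_method)()
        ]
    handler.__name__ = shade_method
    return handler


# One table row per exposed call replaces one hand-written handler.
EXPOSED_CALLS = {
    "list_flavors": ("id", "name"),
    "list_images": ("id", "name", "status"),
}
HANDLERS = {
    name: make_list_handler(name, fields)
    for name, fields in EXPOSED_CALLS.items()
}

flavors = HANDLERS["list_flavors"](FakeCloud())
```

Whether a table like this is enough, or whether the patterns turn out to 
be messier than that, is exactly the thing we'd learn from writing the 
boilerplate by hand first.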

Write better tests
------------------

There are gate jobs at the moment and a tiny smoke-test script. We 
should add some functional tests for python, go and C++ in the 
oaktreemodel repo.

I'm not sure a TON of unittests in oaktreemodel will be super useful - 
however, some simple tests that verify we haven't borked something in 
the protos that causes code to be generated improperly would be great. 
We can do those just by making sure we can create the proto objects and 
whatnot without needing an actual server running.

Unittests in oaktree itself are likely to have very little value. We can 
always add more requests-mock unittests to shade/python-openstacksdk. I 
think we should focus more on functional tests and on making sure those 
tests can run against not just devstack.

Shift calling interface from shade to python-openstacksdk
---------------------------------------------------------

oaktree doesn't need historical compat, so we can go ahead and start 
using python-openstacksdk. Our tests will be cross-testing with master 
branch commits rather than releases right now anyway.

Add Java and Ruby build plumbing to oaktreemodel
------------------------------------------------

Protobuf/gRPC has support for Java and Ruby too, so we should plumb 
them through as well.

Parallel Multicloud APIs
------------------------

The existing APIs allow for multi-cloud consumption from the same 
connection via a Location object used as a parameter to calls. 
Additionally, shade adds a Location property to every object returned, 
so all shade objects carry the information needed to verify uniqueness.

However, when considering actions like:

   "I want a list of all of my servers on all of my clouds"

the answer is currently an end-user for-loop. We should add calls to 
shade for each of the list/search/get API calls that fetch from all of 
the available cloud regions in parallel and then combine the results 
into a single result list.
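
A minimal sketch of what such a call could look like, using a thread 
pool from the stdlib. The names here (FAKE_CLOUDS, list_servers_in, 
list_servers_everywhere) and the shape of the location dict are 
assumptions for illustration, not existing shade APIs, and the 
per-location fetch is faked so the example is self-contained:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in data: (cloud, region) -> server names.
FAKE_CLOUDS = {
    ("cloud-a", "region-1"): ["web-1", "web-2"],
    ("cloud-b", "region-1"): ["db-1"],
}


def list_servers_in(location):
    """Pretend to list servers in one cloud region, tagging each with it."""
    cloud, region = location
    return [
        {"name": name, "location": {"cloud": cloud, "region_name": region}}
        for name in FAKE_CLOUDS[location]
    ]


def list_servers_everywhere(locations, fetch=list_servers_in):
    """Fetch from every location in parallel and flatten into one list."""
    with ThreadPoolExecutor(max_workers=max(1, len(locations))) as pool:
        # map() runs the fetches concurrently but keeps the input order.
        per_location = pool.map(fetch, locations)
        return [server for servers in per_location for server in servers]


all_servers = list_servers_everywhere(sorted(FAKE_CLOUDS))
names = [server["name"] for server in all_servers]
```

Because every result carries its location, two servers with the same 
name on different clouds stay distinguishable in the combined list.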

We should also think about a design for multi-cloud creates and which 
calls they make sense for. Things like image and flavor immediately come 
to mind, as having consistent image and flavors across cloud regions is 
important.

Both of those are desired features at the shade layer, so designing and 
implementing them will work great there ... but working on adding them 
to shade and exposing them in oaktree at the same time will help inform 
what shape of API at the shade layer serves the oaktree layer the best.

Add REST escape hatch
---------------------

There are PLENTY of things that will never get added to oaktree 
directly - especially things that are deployment/vendor-backend specific. 
One of the things discussed in Sydney was adding an API call to oaktree 
that would return a Protobuf that contains the root URL for a given 
service along with either a token, a list of HTTP Headers to be used for 
auth or both. So something like:

   conn = oaktreemodel.Connect()
   rest_info = conn.get_rest_info(
     location=Location(cloud='example.com', service_type='compute'))
   servers = requests.get(
     rest_info.url + '/servers',
     headers=rest_info.headers).json()

or, maybe that's the gRPC call and there is a call in each language's 
client lib that returns a properly constructed rest client...

   conn = oaktreemodel.Connect()
   compute = conn.get_adapter(
     location=Location(cloud='example.com', service_type='compute'))
   servers = compute.get('/servers').json()

*waves hands* - needs to be thought about, designed and implemented.

Medium Term TODOs
=================

Authentication
--------------

oaktree is currently not authenticated. It works great on a laptop or in 
a location that's locked down through some other means, which should be 
fine for the first steps of the telco/edge use case, as well as for the 
developer use case getting started with it - but it's obviously not 
suitable for a multi-user service. The thinking thus far has been to NOT 
use keystone for auth, since that introduces the need for having a gRPC 
auth plugin for clients, as well as doing some sort of REST/gRPC dance.

BUT - whether that's the right choice, and what the right choice 
actually is, is an open question on purpose - getting input from the operators on 
what mechanism works best is important. Maybe making a keystone gRPC 
auth driver and using keystone is the right choice. Maybe it isn't. 
Let's talk about it.

Authorization
-------------

Since it's currently only a single-user service, it operates off of a 
pre-existing local clouds.yaml to define which clouds it has access to. 
Long-term one can imagine that one would want to authorize an oaktree to 
talk to a particular cloud-region in some manner. This needs to be designed.

Multi-user Caching
------------------

oaktree currently uses the caching support in shade for its caching. 
Although it is based on dogpile.cache, which means it has support for 
shared backends like redis or memcached, it hasn't really been vetted 
for multiple users sharing a single cache. It'll be fine for the next 6-9 
months, but once we go multi-user I'd be concerned about it - so we 
should consider the caching layer design.

shade oaktreemodel backend
--------------------------

In an ultimate fit of snake eating its own tail, we should add support 
to shade for making client connections to an oaktree if one exists. This 
should obviously be pretty direct passthrough. That would mean that an 
oaktree talking to another oaktree would be able to do so via the gRPC 
layer without any intermediate protobuf-to-dict translation steps.

That leads us to potentially just using the oaktreemodel protobuf 
objects as the basis for the in-memory resource objects inside of 
sdk/shade - but that's inception-y enough that we should just skip it 
for now. If protobuf->json translations are what's killing us, that's a 
great problem to have.

Timetable
=========

I think we should aim for having something that's usable/discussable for 
the single/trusted-user use case for real work (install an oaktree 
yourself pointed at a clouds.yaml file and talk to it locally without 
auth) by the Dublin PTG. It doesn't have to do everything, but we should 
at least have a sense of whether this will solve the needs of the people 
who were interested in this topic so that we'll know whether figuring 
out the auth story is worthwhile or if this is all a terrible idea.

I think it's TOTALLY reasonable that by Vancouver we should have a thing 
that's legit usable for folks who have the pain point today (given the 
auth constraint).

If that works out, discuss auth in Vancouver and aim to have it figured 
out and implemented by Berlin so that we can actually start pushing 
clouds to include oaktree in their deployments.

Conclusion
==========

Ok. That's the braindump from me. Let me know if you wanna dive in, 
we'll get a core team fleshed out and an IRC meeting set up and folks 
can start cranking.

Thanks!
Monty

[0] http://git.openstack.org/cgit/openstack/oaktree
[1] http://git.openstack.org/cgit/openstack/oaktreemodel
[2] https://storyboard.openstack.org/#!/project/855
[3] https://storyboard.openstack.org/#!/project/856
[4] https://review.openstack.org/#/c/512561/
[5] https://review.openstack.org/#/c/492531/


[openstack-dev] [oaktree] Follow up to Multi-cloud Management in OpenStack Summit session
