[openstack-dev] [oaktree] Follow up to Multi-cloud Management in OpenStack Summit session
Monty Taylor
mordred at inaugust.com
Tue Nov 28 21:20:10 UTC 2017
Hey everybody!
https://etherpad.openstack.org/p/sydney-forum-multi-cloud-management
I've CC'd everyone who listed interest directly, just in case you're not
already on the openstack-dev list. If you aren't, and you are in fact
interested in this topic, please subscribe and make sure to watch for
[oaktree] subject headings.
We had a great session in Sydney about the needs of managing resources
across multiple clouds. During the session I pointed out the work that
had been started in the Oaktree project [0][1] and offered that, if the
people who were interested in the topic thought we'd make the most progress
by basing the work on oaktree, we should bootstrap a new core team
and kick off some weekly meetings. This is, therefore, the kickoff email
to get that off the ground.
Everything below is my own thinking plus a description of where we're at
right now. It should all be considered up for debate, except for two things:
- gRPC API
- backend implementation based on shade
Those are the two defining characteristics of the project. For those
who weren't in the room, the justifications for them are:
gRPC API
--------
There are several reasons for choosing gRPC.
* Make it clear this is not a competing REST API.
OpenStack has a REST API already. This is more like a 'federation' API
that knows how to talk to one or more clouds (similar to the kubernetes
federation API).
* Streaming and async built in
One of the most costly things in using the OpenStack API is polling.
gRPC is based on HTTP/2 and thus supports streaming and other exciting
things. This means an oaktree running in or on a cloud can do its
polling loops over the local network and the client can either just wait
on a streaming call until the resource is ready, or can fire an async
call and deal with it later on a notification channel.
* Network efficiency
Protobuf over HTTP/2 is a super-streamlined binary protocol, which
should actually be really nice for our friends in Telco land who are
using OpenStack for Edge-related tasks in 1000s of sites. All those
roundtrips add up at scale.
* Multi-language out of the box
gRPC allows us to directly generate consistent consumption libs for a
bunch of languages - or people can grab the proto files and integrate
those into their own build if they prefer.
* The cool kids are doing it
To be fair, Jay Pipes and I tried to push OpenStack to use Protobuf
instead of JSON for service-to-service communication back in 2010 - so
it's not ACTUALLY a new idea... but with Google pushing it and support
from the CNCF, gRPC is actually catching on broadly. If we're writing a
new thing, let's lean forward into it.
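To make the polling-vs-streaming point a bit more concrete, here's a minimal sketch, with no gRPC library involved, of the kind of loop an oaktree could run next to the cloud. In grpcio a server-streaming handler is written as a generator that yields response messages, so this generator has roughly that shape. `wait_for_server` and the `get_status` callable are hypothetical names for illustration, not oaktree APIs, and treating ACTIVE/ERROR as the final states is an assumption modelled on Nova server statuses.

```python
import time
from typing import Callable, Iterator


def wait_for_server(
        get_status: Callable[[], str],
        interval: float = 1.0,
        timeout: float = 600.0) -> Iterator[str]:
    """Poll a server's status locally and stream each change to the caller.

    Hypothetical sketch: get_status stands in for a shade call that fetches
    the server and reads its status.
    """
    deadline = time.monotonic() + timeout
    last = None
    while True:
        status = get_status()
        if status != last:
            # Only send a message to the client when something changed.
            yield status
            last = status
        if status in ('ACTIVE', 'ERROR'):
            return
        if time.monotonic() > deadline:
            raise TimeoutError('server did not reach a final state in time')
        time.sleep(interval)


# Usage: a fake status source that goes BUILD -> BUILD -> ACTIVE.
statuses = iter(['BUILD', 'BUILD', 'ACTIVE'])
updates = list(wait_for_server(lambda: next(statuses), interval=0))
print(updates)  # ['BUILD', 'ACTIVE']
```

The polling still happens three times, but it happens over the local network; the remote client only receives the two state changes.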
Backend implementation in shade
-------------------------------
If the service is defined by gRPC protos, why not implement the service
itself in Go or C++?
* Business logic to deal with cloud differences
Adding a federation API isn't going to magically make all of those
clouds work the same. We've got that fairly well sorted out in shade and
would need to reimplement basically all of shade in another language.
* shade is battle tested at scale
shade is what Infra's nodepool uses. In terms of high-scale API
consumption, we've learned a TON of lessons. Much of the design inside
of shade is the result of real-world scaling issues. It's Open Source,
so we could obviously copy all of that elsewhere - but why? It exists
and it works, and oaktree itself should be a scale-out shared-nothing
kind of service anyway.
The hard bits here aren't making API calls to 3 different clouds, the
hard bits are doing that against 3 *different* clouds and presenting the
results sanely and consistently to the original user.
Proposed Structure
==================
PTL
---
As the originator of the project, I'll take on the initial PTL role.
When the next PTL elections roll around, we should do a real election.
Initial Core Team
-----------------
oaktree is still small enough that I don't think we need to be super
protective - so I think if you're interested in working on it and you
think you'll have the bandwidth to pay attention, let me know and I'll
add you to the team.
General rules of thumb I try to follow on top of normal OpenStack
reviewing guidelines:
* Review should mostly be about suitability of design/approach. Style
issues should be handled by pep8/hacking (with one exception, see
below). Functional issues should be handled with tests. Let the machines
be machines and humans be humans.
* Use followup patches to fix minor things rather than causing an
existing patch to get re-spun and need to be re-reviewed.
The one style exception ... I'm a big believer in not using visual
indentation - but I can't seem to get pep8 or hacking to complain about
its use. This isn't just about style - visual indentation causes more
lines to be touched during a refactor than necessary, making the
impact of a change harder to see.
good:
    x = some_function(
        with_some, arguments)
bad:
    x = some_function(with_some,
                      arguments)
If anyone can figure out how to write a hacking rule that enforces that
I'll buy you a herd of chickens.
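As a possible starting point (not a finished hacking rule), here's a minimal standalone sketch built on Python's `tokenize` module. It flags any line where a bracket is opened, something other than a comment follows it on that same line, and the bracket is still open when the line ends, which is the pattern in the "bad" example. It is not wired into hacking's local-check plugin interface, `find_visual_indents` is a hypothetical name, and it hasn't been tested against every construct (backslash continuations, for example).

```python
import io
import tokenize

OPENERS = {'(', '[', '{'}
CLOSERS = {')', ']', '}'}


def find_visual_indents(source):
    """Return 1-based line numbers that use visual indentation.

    A line is flagged when its innermost open bracket was opened on that
    same line, is followed by at least one token, and is still open when
    the line ends.
    """
    # Each entry: [line the bracket was opened on, saw a token after it]
    stack = []
    hits = []
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.OP and tok.string in OPENERS:
            if stack:
                stack[-1][1] = True
            stack.append([tok.start[0], False])
        elif tok.type == tokenize.OP and tok.string in CLOSERS:
            if stack:
                stack.pop()
            if stack:
                stack[-1][1] = True
        elif tok.type == tokenize.NL:
            # NL (as opposed to NEWLINE) is a line break inside brackets.
            if stack and stack[-1][0] == tok.start[0] and stack[-1][1]:
                hits.append(tok.start[0])
        elif tok.type == tokenize.COMMENT:
            # A trailing comment after "(" is not an argument.
            continue
        elif stack:
            stack[-1][1] = True
    return hits


good = 'x = some_function(\n    with_some, arguments)\n'
bad = 'x = some_function(with_some,\n                  arguments)\n'
print(find_visual_indents(good))  # []
print(find_visual_indents(bad))  # [1]
```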
Weekly Meeting
--------------
Let's give it a week or so to see who is interested in being in the
initial core team so that we can figure out what timezones folks are in
and pick a time that works for the maximum number of people.
IRC Channel
-----------
oaktree development is closely related to shade development which is now
in #openstack-sdks, so let's stay there until we get kicked out.
Bugs/Blueprints
---------------
oaktree uses storyboard [2][3]
https://storyboard.openstack.org/#!/project/855
oaktree tech overview
=====================
oaktree is a service that presents a gRPC API and that uses shade as its
OpenStack connectivity layer. It's organized into two repos at the
moment, oaktreemodel and oaktree. The intent is that anyone should be
able to use the gRPC/protobuf definitions to create a client
implementation. It is explicitly not the intent that there be more than
one server implementation, since that would require reimplementing all
of the hairy business logic that's in shade already.
oaktreemodel contains the protobuf definitions, as well as the generated
golang code. It is intended to provide a library that is easily pip
installable by anyone who wants to build a client without them needing
to add protoc steps. It contains build tooling to produce python, golang
and C++. The C++ and python files are generated and included in the
source sdist artifacts. Since golang uses git for consumption, the
generated golang files are committed to the oaktreemodel repo.
oaktreemodel has a more complex build toolchain, but is organized so
that only oaktreemodel devs need to deal with it. People consuming
oaktreemodel should not need to know anything about how it's built - pip
install oaktreemodel or go get
https://git.openstack.org/openstack/oaktreemodel should Just Work with
no additional effort on the part of the programmer.
oaktree contains the server implementation and depends on oaktreemodel.
It's mostly a thin shim layer mapping gRPC stubs to shade calls. Much of
the logic that needs to exist for oaktree to work wants to live in
shade, but I'm sure we'll find places where that's not true.
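To illustrate what "a thin shim layer mapping gRPC stubs to shade calls" means, here's a minimal sketch with no gRPC or shade dependency. `Server` is a dataclass standing in for a generated protobuf message, `FakeCloud` stands in for a shade cloud object exposing `list_servers()`, and returning a plain list is a simplification: a real handler would fill a repeated field on a response message or yield from a stream. None of these class names come from the oaktree repo.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Server:
    """Stand-in for a generated protobuf message (hypothetical fields)."""
    id: str
    name: str
    status: str


class FakeCloud:
    """Stand-in for a shade cloud object; shade returns dict-like objects."""

    def list_servers(self) -> List[Dict]:
        return [
            {'id': 'abc123', 'name': 'web1', 'status': 'ACTIVE', 'extra': 1}]


class ComputeServicer:
    """The shim: each RPC method calls shade and converts the result."""

    def __init__(self, cloud):
        self.cloud = cloud

    def ListServers(self, request, context):
        # All the cloud-specific business logic lives behind list_servers();
        # the servicer only translates dicts into messages.
        return [
            Server(id=s['id'], name=s['name'], status=s['status'])
            for s in self.cloud.list_servers()]


servicer = ComputeServicer(FakeCloud())
print(servicer.ListServers(request=None, context=None))
# [Server(id='abc123', name='web1', status='ACTIVE')]
```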
Ultra Short-Term TODO/in-progress
=================================
Fix Gate jobs for Zuul v3 (mordred)
-----------------------------------
We have devstack functional gate jobs - but they haven't been updated
since the Zuul v3 migration. Duong Ha-Quang submitted a patch [4] to
move the legacy jobs in-tree. We need to get that fixed up, then
migrate the job to use the new fancy devstack base job.
I'll get this all fixed ASAP so that it's easy for folks to start
hacking on patches.
I'm working on this one.
A patch for oaktreemodel is in flight [5]. We still need a patch to
oaktree to follow up, which I have half-finished. I'll get it up once
the oaktreemodel patch is green.
Short-Term TODOs
================
Expose more things
------------------
shade has *way* more capabilities than oaktree, which is mostly a matter
of writing some proto definitions for resources that match the 'strict'
version of shade's data model. In some cases it might mean that we need
to define a data model contract in shade too... but by and large picking
things and adding them is a great way to get familiar with all the
pieces and how things flow together.
We should also consider whether or not we can do any meta-programming to
map shade calls into oaktree calls automatically. For now I think we
should be fine with just having copy-pasta boilerplate until we
understand enough about the patterns to abstract them - but we SHOULD be
able to do some work to reduce the boilerplate.
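One possible shape for that boilerplate reduction, sketched with dataclasses standing in for generated protobuf messages and a fake object standing in for shade: a table maps each RPC name to a shade method, each message type gets a generic dict-to-message converter derived from its fields, and the servicer methods are generated from the table. `make_list_rpc`, `from_dict` and `RPC_TABLE` are hypothetical names, not anything that exists in oaktree today.

```python
from dataclasses import dataclass, fields


@dataclass
class Image:
    """Stand-in for a generated protobuf message."""
    id: str
    name: str


@dataclass
class Flavor:
    """Stand-in for a generated protobuf message."""
    id: str
    name: str
    ram: int


class FakeCloud:
    """Stand-in for a shade cloud object."""

    def list_images(self):
        return [{'id': 'i1', 'name': 'ubuntu', 'size': 100}]

    def list_flavors(self):
        return [{'id': 'f1', 'name': 'm1.small', 'ram': 2048}]


def from_dict(cls):
    """Build a converter that copies only the message's own fields."""
    names = [f.name for f in fields(cls)]
    return lambda d: cls(**{n: d[n] for n in names})


def make_list_rpc(shade_method, to_message):
    """Build a servicer method that calls shade_method and converts results."""
    def rpc(self, request, context):
        return [to_message(obj) for obj in getattr(self.cloud, shade_method)()]
    return rpc


# RPC name -> (shade method, dict-to-message converter)
RPC_TABLE = {
    'ListImages': ('list_images', from_dict(Image)),
    'ListFlavors': ('list_flavors', from_dict(Flavor)),
}


class Servicer:
    def __init__(self, cloud):
        self.cloud = cloud


for rpc_name, (method, converter) in RPC_TABLE.items():
    setattr(Servicer, rpc_name, make_list_rpc(method, converter))

svc = Servicer(FakeCloud())
print(svc.ListFlavors(None, None))
# [Flavor(id='f1', name='m1.small', ram=2048)]
```

Building each method through a factory function, rather than defining a closure directly in the loop, avoids Python's late-binding pitfall where every generated method would end up calling the last shade method in the table.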
Write better tests
------------------
There are gate jobs at the moment and a tiny smoke-test script. We
should add some functional tests for python, go and C++ in the
oaktreemodel repo.
I'm not sure a TON of unittests in oaktreemodel will be super useful -
however, some simple tests that verify we haven't borked something in
the protos that causes code to be generated improperly would be great. We
can do those just by making sure we can create the proto objects and
whatnot without needing an actual server running.
Unittests in oaktree itself are likely to have very little value. We can
always add more requests-mock unittests to shade/python-openstacksdk. I
think we should focus more on functional tests and on making sure those
tests can run against more than just devstack.
Shift calling interface from shade to python-openstacksdk
---------------------------------------------------------
oaktree doesn't need historical compat, so we can go ahead and start
using python-openstacksdk. Our tests will be cross-testing with master
branch commits rather than releases right now anyway.
Add Java and Ruby build plumbing to oaktreemodel
------------------------------------------------
Protobuf/gRPC supports Java and Ruby too, so we should plumb them
through as well.
Parallel Multicloud APIs
------------------------
The existing APIs allow for multi-cloud consumption from the same
connection via a Location object used as a parameter to calls.
Additionally, shade adds a Location property to every object returned,
so all shade objects carry the information needed to verify uniqueness.
However, when considering actions like:
"I want a list of all of my servers on all of my clouds"
the answer is currently an end-user for-loop. We should add calls to
shade for each of the list/search/get API calls that fetch from all of
the available cloud regions in parallel and then combine the results
into a single result list.
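A minimal sketch of such a parallel list call, assuming each cloud-region is reachable through an object with shade's `list_servers()` method (faked here so the example runs standalone). `list_all_servers` is a hypothetical name, and the `location` dict is a simplified stand-in for shade's Location, which real shade objects already carry.

```python
from concurrent.futures import ThreadPoolExecutor


def list_all_servers(connections):
    """Fetch servers from every cloud-region in parallel and merge them.

    connections maps (cloud, region_name) -> object with list_servers().
    """
    def fetch(item):
        (cloud, region), conn = item
        # Tag each result with where it came from so ids stay unambiguous.
        return [
            dict(server, location={'cloud': cloud, 'region_name': region})
            for server in conn.list_servers()]

    with ThreadPoolExecutor(max_workers=max(len(connections), 1)) as pool:
        # map() returns results in input order, so the merge is deterministic.
        batches = list(pool.map(fetch, connections.items()))
    return [server for batch in batches for server in batch]


class FakeConnection:
    def __init__(self, servers):
        self._servers = servers

    def list_servers(self):
        return self._servers


connections = {
    ('example.com', 'RegionOne'): FakeConnection([{'id': 'a'}]),
    ('example.net', 'dfw'): FakeConnection([{'id': 'b'}, {'id': 'c'}]),
}
for server in list_all_servers(connections):
    print(server['id'], server['location']['cloud'])
# a example.com
# b example.net
# c example.net
```

Threads are a reasonable fit because the work is I/O-bound HTTP calls. In this sketch a failure in one region propagates out of `map()` and fails the whole call; a real implementation would probably want to report per-region errors alongside the results that did come back.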
We should also think about a design for multi-cloud creates and which
calls they make sense for. Things like image and flavor immediately come
to mind, as having consistent image and flavors across cloud regions is
important.
Both of those are desired features at the shade layer, so designing and
implementing them will work great there ... but working on adding them
to shade and exposing them in oaktree at the same time will help inform
what shape of API at the shade layer serves the oaktree layer the best.
Add REST escape hatch
---------------------
There are PLENTY of things that will never get added to oaktree
directly, especially things that are deployment/vendor-backend specific.
One of the things discussed in Sydney was adding an API call to oaktree
that would return a Protobuf that contains the root URL for a given
service along with either a token, a list of HTTP Headers to be used for
auth or both. So something like:
    conn = oaktreemodel.Connect()
    rest_info = conn.get_rest_info(
        location=Location(cloud='example.com', service_type='compute'))
    servers = requests.get(
        rest_info.url + '/servers',
        headers=rest_info.headers).json()
or, maybe that's the gRPC call and there is a call in each language's
client lib that returns a properly constructed rest client...
    conn = oaktreemodel.Connect()
    compute = conn.get_adapter(
        location=Location(cloud='example.com', service_type='compute'))
    servers = compute.get('/servers').json()
*waves hands* - needs to be thought about, designed and implemented.
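Purely to give the hand-waving something to poke at, here's a sketch of the client-side half of the second option: a helper that turns the proposed rest_info payload (a root URL plus a token and/or headers) into ready-to-send HTTP requests. `RestInfo` and `RestAdapter` are hypothetical; the only assumption about OpenStack itself is that a Keystone token goes in the `X-Auth-Token` header. It uses only the standard library and builds the requests without sending them.

```python
from dataclasses import dataclass, field
from typing import Dict
from urllib.request import Request


@dataclass
class RestInfo:
    """Hypothetical shape of the proposed rest-info response."""
    url: str
    token: str = ''
    headers: Dict[str, str] = field(default_factory=dict)


class RestAdapter:
    """Builds requests against one service endpoint with auth applied."""

    def __init__(self, info: RestInfo):
        self.base = info.url.rstrip('/')
        self.headers = dict(info.headers)
        if info.token:
            # Keystone tokens are passed in the X-Auth-Token header.
            self.headers.setdefault('X-Auth-Token', info.token)

    def request(self, method: str, path: str) -> Request:
        return Request(
            self.base + '/' + path.lstrip('/'),
            headers=self.headers, method=method)


info = RestInfo(url='https://compute.example.com/v2.1/', token='secret')
req = RestAdapter(info).request('GET', '/servers')
print(req.get_method(), req.full_url)
# GET https://compute.example.com/v2.1/servers
# Sending it would be urllib.request.urlopen(req).
```

Whether a helper like this lives in each language's client lib, or whether the gRPC response alone is enough, is exactly the part that still needs design.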
Medium Term TODOs
=================
Authentication
--------------
oaktree is currently not authenticated. It works great on a laptop or in
a location that's locked down through some other means, which should be
fine for the first steps of the telco/edge use case, as well as for the
developer use case getting started with it - but it's obviously not
suitable for a multi-user service. The thinking thus far has been to NOT
use keystone for auth, since that introduces the need for having a gRPC
auth plugin for clients, as well as doing some sort of REST/gRPC dance.
BUT - whether that's the right choice and what the right choice actually
is is an open question on purpose - getting input from the operators on
what mechanism works best is important. Maybe making a keystone gRPC
auth driver and using keystone is the right choice. Maybe it isn't.
Let's talk about it.
Authorization
-------------
Since it's currently only a single-user service, it operates off of a
pre-existing local clouds.yaml to define which clouds it has access to.
Long-term one can imagine that one would want to authorize an oaktree to
talk to a particular cloud-region in some manner. This needs to be designed.
Multi-user Caching
------------------
oaktree currently uses the caching support in shade for its caching.
Although it is based on dogpile.cache, which means it supports shared
backends like redis or memcached, it hasn't really been vetted for
multiple users sharing a single cache. It'll be fine for the next 6-9
months, but once we go multi-user I'd be concerned about it - so we
should consider the caching layer design.
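One concrete way the multi-user concern could bite: if cache keys were derived only from the function and its arguments, two users making the same call against the same cloud would hit the same entry and could see each other's results. Here's a minimal sketch of scoping keys by caller identity, with a plain dict standing in for a shared backend such as memcached or redis; `per_user_cache` and the `user`/`cloud` parameters are hypothetical, and this is not how shade's caching is wired today.

```python
import functools


def per_user_cache(func):
    """Memoize func, keying on the caller's identity as well as arguments."""
    store = {}  # stand-in for a shared cache backend

    @functools.wraps(func)
    def wrapper(user, cloud, *args):
        key = (user, cloud, func.__name__) + args
        if key not in store:
            store[key] = func(user, cloud, *args)
        return store[key]

    return wrapper


calls = []


@per_user_cache
def list_servers(user, cloud):
    calls.append(user)
    return ['server-of-' + user]


print(list_servers('alice', 'example.com'))  # ['server-of-alice']
print(list_servers('bob', 'example.com'))  # ['server-of-bob']
print(list_servers('alice', 'example.com'))  # ['server-of-alice'], from cache
print(len(calls))  # 2
```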
shade oaktreemodel backend
--------------------------
In an ultimate fit of the snake eating its own tail, we should add support
to shade for making client connections to an oaktree if one exists. This
should obviously be pretty direct passthrough. That would mean that an
oaktree talking to another oaktree would be able to do so via the gRPC
layer without any intermediate protobuf-to-dict translation steps.
That leads us to potentially just using the oaktreemodel protobuf
objects as the basis for the in-memory resource objects inside of
sdk/shade - but that's inception-y enough that we should just skip it
for now. If protobuf->json translations are what's killing us, that's a
great problem to have.
Timetable
=========
I think we should aim for having something that's usable/discussable for
the single/trusted-user use case for real work (install an oaktree
yourself pointed at a clouds.yaml file and talk to it locally without
auth) by the Dublin PTG. It doesn't have to do everything, but we should
at least have a sense of whether this will solve the needs of the people
who were interested in this topic so that we'll know whether figuring
out the auth story is worthwhile or if this is all a terrible idea.
I think it's TOTALLY reasonable that by Vancouver we should have a thing
that's legit usable for folks who have the pain point today (given the
auth constraint).
If that works out, discuss auth in Vancouver and aim to have it figured
out and implemented by Berlin so that we can actually start pushing
clouds to include oaktree in their deployments.
Conclusion
==========
Ok. That's the braindump from me. Let me know if you wanna dive in,
we'll get a core team fleshed out and an IRC meeting set up and folks
can start cranking.
Thanks!
Monty
[0] http://git.openstack.org/cgit/openstack/oaktree
[1] http://git.openstack.org/cgit/openstack/oaktreemodel
[2] https://storyboard.openstack.org/#!/project/855
[3] https://storyboard.openstack.org/#!/project/856
[4] https://review.openstack.org/#/c/512561/
[5] https://review.openstack.org/#/c/492531/