[openstack-dev] [oaktree] Follow up to Multi-cloud Management in OpenStack Summit session

Monty Taylor mordred at inaugust.com
Wed Nov 29 00:55:18 UTC 2017


On 11/28/2017 06:05 PM, Joshua Harlow wrote:
 > So just curious.
 >
 > I didn't think shade had any federation logic in it; so I assume it will
 > start getting some?

It's possible that we're missing each other on the definition of the 
word 'federation' ... but shade's entire purpose in life is to allow 
sane use of multiple clouds from the same application.

 > Has there been any prelim. design around what the APIs of this would be
 > and how they would work and how they would return data from X other
 > clouds in a uniform manner? (I'd really be interested in how a high
 > level project is going to combine various resources from other clouds in
 > a way that doesn't look like crap).

(tl;dr - yes)

Ah - I grok what you're saying now. Great question!

There are (at least) four sides to this.

* Creating a resource in a specific location (boot a VM in OVH BHS1)
* Fetching resources from a specific location (show me the image in 
vexxhost)

* Creating a resource everywhere (upload an image to all cloud regions)
* Fetching resources from all locations (show me all my VMs)

The first two are fully handled, as you might imagine, although the 
mechanism is slightly different in shade and oaktree (I'll get back to 
that in a sec).

Creating everywhere isn't terribly complex - when I need to do that 
today it's a simple loop:

   import shade

   for cloud in shade.openstack_clouds():
       cloud.create_image('my-image', filename='my-image.qcow2')

But we can (and should and will) add some syntactic sugar to make that 
easier. Like (*waving hands*):

   all_clouds = shade.everywhere()
   all_clouds.create_image('my-image', filename='my-image.qcow2')

It's actually more complex than that, because Rackspace wants a VHD and 
OVH wants a RAW (but can take a qcow2 as well)... but this is an email, 
so for now let's assume that we can handle the general 'create 
everywhere' with a smidge of metaprogramming, some explicit overrides 
for the resources that need extra-special handling - and probably 
something like concurrent.futures.ThreadPoolExecutor.
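
Something like this, for instance - a sketch only, with the per-cloud 
format overrides elided (the shade calls are today's real API):

   import concurrent.futures

   import shade

   def upload(cloud):
       # per-cloud format overrides (VHD for Rackspace, RAW for OVH, ...)
       # would hang off each cloud's config; elided here
       return cloud.create_image('my-image', filename='my-image.qcow2')

   clouds = shade.openstack_clouds()
   with concurrent.futures.ThreadPoolExecutor(
           max_workers=len(clouds) or 1) as pool:
       results = list(pool.map(upload, clouds))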

The real fun, as you hint at, comes when we want to read from everywhere.

To prep for this (and inspired specifically by this use-case), shade now 
adds a "location" field to every resource it returns. That location 
field contains cloud, region, domain and project information - so that 
in a list of server objects spanning 14 regions across 6 clouds, all the 
info about what each one is and where it lives is right there in the 
object.
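
So reading from everywhere today is also just a loop, and the results 
disambiguate themselves (a sketch - field names follow shade's 
normalization, roughly):

   import shade

   servers = []
   for cloud in shade.openstack_clouds():
       servers.extend(cloud.list_servers())

   # every object knows where it came from
   for server in servers:
       print(server.name, server.location.cloud, server.location.region_name)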

When we shift to the oaktree gRPC interface, we carry over the Location 
concept:

http://git.openstack.org/cgit/openstack/oaktreemodel/tree/oaktreemodel/common.proto#n31

which we keep on all of the resources:

http://git.openstack.org/cgit/openstack/oaktreemodel/tree/oaktreemodel/image.proto#n49

So listing all the things should work the same way as the above 
list-from-everywhere method.

The difference I mentioned earlier in how shade and oaktree present the 
location interface is that in shade there is an OpenStackCloud object 
per cloud-region, and as a user you select which cloud you operate on 
by instantiating an OpenStackCloud pointed at the right thing. We need 
to add the AllTheClouds meta object for the shade interface.

In oaktree, there is the one oaktree instance and it contains 
information about all of your cloud-regions, so Locations and Filters 
become parameters on operations.
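
In client terms that looks something like this (*waves hands* - the 
message and stub names here are made up; the real ones come out of 
oaktreemodel's generated code):

   # 'stub' is an already-connected gRPC stub; names are hypothetical
   request = oaktreemodel.ImageFilter(
       location=oaktreemodel.Location(cloud='vexxhost'))
   for image in stub.SearchImages(request):
       print(image.name, image.location.cloud)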

 > Will this thing also have its own database (or something like a DB)?

It's an open question. Certainly not at the moment or in the near future 
- there's no need for one, as the constituent OpenStack clouds are the 
actual source of truth. The thing we need is caching, rather than data 
that is canonical itself.

This will almost certainly change as we work on the auth story, but the 
specifics of that are ones that need to be sorted out collectively - 
preferably with operators involved.

 > I can imagine if there is a `create_many_servers` call in oaktree that
 > it will need to have some sort of lock taken by the process doing this
 > set of XYZ calls (in the right order) so that some other
 > `create_many_servers` call doesn't come in and screw up everything the
 > prior one did... Or maybe cross-cloud consistency issues aren't a
 > concern... What are the thoughts here?

That we have already, actually, and you've even landed code in it. :) 
shade executes all of its remote operations through a TaskManager. The 
default one that you get if you're just running some Ansible is a 
pass-through. However, in nodepool we have a multi-threaded 
rate-limiting TaskManager that ensures that we're only ever doing one 
operation at a time for a given cloud-region, and that we're keeping 
ourselves inside of a configurable rate limit (learned the hard way from 
crashing a few public clouds).
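
The shape of it, massively simplified (the real one lives in nodepool 
and does queueing and stats as well - this is just the idea):

   import threading
   import time

   class RateLimitingTaskManager(object):
       """One of these per cloud-region: serialize operations and keep
       them under a configured rate."""

       def __init__(self, rate):
           self._delay = 1.0 / rate  # minimum seconds between operations
           self._lock = threading.Lock()
           self._last = 0.0

       def submit_task(self, task, *args, **kwargs):
           with self._lock:  # one operation at a time per region
               wait = self._last + self._delay - time.time()
               if wait > 0:
                   time.sleep(wait)
               try:
                   return task(*args, **kwargs)
               finally:
                   self._last = time.time()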

It's worth noting that shade is not transactional (although there are a 
few places where, if shade created a resource on the user's behalf that 
the user doesn't know about, it will delete it on error). So for "create 
many servers" the process for each server will succeed or fail 
individually. Depending on the resource it's either safe or not safe to 
retry without deleting - but that's something we'll want to rationalize 
in an oaktree context.

For things that do need a greater amount of transactional consistency, 
like "I want 4 VMs, a private network and a load balancer in each of my 
cloud-regions" ... I believe the shade/oaktree operation would be "run 
this Heat template everywhere". Heat already handles convergence 
operations; shade trying to do that from the outside would be OY.
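
Which, conveniently, is just the create-everywhere loop again - a sketch 
using shade's existing orchestration call ('my-app.yaml' is a 
placeholder template):

   import shade

   for cloud in shade.openstack_clouds():
       cloud.create_stack('my-app', template_file='my-app.yaml', wait=True)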

 >
 > What happens in the above if a third user Y is creating resources in one
 > of those clouds outside the view of oaktree... ya da ya da... What
 > happens if they are both targeting the same tenant...

Yup. That should actually work fine (we do this all the time) - it's why 
we assume the cloud is the source of truth for what exists and not a 
local data store (two-phase commits across WAN links, anybody?)

 > Perhaps a decent idea to start some kind of etherpad to start listing
 > these questions (and at least think about them a wee bit)?

Sounds great!

 > Monty Taylor wrote:
 >> Hey everybody!
 >>
 >> https://etherpad.openstack.org/p/sydney-forum-multi-cloud-management
 >>
 >> I've CC'd everyone who listed interest directly, just in case you're not
 >> already on the openstack-dev list. If you aren't, and you are in fact
 >> interested in this topic, please subscribe and make sure to watch for
 >> [oaktree] subject headings.
 >>
 >> We had a great session in Sydney about the needs of managing resources
 >> across multiple clouds. During the session I pointed out the work that
 >> had been started in the Oaktree project [0][1] and offered that if the
 >> people who were interested in the topic thought we'd make progress best
 >> by basing the work on oaktree, that we should bootstrap a new core team
 >> and kick off some weekly meetings. This is, therefore, the kickoff email
 >> to get that off the ground.
 >>
 >> All of the below is thoughts from me and a description of where we're at
 >> right now. It should all be considered up for debate, except for two
 >> things:
 >>
 >> - gRPC API
 >> - backend implementation based on shade
 >>
 >> As those are the two defining characteristics of the project. For those
 >> who weren't in the room, justifications for those two characteristics
 >> are:
 >>
 >> gRPC API
 >> --------
 >>
 >> There are several reasons why gRPC.
 >>
 >> * Make it clear this is not a competing REST API.
 >>
 >> OpenStack has a REST API already. This is more like a 'federation' API
 >> that knows how to talk to one or more clouds (similar to the kubernetes
 >> federation API)
 >>
 >> * Streaming and async built in
 >>
 >> One of the most costly things in using the OpenStack API is polling.
 >> gRPC is based on HTTP/2 and thus supports streaming and other exciting
 >> things. This means an oaktree running in or on a cloud can do its
 >> polling loops over the local network, and the client can either wait
 >> on a streaming call until the resource is ready or fire an async
 >> call and deal with it later on a notification channel.
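
To make the streaming point concrete, here's a sketch with made-up stub 
and message names (the real ones would come from oaktreemodel's 
generated code):

   import grpc

   channel = grpc.insecure_channel('oaktree.example.com:50051')
   stub = oaktree_pb2_grpc.OaktreeStub(channel)  # hypothetical module
   request = oaktree_pb2.ServerRequest(name='my-vm')  # hypothetical too
   for update in stub.CreateServer(request):  # server-streaming RPC
       print(update.status)  # progress streams in; no client-side polling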
 >>
 >> * Network efficiency
 >>
 >> Protobuf over HTTP/2 is a super-streamlined binary protocol, which
 >> should actually be really nice for our friends in Telco land who are
 >> using OpenStack for Edge-related tasks in 1000s of sites. All those
 >> roundtrips add up at scale.
 >>
 >> * Multi-language out of the box
 >>
 >> gRPC allows us to directly generate consistent consumption libs for a
 >> bunch of languages - or people can grab the proto files and integrate
 >> those into their own build if they prefer.
 >>
 >> * The cool kids are doing it
 >>
 >> To be fair, Jay Pipes and I tried to push OpenStack to use Protobuf
 >> instead of JSON for service-to-service communication back in 2010 - so
 >> it's not ACTUALLY a new idea... but with Google pushing it and support
 >> from the CNCF, gRPC is actually catching on broadly. If we're writing a
 >> new thing, let's lean forward into it.
 >>
 >> Backend implementation in shade
 >> -------------------------------
 >>
 >> If the service is defined by gRPC protos, why not implement the service
 >> itself in Go or C++?
 >>
 >> * Business logic to deal with cloud differences
 >>
 >> Adding a federation API isn't going to magically make all of those
 >> clouds work the same. We've got that fairly well sorted out in shade and
 >> would need to reimplement basically all of shade in the other language.
 >>
 >> * shade is battle tested at scale
 >>
 >> shade is what Infra's nodepool uses. In terms of high-scale API
 >> consumption, we've learned a TON of lessons. Much of the design inside
 >> of shade is the result of real-world scaling issues. It's Open Source,
 >> so we could obviously copy all of that elsewhere - but why? It exists
 >> and it works, and oaktree itself should be a scale-out shared-nothing
 >> kind of service anyway.
 >>
 >> The hard bits here aren't making API calls to 3 different clouds, the
 >> hard bits are doing that against 3 *different* clouds and presenting the
 >> results sanely and consistently to the original user.
 >>
 >> Proposed Structure
 >> ==================
 >>
 >> PTL
 >> ---
 >>
 >> As the originator of the project, I'll take on the initial PTL role.
 >> When the next PTL elections roll around, we should do a real election.
 >>
 >> Initial Core Team
 >> -----------------
 >>
 >> oaktree is still small enough that I don't think we need to be super
 >> protective - so I think if you're interested in working on it and you
 >> think you'll have the bandwidth to pay attention, let me know and I'll
 >> add you to the team.
 >>
 >> General rules of thumb I try to follow on top of normal OpenStack
 >> reviewing guidelines:
 >>
 >> * Review should mostly be about suitability of design/approach. Style
 >> issues should be handled by pep8/hacking (with one exception, see
 >> below). Functional issues should be handled with tests. Let the machines
 >> be machines and humans be humans.
 >>
 >> * Use followup patches to fix minor things rather than causing an
 >> existing patch to get re-spun and need to be re-reviewed.
 >>
 >> The one style exception ... I'm a big believer in not using visual
 >> indentation - but I can't seem to get pep8 or hacking to complain about
 >> its use. This isn't just about style - visual indentation causes more
 >> lines to be touched during a refactor than are necessary, making the
 >> impact of a change harder to see.
 >>
 >> good:
 >>
 >> x = some_function(
 >>     with_some, arguments)
 >>
 >> bad:
 >>
 >> x = some_function(with_some,
 >>                   arguments)
 >>
 >> If anyone can figure out how to write a hacking rule that enforces
 >> that, I'll buy you a herd of chickens.
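
A rough, untested token-based stab at that rule, by the way (chickens 
not yet guaranteed):

   import tokenize

   def visual_indents(readline):
       """Yield (row, col) of tokens that start a continuation line
       aligned to the column just past an unclosed bracket - which is
       exactly what visual indentation looks like."""
       opens = []  # (row, col) just past each unclosed open bracket
       prev_row = 0
       for tok in tokenize.generate_tokens(readline):
           row, col = tok.start
           if (opens and row > opens[-1][0] and row != prev_row
                   and col == opens[-1][1]):
               yield row, col
           if tok.string in ('(', '[', '{'):
               opens.append(tok.end)
           elif tok.string in (')', ']', '}') and opens:
               opens.pop()
           prev_row = row

   # example.py stands in for whatever file you want to check
   with open('example.py') as f:
       for row, col in visual_indents(f.readline):
           print('visual indentation at %d:%d' % (row, col))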
 >>
 >> Weekly Meeting
 >> --------------
 >>
 >> Let's give it a week or so to see who is interested in being in the
 >> initial core team so that we can figure out what timezones folks are in
 >> and pick a time that works for the maximum number of people.
 >>
 >> IRC Channel
 >> -----------
 >>
 >> oaktree development is closely related to shade development which is now
 >> in #openstack-sdks, so let's stay there until we get kicked out.
 >>
 >> Bugs/Blueprints
 >> ---------------
 >>
 >> oaktree uses storyboard [2][3]
 >>
 >> https://storyboard.openstack.org/#!/project/855
 >>
 >> oaktree tech overview
 >> =====================
 >>
 >> oaktree is a service that presents a gRPC API and that uses shade as its
 >> OpenStack connectivity layer. It's organized into two repos at the
 >> moment, oaktreemodel and oaktree. The intent is that anyone should be
 >> able to use the gRPC/protobuf definitions to create a client
 >> implementation. It is explicitly not the intent that there be more than
 >> one server implementation, since that would require reimplementing all
 >> of the hairy business logic that's in shade already.
 >>
 >> oaktreemodel contains the protobuf definitions, as well as the generated
 >> golang code. It is intended to provide a library that is easily pip
 >> installable by anyone who wants to build a client without them needing
 >> to add protoc steps. It contains build tooling to produce python, golang
 >> and C++. The C++ and python files are generated and included in the
 >> source sdist artifacts. Since golang uses git for consumption, the
 >> generated golang files are committed to the oaktreemodel repo.
 >>
 >> oaktreemodel has a more complex build toolchain, but is organized so
 >> that only oaktreemodel devs need to deal with it. People consuming
 >> oaktreemodel should not need to know anything about how it's built - pip
 >> install oaktreemodel or go get
 >> https://git.openstack.org/openstack/oaktreemodel should Just Work with
 >> no additional effort on the part of the programmer.
 >>
 >> oaktree contains the server implementation and depends on oaktreemodel.
 >> It's mostly a thin shim layer mapping gRPC stubs to shade calls. Much of
 >> the logic that needs to exist for oaktree to work wants to live in
 >> shade, but I'm sure we'll find places where that's not true.
 >>
 >> Ultra Short-Term TODO/in-progress
 >> =================================
 >>
 >> Fix Gate jobs for Zuul v3 (mordred)
 >> -----------------------------------
 >>
 >> We have devstack functional gate jobs - but they haven't been updated
 >> since the Zuul v3 migration. Duong Ha-Quang submitted a patch [4] to
 >> migrate the legacy jobs to in-tree. We need to get that fixed up, then
 >> migrate the job to use the new fancy devstack base job.
 >>
 >> I'll get this all fixed ASAP so that it's easy for folks to start
 >> hacking on patches.
 >>
 >> I'm working on this one.
 >>
 >> A patch for oaktreemodel is in flight [5]. We still need a patch to
 >> oaktree to follow up, which I have half-finished. I'll get it up once
 >> the oaktreemodel patch is green.
 >>
 >> Short-Term TODOs
 >> ================
 >>
 >> Expose more things
 >> ------------------
 >>
 >> shade has *way* more capabilities than oaktree exposes; closing that
 >> gap is mostly a matter of writing proto definitions for resources that
 >> match the 'strict' version of shade's data model. In some cases it
 >> might mean that we need to define a data model contract in shade too...
 >> but by and large picking things and adding them is a great way to get
 >> familiar with all the pieces and how things flow together.
 >>
 >> We should also consider whether or not we can do any meta-programming to
 >> map shade calls into oaktree calls automatically. For now I think we
 >> should be fine with just having copy-pasta boilerplate until we
 >> understand enough about the patterns to abstract them - but we SHOULD be
 >> able to do some work to reduce the boilerplate.
 >>
 >> Write better tests
 >> ------------------
 >>
 >> There are gate jobs at the moment and a tiny smoke-test script. We
 >> should add some functional tests for python, go and C++ in the
 >> oaktreemodel repo.
 >>
 >> I'm not sure a TON of unittests in oaktreemodel will be super useful -
 >> however, some simple tests that verify we haven't borked something in
 >> the protos that causes code to be generated improperly would be great.
 >> We can do those by just making sure we can create the proto objects and
 >> whatnot, without needing an actual server running.
 >>
 >> Unittests in oaktree itself are likely to have very little value. We can
 >> always add more requests-mock unittests to shade/python-openstacksdk. I
 >> think we should focus more on functional tests and on making sure those
 >> tests can run against not just devstack.
 >>
 >> Shift calling interface from shade to python-openstacksdk
 >> ---------------------------------------------------------
 >>
 >> oaktree doesn't need historical compat, so we can go ahead and start
 >> using python-openstacksdk. Our tests will be cross-testing with master
 >> branch commits rather than releases right now anyway.
 >>
 >> Add Java and Ruby build plumbing to oaktreemodel
 >> -------------------------------------------------
 >>
 >> Protobuf/gRPC has support for Java and Ruby as well; we should plumb
 >> them through too.
 >>
 >> Parallel Multicloud APIs
 >> ------------------------
 >>
 >> The existing APIs allow for multi-cloud consumption from the same
 >> connection via a Location object used as a parameter to calls.
 >> Additionally, shade adds a Location property to every object returned,
 >> so all shade objects carry the information needed to verify uniqueness.
 >>
 >> However, when considering actions like:
 >>
 >> "I want a list of all of my servers on all of my clouds"
 >>
 >> the answer is currently an end-user for-loop. We should add calls to
 >> shade for each of the list/search/get API calls that fetch from all of
 >> the available cloud regions in parallel and then combine the results
 >> into a single result list.
 >>
 >> We should also think about a design for multi-cloud creates and which
 >> calls they make sense for. Things like image and flavor immediately come
 >> to mind, as having consistent images and flavors across cloud regions is
 >> important.
 >>
 >> Both of those are desired features at the shade layer, so designing and
 >> implementing them will work great there ... but working on adding them
 >> to shade and exposing them in oaktree at the same time will help inform
 >> what shape of API at the shade layer serves the oaktree layer the best.
 >>
 >> Add REST escape hatch
 >> ---------------------
 >>
 >> There are PLENTY of things that will never get added to oaktree
 >> directly - especially things that are deployment/vendor-backend specific.
 >> One of the things discussed in Sydney was adding an API call to oaktree
 >> that would return a Protobuf that contains the root URL for a given
 >> service along with either a token, a list of HTTP headers to be used for
 >> auth, or both. So something like:
 >>
 >> conn = oaktreemodel.Connect()
 >> rest_info = conn.get_rest_info(
 >>     location=Location(cloud='example.com', service_type='compute'))
 >> servers = requests.get(
 >>     rest_info.url + '/servers',
 >>     headers=rest_info.headers).json()
 >>
 >> or, maybe that's the gRPC call and there is a call in each language's
 >> client lib that returns a properly constructed rest client...
 >>
 >> conn = oaktreemodel.Connect()
 >> compute = conn.get_adapter(
 >>     location=Location(cloud='example.com', service_type='compute'))
 >> servers = compute.get('/servers').json()
 >>
 >> *waves hands* - needs to be thought about, designed and implemented.
 >>
 >> Medium Term TODOs
 >> =================
 >>
 >> Authentication
 >> --------------
 >>
 >> oaktree is currently not authenticated. It works great on a laptop or in
 >> a location that's locked down through some other means, which should be
 >> fine for the first steps of the telco/edge use case, as well as for the
 >> developer use case getting started with it - but it's obviously not
 >> suitable for a multi-user service. The thinking thus far has been to NOT
 >> use keystone for auth, since that introduces the need for having a gRPC
 >> auth plugin for clients, as well as doing some sort of REST/gRPC dance.
 >>
 >> BUT - whether that's the right choice, and what the right choice actually
 >> is, is an open question on purpose - getting input from the operators on
 >> what mechanism works best is important. Maybe making a keystone gRPC
 >> auth driver and using keystone is the right choice. Maybe it isn't.
 >> Let's talk about it.
 >>
 >> Authorization
 >> -------------
 >>
 >> Since it's currently only a single-user service, it operates off of a
 >> pre-existing local clouds.yaml to define which clouds it has access to.
 >> Long-term one can imagine that one would want to authorize an oaktree to
 >> talk to a particular cloud-region in some manner. This needs to be
 >> designed.
 >>
 >> Multi-user Caching
 >> ------------------
 >>
 >> oaktree currently uses the caching support in shade for its caching.
 >> Although it is based on dogpile.cache, which means it has support for
 >> shared backends like redis or memcached, it hasn't really been vetted
 >> for multiple users sharing a single cache. It'll be fine for the next
 >> 6-9 months, but once we go multi-user I'd be concerned about it - so we
 >> should consider the caching layer design.
 >>
 >> shade oaktreemodel backend
 >> --------------------------
 >>
 >> In an ultimate fit of snake eating its own tail, we should add support
 >> to shade for making client connections to an oaktree if one exists. This
 >> should obviously be a pretty direct passthrough. That would mean that an
 >> oaktree talking to another oaktree would be able to do so via the gRPC
 >> layer without any intermediate protobuf-to-dict translation steps.
 >>
 >> That leads us to potentially just using the oaktreemodel protobuf
 >> objects as the basis for the in-memory resource objects inside of
 >> sdk/shade - but that's inception-y enough that we should just skip it
 >> for now. If protobuf->json translations are what's killing us, that's a
 >> great problem to have.
 >>
 >> Timetable
 >> =========
 >>
 >> I think we should aim for having something that's usable/discussable for
 >> the single/trusted-user use case for real work (install an oaktree
 >> yourself pointed at a clouds.yaml file and talk to it locally without
 >> auth) by the Dublin PTG. It doesn't have to do everything, but we should
 >> at least have a sense of whether this will solve the needs of the people
 >> who were interested in this topic so that we'll know whether figuring
 >> out the auth story is worthwhile or if this is all a terrible idea.
 >>
 >> I think it's TOTALLY reasonable that by Vancouver we should have a thing
 >> that's legit usable for folks who have the pain point today (given the
 >> auth constraint).
 >>
 >> If that works out, discuss auth in Vancouver and aim to have it figured
 >> out and implemented by Berlin so that we can actually start pushing
 >> clouds to include oaktree in their deployments.
 >>
 >> Conclusion
 >> ==========
 >>
 >> Ok. That's the braindump from me. Let me know if you wanna dive in,
 >> we'll get a core team fleshed out and an IRC meeting set up and folks
 >> can start cranking.
 >>
 >> Thanks!
 >> Monty
 >>
 >> [0] http://git.openstack.org/cgit/openstack/oaktree
 >> [1] http://git.openstack.org/cgit/openstack/oaktreemodel
 >> [2] https://storyboard.openstack.org/#!/project/855
 >> [3] https://storyboard.openstack.org/#!/project/856
 >> [4] https://review.openstack.org/#/c/512561/
 >> [5] https://review.openstack.org/#/c/492531/