[openstack-dev] [oaktree] Follow up to Multi-cloud Management in OpenStack Summit session

Joshua Harlow harlowja at fastmail.com
Wed Nov 29 01:12:02 UTC 2017


Monty Taylor wrote:
> On 11/28/2017 06:05 PM, Joshua Harlow wrote:
>  > So just curious.
>  >
>  > I didn't think shade had any federation logic in it; so I assume it will
>  > start getting some?
>
> It's possible that we're missing each other on the definition of the
> word 'federation' ... but shade's entire purpose in life is to allow
> sane use of multiple clouds from the same application.

Ya, I think you got it: shade is what I would call the rubber-hits-the-road
part of federation; so it will be interesting to see how such rubber can
be used to build what I would call the higher-level federation (without
screwing it up, lol).

>
>  > Has there been any prelim. design around what the APIs of this would be
>  > and how they would work and how they would return data from X other
>  > clouds in a uniform manner? (I'd really be interested in how a high
>  > level project is going to combine various resources from other clouds in
>  > a way that doesn't look like crap).
>
> (tl;dr - yes)
>
> Ah - I grok what you're saying now. Great question!
>
> There are (at least) four sides to this.
>
> * Creating a resource in a specific location (boot a VM in OVH BHS1)
> * Fetching resources from a specific location (show me the image in
> vexxhost)
>
> * Creating a resource everywhere (upload an image to all cloud regions)
> * Fetching resources from all locations (show me all my VMs)
>
> The first two are fully handled, as you might imagine, although the
> mechanism is slightly different in shade and oaktree (I'll get back to
> that in a sec).
>
> Creating everywhere isn't terribly complex - when I need to do that
> today it's a simple loop:
>
> for cloud in shade.openstack_clouds():
>     cloud.create_image('my-image', filename='my-image.qcow2')

Ya, scatter/gather (with some kind of new gRPC streaming response...)

>
> But we can (and should and will) add some syntactic sugar to make that
> easier. Like (*waving hands*)
>
> all_clouds = shade.everywhere()
> all_clouds.create_image('my-image', filename='my-image.qcow2')

Might as well just start to call it scatter/gather, lol
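
To make that concrete, here's roughly the shape I'd imagine for the
scatter/gather (hand-waved sketch; Cloud is a stand-in for
shade.OpenStackCloud, not real shade code):

```python
# Hand-waved sketch of 'create everywhere' scatter/gather with
# concurrent.futures; Cloud is a stand-in for shade.OpenStackCloud,
# and the per-cloud image_format hints at the VHD-vs-RAW overrides.
import concurrent.futures


class Cloud:
    def __init__(self, name, image_format='qcow2'):
        self.name = name
        self.image_format = image_format

    def create_image(self, name, filename):
        # Real shade would upload here; we just report what we'd do.
        return (self.name, name, self.image_format)


def create_image_everywhere(clouds, name, filename):
    # Scatter: one upload per cloud-region, in parallel threads.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(c.create_image, name, filename)
                   for c in clouds]
        # Gather: collect each result (a real version would also
        # collect per-cloud failures instead of raising on the first).
        return [f.result() for f in futures]


clouds = [Cloud('rackspace', 'vhd'), Cloud('ovh', 'raw')]
results = create_image_everywhere(clouds, 'my-image', 'my-image.qcow2')
```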

>
> It's actually more complex than that, because Rackspace wants a VHD and
> OVH wants a RAW but can take a qcow2 as well... but this is an email, so
> for now let's assume that we can handle the general 'create everywhere'
> with a smidge of meta programming, some explicit overrides for the
> resources that need extra special things - and probably something like
> concurrent.futures.ThreadPoolExecutor.
>
> The real fun, as you hint at, comes when we want to read from everywhere.
>
> To prep for this (and inspired specifically by this use-case), shade now
> adds a "location" field to every resource it returns. That location
> field contains cloud, region, domain and project information - so that
> in a list of server objects from across 14 regions of 6 clouds all the
> info about who and what they are is right there in the object.
>
> When we shift to the oaktree gRPC interface, we carry over the Location
> concept:
>
>
> http://git.openstack.org/cgit/openstack/oaktreemodel/tree/oaktreemodel/common.proto#n31
>
>
> which we keep on all of the resources:
>
>
> http://git.openstack.org/cgit/openstack/oaktreemodel/tree/oaktreemodel/image.proto#n49
>
>
> So listing all the things should work the same way as the above
> list-from-everywhere method.
>
> The difference I mentioned earlier in how shade and oaktree present the
> location interface is that in shade there is an OpenStackCloud object
> per cloud-region, and as a user you select which cloud you operate on
> via instantiating an OpenStackCloud pointed at the right thing. We need
> to add the AllTheClouds meta object for the shade interface.
>
> In oaktree, there is the one oaktree instance and it contains
> information about all of your cloud-regions, so Locations and Filters
> become parameters on operations.
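
Something like this is how I'd picture that AllTheClouds meta object
fanning out and keeping the Location info attached (totally hand-waved;
Cloud and Location here are stand-ins for the real shade types):

```python
# Hand-waved sketch of an AllTheClouds meta object: fan out a list call
# to every cloud-region in parallel and keep a Location stamp on each
# result so the merged list stays unambiguous. Cloud and Location are
# stand-ins, not the real shade types.
import concurrent.futures
from collections import namedtuple

Location = namedtuple('Location', ['cloud', 'region'])


class Cloud:
    def __init__(self, cloud, region, servers):
        self.location = Location(cloud, region)
        self._servers = servers

    def list_servers(self):
        # Real shade returns richer objects with a 'location' field.
        return [{'name': s, 'location': self.location}
                for s in self._servers]


class AllTheClouds:
    def __init__(self, clouds):
        self.clouds = clouds

    def list_servers(self):
        # Scatter the list call, then flatten into one combined list.
        with concurrent.futures.ThreadPoolExecutor() as pool:
            lists = pool.map(lambda c: c.list_servers(), self.clouds)
            return [srv for lst in lists for srv in lst]


everywhere = AllTheClouds([
    Cloud('ovh', 'BHS1', ['web1']),
    Cloud('vexxhost', 'ca-ymq-1', ['web1', 'db1']),
])
servers = everywhere.list_servers()
```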
>
>  > Will this thing also have its own database (or something like a DB)?
>
> It's an open question. Certainly not at the moment or in the near future
> - there's no need for one, as the constituent OpenStack clouds are the
> actual source of truth, the thing we need is caching rather than data
> that is canonical itself.

That's fine; it probably only becomes a problem if there is a need for
some kind of cross-cloud consistency requirement (which ideally this
whole thing would strongly avoid)...

>
> This will almost certainly change as we work on the auth story, but the
> specifics of that are ones that need to be sorted out collectively -
> preferably with operators involved.
>
>  > I can imagine if there is a `create_many_servers` call in oaktree that
>  > it will need to have some sort of lock taken by the process doing this
>  > set of XYZ calls (in the right order) so that some other
>  > `create_many_servers` call doesn't come in and screw everything up
>  > for the prior one... Or maybe cross-cloud consistency issues aren't a
>  > concern... What's the thoughts here?
> That we have already, actually, and you've even landed code in it. :)
> shade executes all of its remote operations through a TaskManager. The
> default one that you get if you're just running some ansible is a
> pass-through. However, in nodepool we have a multi-threaded
> rate-limiting TaskManager that ensures that we're only ever doing one
> operation at a time for a given cloud-region, and that we're keeping
> ourselves inside of a configurable rate limit (learned the hard-way from
> crashing a few public clouds).

Ok, so that is a single process; I assume oaktree will be multi-process 
and multi-machine? If so, this is going to get 'fun' real quick, lol
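
For anyone following along, that kind of manager is basically this shape
(grossly simplified sketch, not the actual nodepool code):

```python
# Grossly simplified sketch of a per-cloud-region rate-limiting task
# manager: a lock serializes operations for one cloud-region, and a
# minimum interval between calls keeps us inside the configured rate.
# Not the actual nodepool code.
import threading
import time


class RateLimitingTaskManager:
    def __init__(self, rate_per_second):
        self.min_interval = 1.0 / rate_per_second
        self.lock = threading.Lock()  # one operation at a time per region
        self.last = 0.0

    def submit(self, fn, *args, **kwargs):
        with self.lock:
            wait = self.min_interval - (time.monotonic() - self.last)
            if wait > 0:
                time.sleep(wait)
            try:
                return fn(*args, **kwargs)
            finally:
                self.last = time.monotonic()


manager = RateLimitingTaskManager(rate_per_second=50)
start = time.monotonic()
results = [manager.submit(lambda n: n * 2, i) for i in range(5)]
elapsed = time.monotonic() - start  # 4 gaps of ~0.02s after the first call
```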

>
> It's worth noting that shade is not transactional (although there are a
> few places where, if shade created a resource on the user's behalf that
> the user doesn't know about, it will delete it on error). So for "create
> many servers" the process for each will succeed or fail. Depending on
> the resource it's either safe or not safe to retry without deleting -
> but that'll be something we'll want to rationalize in an oaktree context.
>
> For things that do need a greater amount of transactional consistency,
> like "I want 4 vms, a private network and a load balancer in each of my
> cloud-regions" ... I believe the shade/oaktree operation would be "run
> this heat template everywhere". Heat already handles convergence
> operations; shade trying to do that from the outside would be OY.

Heat is per-cloud though (afaik), so this wouldn't really be federated 
if it is delegated to a single cloud, right?

>
>  >
>  > What happens in the above if a third user Y is creating resources in one
>  > of those clouds outside the view of oaktree... ya da ya da... What
>  > happens if they are both targeting the same tenant...
>
> Yup. That should actually work fine (we do this all the time); it's why
> we assume the cloud is the source of truth for what exists and not a
> local data store (two phase commits across WAN links anybody?)
>
>  > Perhaps a decent idea to start some kind of etherpad to start listing
>  > these questions (and at least think about them a wee bit) down?
>
> Sounds great!

Did you make it yet, lol?

>
>  > Monty Taylor wrote:
>  >> Hey everybody!
>  >>
>  >> https://etherpad.openstack.org/p/sydney-forum-multi-cloud-management
>  >>
>  >> I've CC'd everyone who listed interest directly, just in case you're
> not
>  >> already on the openstack-dev list. If you aren't, and you are in fact
>  >> interested in this topic, please subscribe and make sure to watch for
>  >> [oaktree] subject headings.
>  >>
>  >> We had a great session in Sydney about the needs of managing resources
>  >> across multiple clouds. During the session I pointed out the work that
>  >> had been started in the Oaktree project [0][1] and offered that if the
>  >> people who were interested in the topic thought we'd make progress best
>  >> by basing the work on oaktree, that we should bootstrap a new core team
>  >> and kick off some weekly meetings. This is, therefore, the kickoff
> email
>  >> to get that off the ground.
>  >>
>  >> All of the below is thoughts from me and a description of where
> we're at
>  >> right now. It should all be considered up for debate, except for two
>  >> things:
>  >>
>  >> - gRPC API
>  >> - backend implementation based on shade
>  >>
>  >> As those are the two defining characteristics of the project. For those
>  >> who weren't in the room, justifications for those two characteristics
>  >> are:
>  >>
>  >> gRPC API
>  >> --------
>  >>
>  >> There are several reasons why gRPC.
>  >>
>  >> * Make it clear this is not a competing REST API.
>  >>
>  >> OpenStack has a REST API already. This is more like a 'federation' API
>  >> that knows how to talk to one or more clouds (similar to the kubernetes
>  >> federation API)
>  >>
>  >> * Streaming and async built in
>  >>
>  >> One of the most costly things in using the OpenStack API is polling.
>  >> gRPC is based on HTTP/2 and thus supports streaming and other exciting
>  >> things. This means an oaktree running in or on a cloud can do its
>  >> polling loops over the local network and the client can just either
> wait
>  >> on a streaming call until the resource is ready, or can fire an async
>  >> call and deal with it later on a notification channel.
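
Just to make the shape of that concrete, here's a toy sketch (a plain
Python generator standing in for what would really be a gRPC
server-streaming RPC):

```python
# Toy illustration of the streaming model: oaktree runs the polling loop
# over its local network and streams status updates; the client just
# iterates the stream instead of running its own polling loop over the
# WAN. A real implementation would be a gRPC server-streaming RPC.
import time


def stream_server_status(poll, interval=0.001):
    """Server side: poll locally, yield each status, stop when done."""
    while True:
        status = poll()
        yield status
        if status in ('ACTIVE', 'ERROR'):
            return
        time.sleep(interval)


# Client side: just consume the stream until the terminal state.
states = iter(['BUILD', 'BUILD', 'ACTIVE'])
updates = list(stream_server_status(lambda: next(states)))
```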
>  >>
>  >> * Network efficiency
>  >>
>  >> Protobuf over HTTP/2 is a super-streamlined binary protocol, which
>  >> should actually be really nice for our friends in Telco land who are
>  >> using OpenStack for Edge-related tasks in 1000s of sites. All those
>  >> roundtrips add up at scale.
>  >>
>  >> * Multi-language out of the box
>  >>
>  >> gRPC allows us to directly generate consistent consumption libs for a
>  >> bunch of languages - or people can grab the proto files and integrate
>  >> those into their own build if they prefer.
>  >>
>  >> * The cool kids are doing it
>  >>
>  >> To be fair, Jay Pipes and I tried to push OpenStack to use Protobuf
>  >> instead of JSON for service-to-service communication back in 2010 - so
>  >> it's not ACTUALLY a new idea... but with Google pushing it and support
>  >> from the CNCF, gRPC is actually catching on broadly. If we're writing a
>  >> new thing, let's lean forward into it.
>  >>
>  >> Backend implementation in shade
>  >> -------------------------------
>  >>
>  >> If the service is defined by gRPC protos, why not implement the service
>  >> itself in Go or C++?
>  >>
>  >> * Business logic to deal with cloud differences
>  >>
>  >> Adding a federation API isn't going to magically make all of those
>  >> clouds work the same. We've got that fairly well sorted out in shade
> and
>  >> would need to reimplement basically all of shade in any other language.
>  >>
>  >> * shade is battle tested at scale
>  >>
>  >> shade is what Infra's nodepool uses. In terms of high-scale API
>  >> consumption, we've learned a TON of lessons. Much of the design inside
>  >> of shade is the result of real-world scaling issues. It's Open Source,
>  >> so we could obviously copy all of that elsewhere - but why? It exists
>  >> and it works, and oaktree itself should be a scale-out shared-nothing
>  >> kind of service anyway.
>  >>
>  >> The hard bits here aren't making API calls to 3 different clouds, the
>  >> hard bits are doing that against 3 *different* clouds and presenting
> the
>  >> results sanely and consistently to the original user.
>  >>
>  >> Proposed Structure
>  >> ==================
>  >>
>  >> PTL
>  >> ---
>  >>
>  >> As the originator of the project, I'll take on the initial PTL role.
>  >> When the next PTL elections roll around, we should do a real election.
>  >>
>  >> Initial Core Team
>  >> -----------------
>  >>
>  >> oaktree is still small enough that I don't think we need to be super
>  >> protective - so I think if you're interested in working on it and you
>  >> think you'll have the bandwidth to pay attention, let me know and I'll
>  >> add you to the team.
>  >>
>  >> General rules of thumb I try to follow on top of normal OpenStack
>  >> reviewing guidelines:
>  >>
>  >> * Review should mostly be about suitability of design/approach. Style
>  >> issues should be handled by pep8/hacking (with one exception, see
>  >> below). Functional issues should be handled with tests. Let the
> machines
>  >> be machines and humans be humans.
>  >>
>  >> * Use followup patches to fix minor things rather than causing an
>  >> existing patch to get re-spun and need to be re-reviewed.
>  >>
>  >> The one style exception ... I'm a big believer in not using visual
>  >> indentation - but I can't seem to get pep8 or hacking to complain about
>  >> its use. This isn't just about style - visual indentation causes more
>  >> lines to be touched during a refactor than are necessary, making the
>  >> impact of a change harder to see.
>  >>
>  >> good:
>  >>
>  >> x = some_function(
>  >>     with_some, arguments)
>  >>
>  >> bad:
>  >>
>  >> x = some_function(with_some,
>  >>                   arguments)
>  >>
>  >> If anyone can figure out how to write a hacking rule that enforces that
>  >> I'll buy you a herd of chickens.
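
No chickens claimed here, but the detection logic might look something
like this (very rough sketch; a real hacking rule would hook into the
flake8 plugin machinery and use tokenized logical lines):

```python
# Very rough sketch of detecting visual indentation: flag a continuation
# line whose indent lines up exactly with the column just after an
# unclosed '(' on the previous line. A real hacking/flake8 check would
# work on tokenized logical lines; this is just the idea.
def find_visual_indent(source):
    offenses = []
    lines = source.splitlines()
    for i in range(1, len(lines)):
        prev, cur = lines[i - 1], lines[i]
        paren = prev.rfind('(')
        # A line ending in '(' means hanging indent, which is fine.
        if paren == -1 or prev.rstrip().endswith('('):
            continue
        indent = len(cur) - len(cur.lstrip())
        if cur.strip() and indent == paren + 1:
            offenses.append(i + 1)  # 1-based line numbers
    return offenses


good = "x = some_function(\n    with_some, arguments)"
bad = "x = some_function(with_some,\n                  arguments)"
```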
>  >>
>  >> Weekly Meeting
>  >> --------------
>  >>
>  >> Let's give it a week or so to see who is interested in being in the
>  >> initial core team so that we can figure out what timezones folks are in
>  >> and pick a time that works for the maximum number of people.
>  >>
>  >> IRC Channel
>  >> -----------
>  >>
>  >> oaktree development is closely related to shade development which is
> now
>  >> in #openstack-sdks, so let's stay there until we get kicked out.
>  >>
>  >> Bugs/Blueprints
>  >> ---------------
>  >>
>  >> oaktree uses storyboard [2][3]
>  >>
>  >> https://storyboard.openstack.org/#!/project/855
>  >>
>  >> oaktree tech overview
>  >> =====================
>  >>
>  >> oaktree is a service that presents a gRPC API and that uses shade as
> its
>  >> OpenStack connectivity layer. It's organized into two repos at the
>  >> moment, oaktreemodel and oaktree. The intent is that anyone should be
>  >> able to use the gRPC/protobuf definitions to create a client
>  >> implementation. It is explicitly not the intent that there be more than
>  >> one server implementation, since that would require reimplementing all
>  >> of the hairy business logic that's in shade already.
>  >>
>  >> oaktreemodel contains the protobuf definitions, as well as the
> generated
>  >> golang code. It is intended to provide a library that is easily pip
>  >> installable by anyone who wants to build a client without them needing
>  >> to add protoc steps. It contains build tooling to produce python,
> golang
>  >> and C++. The C++ and python files are generated and included in the
>  >> source sdist artifacts. Since golang uses git for consumption, the
>  >> generated golang files are committed to the oaktreemodel repo.
>  >>
>  >> oaktreemodel has a more complex build toolchain, but is organized so
>  >> that only oaktreemodel devs need to deal with it. People consuming
>  >> oaktreemodel should not need to know anything about how it's built -
> pip
>  >> install oaktreemodel or go get
>  >> https://git.openstack.org/openstack/oaktreemodel should Just Work with
>  >> no additional effort on the part of the programmer.
>  >>
>  >> oaktree contains the server implementation and depends on oaktreemodel.
>  >> It's mostly a thin shim layer mapping gRPC stubs to shade calls.
> Much of
>  >> the logic that needs to exist for oaktree to work wants to live in
>  >> shade, but I'm sure we'll find places where that's not true.
>  >>
>  >> Ultra Short-Term TODO/in-progress
>  >> =================================
>  >>
>  >> Fix Gate jobs for Zuul v3 (mordred)
>  >> -----------------------------------
>  >>
>  >> We have devstack functional gate jobs - but they haven't been updated
>  >> since the Zuul v3 migration. Duong Ha-Quang submitted a patch [4] to
>  >> migrate the legacy jobs to in-tree. We need to get that fixed up, then
>  >> migrate the job to use the new fancy devstack base job.
>  >>
>  >> I'll get this all fixed ASAP so that it's easy for folks to start
>  >> hacking on patches.
>  >>
>  >> I'm working on this one.
>  >>
>  >> A patch for oaktreemodel is in flight [5]. We still need a patch to
>  >> oaktree to follow up, which I have half-finished. I'll get it up once
>  >> the oaktreemodel patch is green.
>  >>
>  >> Short-Term TODOs
>  >> ================
>  >>
>  >> Expose more things
>  >> ------------------
>  >>
>  >> shade has *way* more capabilities than oaktree, which is mostly a
> matter
>  >> of writing some proto definitions for resources that match the 'strict'
>  >> version of shade's data model. In some cases it might mean that we need
>  >> to define a data model contract in shade too... but by and large
> picking
>  >> things and adding them is a great way to get familiar with all the
>  >> pieces and how things flow together.
>  >>
>  >> We should also consider whether or not we can do any
> meta-programming to
>  >> map shade calls into oaktree calls automatically. For now I think we
>  >> should be fine with just having copy-pasta boilerplate until we
>  >> understand enough about the patterns to abstract them - but we
> SHOULD be
>  >> able to do some work to reduce the boilerplate.
>  >>
>  >> Write better tests
>  >> ------------------
>  >>
>  >> There are gate jobs at the moment and a tiny smoke-test script. We
>  >> should add some functional tests for python, go and C++ in the
>  >> oaktreemodel repo.
>  >>
>  >> I'm not sure a TON of unittests in oaktreemodel will be super useful -
>  >> however, some simple tests that verify we haven't borked something in
>  >> the protos that causes code to be generated improperly would be
>  >> great. We can do those by just making sure we can create the proto
>  >> objects and whatnot without needing an actual server running.
>  >>
>  >> Unittests in oaktree itself are likely to have very little value. We
> can
>  >> always add more requests-mock unittests to shade/python-openstacksdk. I
>  >> think we should focus more on functional tests and on making sure those
>  >> tests can run against not just devstack.
>  >>
>  >> Shift calling interface from shade to python-openstacksdk
>  >> ---------------------------------------------------------
>  >>
>  >> oaktree doesn't need historical compat, so we can go ahead and start
>  >> using python-openstacksdk. Our tests will be cross-testing with master
>  >> branch commits rather than releases right now anyway.
>  >>
>  >> Add Java and Ruby build plumbing to oaktreemodel
>  >> ------------------------------------------------
>  >>
>  >> Protobuf/gRPC has support for Java and Ruby as well; we should plumb
>  >> them through too.
>  >>
>  >> Parallel Multicloud APIs
>  >> ------------------------
>  >>
>  >> The existing APIs allow for multi-cloud consumption from the same
>  >> connection via a Location object used as a parameter to calls.
>  >> Additionally, shade adds a Location property to every object returned,
>  >> so all shade objects carry the information needed to verify uniqueness.
>  >>
>  >> However, when considering actions like:
>  >>
>  >> "I want a list of all of my servers on all of my clouds"
>  >>
>  >> the answer is currently an end-user for-loop. We should add calls to
>  >> shade for each of the list/search/get API calls that fetch from all of
>  >> the available cloud regions in parallel and then combine the results
>  >> into a single result list.
>  >>
>  >> We should also think about a design for multi-cloud creates and which
>  >> calls they make sense for. Things like image and flavor immediately
> come
>  >> to mind, as having consistent image and flavors across cloud regions is
>  >> important.
>  >>
>  >> Both of those are desired features at the shade layer, so designing and
>  >> implementing them will work great there ... but working on adding them
>  >> to shade and exposing them in oaktree at the same time will help inform
>  >> what shape of API at the shade layer serves the oaktree layer the best.
>  >>
>  >> Add REST escape hatch
>  >> ---------------------
>  >>
>  >> There are PLENTY of things that will never get added to oaktree
>  >> directly- especially things that are deployment/vendor-backend
> specific.
>  >> One of the things discussed in Sydney was adding an API call to oaktree
>  >> that would return a Protobuf that contains the root URL for a given
>  >> service along with either a token, a list of HTTP Headers to be used
> for
>  >> auth or both. So something like:
>  >>
>  >> conn = oaktreemodel.Connect()
>  >> rest_info = conn.get_rest_info(
>  >>     location=Location(cloud='example.com', service_type='compute'))
>  >> servers = requests.get(
>  >>     rest_info.url + '/servers',
>  >>     headers=rest_info.headers).json()
>  >>
>  >> or, maybe that's the gRPC call and there is a call in each language's
>  >> client lib that returns a properly constructed rest client...
>  >>
>  >> conn = oaktreemodel.Connect()
>  >> compute = conn.get_adapter(
>  >>     location=Location(cloud='example.com', service_type='compute'))
>  >> servers = compute.get('/servers').json()
>  >>
>  >> *waves hands* - needs to be thought about, designed and implemented.
>  >>
>  >> Medium Term TODOs
>  >> =================
>  >>
>  >> Authentication
>  >> --------------
>  >>
>  >> oaktree is currently not authenticated. It works great on a laptop
> or in
>  >> a location that's locked down through some other means, which should be
>  >> fine for the first steps of the telco/edge use case, as well as for the
>  >> developer use case getting started with it - but it's obviously not
>  >> suitable for a multi-user service. The thinking thus far has been to NOT
>  >> use keystone for auth, since that introduces the need for having a gRPC
>  >> auth plugin for clients, as well as doing some sort of REST/gRPC dance.
>  >>
>  >> BUT - whether that's the right choice and what the right choice
> actually
>  >> is is an open question on purpose - getting input from the operators on
>  >> what mechanism works best is important. Maybe making a keystone gRPC
>  >> auth driver and using keystone is the right choice. Maybe it isn't.
>  >> Let's talk about it.
>  >>
>  >> Authorization
>  >> -------------
>  >>
>  >> Since it's currently only a single-user service, it operates off of a
>  >> pre-existing local clouds.yaml to define which clouds it has access to.
>  >> Long-term one can imagine that one would want to authorize an
> oaktree to
>  >> talk to a particular cloud-region in some manner. This needs to be
>  >> designed.
>  >>
>  >> Multi-user Caching
>  >> ------------------
>  >>
>  >> oaktree currently uses the caching support in shade for its caching.
>  >> Although it is based on dogpile.cache which means it has support for
>  >> shared backends like redis or memcached, it hasn't really been vetted
>  >> for multiple users sharing a single cache. It'll be fine for the next 6-9
>  >> months, but once we go multi-user I'd be concerned about it - so we
>  >> should consider the caching layer design.
>  >>
>  >> shade oaktreemodel backend
>  >> --------------------------
>  >>
>  >> In an ultimate fit of snake eating its own tail, we should add support
>  >> to shade for making client connections to an oaktree if one exists.
> This
>  >> should obviously be pretty direct passthrough. That would mean that an
>  >> oaktree talking to another oaktree would be able to do so via the gRPC
>  >> layer without any intermediate protobuf-to-dict translation steps.
>  >>
>  >> That leads us to potentially just using the oaktreemodel protobuf
>  >> objects as the basis for the in-memory resource objects inside of
>  >> sdk/shade - but that's inception-y enough that we should just skip it
>  >> for now. If protobuf->json translations are what's killing us, that's a
>  >> great problem to have.
>  >>
>  >> Timetable
>  >> =========
>  >>
>  >> I think we should aim for having something that's usable/discussable
> for
>  >> the single/trusted-user use case for real work (install an oaktree
>  >> yourself pointed at a clouds.yaml file and talk to it locally without
>  >> auth) by the Dublin PTG. It doesn't have to do everything, but we
> should
>  >> at least have a sense of whether this will solve the needs of the
> people
>  >> who were interested in this topic so that we'll know whether figuring
>  >> out the auth story is worthwhile or if this is all a terrible idea.
>  >>
>  >> I think it's TOTALLY reasonable that by Vancouver we should have a
> thing
>  >> that's legit usable for folks who have the pain point today (given the
>  >> auth constraint)
>  >>
>  >> If that works out, discuss auth in Vancouver and aim to have it figured
>  >> out and implemented by Berlin so that we can actually start pushing
>  >> clouds to include oaktree in their deployments.
>  >>
>  >> Conclusion
>  >> ==========
>  >>
>  >> Ok. That's the braindump from me. Let me know if you wanna dive in,
>  >> we'll get a core team fleshed out and an IRC meeting set up and folks
>  >> can start cranking.
>  >>
>  >> Thanks!
>  >> Monty
>  >>
>  >> [0] http://git.openstack.org/cgit/openstack/oaktree
>  >> [1] http://git.openstack.org/cgit/openstack/oaktreemodel
>  >> [2] https://storyboard.openstack.org/#!/project/855
>  >> [3] https://storyboard.openstack.org/#!/project/856
>  >> [4] https://review.openstack.org/#/c/512561/
>  >> [5] https://review.openstack.org/#/c/492531/
>  >>
>  >>
> __________________________________________________________________________
>  >>
>  >> OpenStack Development Mailing List (not for usage questions)
>  >> Unsubscribe:
>  >> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>  >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>


