[openstack-dev] [oaktree] Follow up to Multi-cloud Management in OpenStack Summit session
harlowja at fastmail.com
Wed Nov 29 01:12:02 UTC 2017
Monty Taylor wrote:
> On 11/28/2017 06:05 PM, Joshua Harlow wrote:
> > So just curious.
> > I didn't think shade had any federation logic in it; so I assume it will
> > start getting some?
> It's possible that we're missing each other on the definition of the
> word 'federation' ... but shade's entire purpose in life is to allow
> sane use of multiple clouds from the same application.
Ya I think you got it; shade is what I would call the rubber-hits-the-road
part of federation, so it will be interesting to see how such rubber can
be used to build what I would call the higher-level federation (without
screwing it up, lol).
> > Has there been any prelim. design around what the APIs of this would be
> > and how they would work and how they would return data from X other
> > clouds in a uniform manner? (I'd really be interested in how a high
> > level project is going to combine various resources from other clouds in
> > a way that doesn't look like crap).
> (tl;dr - yes)
> Ah - I grok what you're saying now. Great question!
> There are (at least) four sides to this.
> * Creating a resource in a specific location (boot a VM in OVH BHS1)
> * Fetching resources from a specific location (show me the image in OVH BHS1)
> * Creating a resource everywhere (upload an image to all cloud regions)
> * Fetching resources from all locations (show me all my VMs)
> The first two are fully handled, as you might imagine, although the
> mechanism is slightly different in shade and oaktree (I'll get back to
> that in a sec)
> Creating everywhere isn't terribly complex - when I need to do that
> today it's a simple loop:
> for cloud in shade.openstack_clouds():
>     cloud.create_image('my-image', filename='my-image.qcow2')
Ya, scatter/gather (with some kind of new grpc streaming response..)
> But we can (and should and will) add some syntactic sugar to make that
> easier. Like (*waving hands*)
> all_clouds = shade.everywhere()
> all_clouds.create_image('my-image', filename='my-image.qcow2')
Might as well just start to call it scatter/gather, lol
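Since we're calling it scatter/gather, the sugar could be as thin as a fan-out wrapper. A rough sketch only; `AllTheClouds` and `everywhere()` are invented names here (not real shade API), and the `FakeCloud` stand-in just mimics the shape of an `OpenStackCloud`:

```python
# Hypothetical sketch of the "everywhere" aggregate: any method called
# on it fans out to every cloud-region and gathers per-cloud results.
class AllTheClouds:
    def __init__(self, clouds):
        self._clouds = clouds

    def __getattr__(self, name):
        def fan_out(*args, **kwargs):
            # Run the same call against each cloud, keyed by cloud name.
            return {cloud.name: getattr(cloud, name)(*args, **kwargs)
                    for cloud in self._clouds}
        return fan_out

# In real use the list would come from shade.openstack_clouds();
# a stand-in with the same shape works for illustration:
class FakeCloud:
    def __init__(self, name):
        self.name = name

    def create_image(self, image_name, filename=None):
        return '%s uploaded to %s' % (image_name, self.name)

all_clouds = AllTheClouds([FakeCloud('ovh'), FakeCloud('rax')])
results = all_clouds.create_image('my-image', filename='my-image.qcow2')
```

The `__getattr__` trick is one way to avoid writing a wrapper method per API call, at the cost of hiding the fan-out from static analysis.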
> It's actually more complex than that, because Rackspace wants a VHD and
> OVH wants a RAW but can take a qcow2 as well... but this is an email, so
> for now let's assume that we can handle the general 'create everywhere'
> with a smidge of metaprogramming and some explicit overrides for the
> resources that need extra-special handling.
> The real fun, as you hint at, comes when we want to read from everywhere.
> To prep for this (and inspired specifically by this use-case), shade now
> adds a "location" field to every resource it returns. That location
> field contains cloud, region, domain and project information - so that
> in a list of server objects from across 14 regions of 6 clouds all the
> info about who and what they are is right there in the object.
> When we shift to the oaktree gRPC interface, we carry over the Location,
> which we keep on all of the resources. So listing all the things should
> work the same way as the above list-from-everywhere method.
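To make the location-field point concrete, here's a tiny sketch of what having cloud/region info on every object buys you. The dict shapes are stand-ins for shade's munch-style return values, not the actual ones:

```python
# A combined server list from multiple regions; each object carries
# its own origin in a 'location' field, per the description above.
servers = [
    {'name': 'web1', 'location': {'cloud': 'ovh', 'region_name': 'BHS1'}},
    {'name': 'web2', 'location': {'cloud': 'ovh', 'region_name': 'GRA1'}},
    {'name': 'db1',  'location': {'cloud': 'rax', 'region_name': 'DFW'}},
]

# Because origin travels with each object, a flat cross-cloud list can
# be sliced back apart without losing track of who owns what.
by_cloud = {}
for server in servers:
    by_cloud.setdefault(server['location']['cloud'], []).append(server['name'])
```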
> The difference I mentioned earlier in how shade and oaktree present the
> location interface is that in shade there is an OpenStackCloud object
> per cloud-region, and as a user you select which cloud you operate on
> by instantiating an OpenStackCloud pointed at the right thing. We need
> to add the AllTheClouds meta object for the shade interface.
> In oaktree, there is the one oaktree instance and it contains
> information about all of your cloud-regions, so Locations and Filters
> become parameters on operations.
> > Will this thing also have its own database (or something like a DB)?
> It's an open question. Certainly not at the moment or in the near future
> - there's no need for one, as the constituent OpenStack clouds are the
> actual source of truth, the thing we need is caching rather than data
> that is canonical itself.
That's fine, it prob only becomes a problem if there is a need for some
kind of cross cloud consistency requirements (which ideally this whole
thing would strongly avoid)....
> This will almost certainly change as we work on the auth story, but the
> specifics of that are ones that need to be sorted out collectively -
> preferably with operators involved.
> > I can imagine if there is a `create_many_servers` call in oaktree that
> > it will need to have some sort of lock taken by the process doing this
> > set of XYZ calls (in the right order) so that some other
> > `create_many_servers` call doesn't come in and screw up everything the
> > prior one did... Or maybe cross-cloud consistency issues aren't a
> > concern... What's the thoughts here?
> That we have already, actually, and you've even landed code in it. :)
> shade executes all of its remote operations through a TaskManager. The
> default one that you get if you're just running some ansible is a
> pass-through. However, in nodepool we have a multi-threaded
> rate-limiting TaskManager that ensures that we're only ever doing one
> operation at a time for a given cloud-region, and that we're keeping
> ourselves inside of a configurable rate limit (learned the hard way from
> crashing a few public clouds).
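The TaskManager idea boils down to something like the following. This is a minimal sketch of the rate-limiting pattern described above, not the actual nodepool code; the class name and numbers are invented:

```python
import threading
import time

class RateLimitingTaskManager:
    """One-op-at-a-time, rate-limited execution for a cloud-region."""

    def __init__(self, rate):
        self._lock = threading.Lock()   # serializes ops for this region
        self._delay = 1.0 / rate        # e.g. rate=2.0 -> 0.5s between ops
        self._last = 0.0

    def submit(self, task, *args, **kwargs):
        with self._lock:
            # Sleep just long enough to stay under the configured rate.
            wait = self._delay - (time.monotonic() - self._last)
            if wait > 0:
                time.sleep(wait)
            try:
                return task(*args, **kwargs)
            finally:
                self._last = time.monotonic()

manager = RateLimitingTaskManager(rate=100.0)
results = [manager.submit(lambda n=n: n * n) for n in range(3)]
```

In a multi-region setup you'd hold one such manager per cloud-region, so throttling one cloud never stalls calls to another.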
Ok, so that is a single process; I assume oaktree will be multi-process
and multi-machine? If so this is going to get 'fun' real quick, lol
> It's worth noting that shade is not transactional (although there are a
> few places where, if shade created a resource on the user's behalf that
> the user doesn't know about, it will delete it on error) So for "create
> many servers" the process for each will succeed or fail. Depending on
> the resource it's either safe or not safe to retry without deleting -
> but that'll be something we'll want to rationalize in an oaktree context.
> For things that do need a greater amount of transactional consistency,
> like "I want 4 vms, a private network and a load balancer in each of my
> cloud-regions" ... I believe the shade/oaktree operation would be "run
> this heat template everywhere". Heat already handles convergence
> operations; shade trying to do that from the outside would be OY.
Heat is per-cloud though (afaik) so this wouldn't really be federated if
it is delegated to a single cloud; right?
> > What happens in the above if a third user Y is creating resources in one
> > of those clouds outside the view of oaktree... ya da ya da... What
> > happens if they are both targeting the same tenant...
> Yup. That should actually work fine (we do this all the time) - it's why
> we assume the cloud is the source of truth for what exists and not a
> local data store (two phase commits across WAN links anybody?)
> > Perhaps a decent idea to start some kind of etherpad to start listing
> > these questions (and at least think about them a wee bit) down?
> Sounds great!
Did you make it yet, lol.
> > Monty Taylor wrote:
> >> Hey everybody!
> >> https://etherpad.openstack.org/p/sydney-forum-multi-cloud-management
> >> I've CC'd everyone who listed interest directly, just in case you're
> >> already on the openstack-dev list. If you aren't, and you are in fact
> >> interested in this topic, please subscribe and make sure to watch for
> >> [oaktree] subject headings.
> >> We had a great session in Sydney about the needs of managing resources
> >> across multiple clouds. During the session I pointed out the work that
> >> had been started in the Oaktree project  and offered that if the
> >> people who were interested in the topic thought we'd make progress best
> >> by basing the work on oaktree, that we should bootstrap a new core team
> >> and kick off some weekly meetings. This is, therefore, the kickoff
> >> to get that off the ground.
> >> All of the below is thoughts from me and a description of where we're at
> >> right now. It should all be considered up for debate, except for two
> >> things:
> >> - gRPC API
> >> - backend implementation based on shade
> >> As those are the two defining characteristics of the project. For those
> >> who weren't in the room, justifications for those two characteristics
> >> are:
> >> gRPC API
> >> --------
> >> There are several reasons why gRPC.
> >> * Make it clear this is not a competing REST API.
> >> OpenStack has a REST API already. This is more like a 'federation' API
> >> that knows how to talk to one or more clouds (similar to the kubernetes
> >> federation API)
> >> * Streaming and async built in
> >> One of the most costly things in using the OpenStack API is polling.
> >> gRPC is based on HTTP/2 and thus supports streaming and other exciting
> >> things. This means an oaktree running in or on a cloud can do its
> >> polling loops over the local network, and the client can either wait
> >> on a streaming call until the resource is ready, or can fire an async
> >> call and deal with it later on a notification channel.
> >> * Network efficiency
> >> Protobuf over HTTP/2 is a super-streamlined binary protocol, which
> >> should actually be really nice for our friends in Telco land who are
> >> using OpenStack for Edge-related tasks in 1000s of sites. All those
> >> roundtrips add up at scale.
> >> * Multi-language out of the box
> >> gRPC allows us to directly generate consistent consumption libs for a
> >> bunch of languages - or people can grab the proto files and integrate
> >> those into their own build if they prefer.
> >> * The cool kids are doing it
> >> To be fair, Jay Pipes and I tried to push OpenStack to use Protobuf
> >> instead of JSON for service-to-service communication back in 2010 - so
> >> it's not ACTUALLY a new idea... but with Google pushing it and support
> >> from the CNCF, gRPC is actually catching on broadly. If we're writing a
> >> new thing, let's lean forward into it.
> >> Backend implementation in shade
> >> -------------------------------
> >> If the service is defined by gRPC protos, why not implement the service
> >> itself in Go or C++?
> >> * Business logic to deal with cloud differences
> >> Adding a federation API isn't going to magically make all of those
> >> clouds work the same. We've got that fairly well sorted out in shade;
> >> a Go or C++ implementation would need to reimplement basically all of
> >> shade in another language.
> >> * shade is battle tested at scale
> >> shade is what Infra's nodepool uses. In terms of high-scale API
> >> consumption, we've learned a TON of lessons. Much of the design inside
> >> of shade is the result of real-world scaling issues. It's Open Source,
> >> so we could obviously copy all of that elsewhere - but why? It exists
> >> and it works, and oaktree itself should be a scale-out shared-nothing
> >> kind of service anyway.
> >> The hard bits here aren't making API calls to 3 different clouds, the
> >> hard bits are doing that against 3 *different* clouds and presenting
> >> results sanely and consistently to the original user.
> >> Proposed Structure
> >> ==================
> >> PTL
> >> ---
> >> As the originator of the project, I'll take on the initial PTL role.
> >> When the next PTL elections roll around, we should do a real election.
> >> Initial Core Team
> >> -----------------
> >> oaktree is still small enough that I don't think we need to be super
> >> protective - so I think if you're interested in working on it and you
> >> think you'll have the bandwidth to pay attention, let me know and I'll
> >> add you to the team.
> >> General rules of thumb I try to follow on top of normal OpenStack
> >> reviewing guidelines:
> >> * Review should mostly be about suitability of design/approach. Style
> >> issues should be handled by pep8/hacking (with one exception, see
> >> below). Functional issues should be handled with tests. Let the
> >> machines be machines and the humans be humans.
> >> * Use followup patches to fix minor things rather than causing an
> >> existing patch to get re-spun and need to be re-reviewed.
> >> The one style exception ... I'm a big believer in not using visual
> >> indentation - but I can't seem to get pep8 or hacking to complain about
> >> its use. This isn't just about style - visual indentation causes more
> >> lines to be touched during a refactor than are necessary making the
> >> impact of a change harder to see.
> >> good:
> >>     x = some_function(
> >>         with_some, arguments)
> >> bad:
> >>     x = some_function(with_some,
> >>                       arguments)
> >> If anyone can figure out how to write a hacking rule that enforces that
> >> I'll buy you a herd of chickens.
> >> Weekly Meeting
> >> --------------
> >> Let's give it a week or so to see who is interested in being in the
> >> initial core team so that we can figure out what timezones folks are in
> >> and pick a time that works for the maximum number of people.
> >> IRC Channel
> >> -----------
> >> oaktree development is closely related to shade development which is
> >> in #openstack-sdks, so let's stay there until we get kicked out.
> >> Bugs/Blueprints
> >> ---------------
> >> oaktree uses storyboard:
> >> https://storyboard.openstack.org/#!/project/855
> >> oaktree tech overview
> >> =====================
> >> oaktree is a service that presents a gRPC API and that uses shade as
> >> its OpenStack connectivity layer. It's organized into two repos at the
> >> moment, oaktreemodel and oaktree. The intent is that anyone should be
> >> able to use the gRPC/protobuf definitions to create a client
> >> implementation. It is explicitly not the intent that there be more than
> >> one server implementation, since that would require reimplementing all
> >> of the hairy business logic that's in shade already.
> >> oaktreemodel contains the protobuf definitions, as well as the generated
> >> golang code. It is intended to provide a library that is easily pip
> >> installable by anyone who wants to build a client without them needing
> >> to add protoc steps. It contains build tooling to produce python, golang
> >> and C++. The C++ and python files are generated and included in the
> >> source sdist artifacts. Since golang uses git for consumption, the
> >> generated golang files are committed to the oaktreemodel repo.
> >> oaktreemodel has a more complex build toolchain, but is organized so
> >> that only oaktreemodel devs need to deal with it. People consuming
> >> oaktreemodel should not need to know anything about how it's built -
> >> pip install oaktreemodel or go get
> >> https://git.openstack.org/openstack/oaktreemodel should Just Work with
> >> no additional effort on the part of the programmer.
> >> oaktree contains the server implementation and depends on oaktreemodel.
> >> It's mostly a thin shim layer mapping gRPC stubs to shade calls. Much
> >> of the logic that needs to exist for oaktree to work wants to live in
> >> shade, but I'm sure we'll find places where that's not true.
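The "thin shim" shape is roughly the following. This is an illustrative sketch only; `OaktreeServicer`, `ListServers`, and the request shape are invented stand-ins for the generated gRPC stubs, and `FakeShadeCloud` mimics a shade connection:

```python
# A shade-like connection stand-in for illustration.
class FakeShadeCloud:
    def list_servers(self):
        return [{'name': 'web1'}]

class OaktreeServicer:
    """Each RPC handler is mostly a one-liner into a shade call."""

    def __init__(self, clouds):
        self._clouds = clouds   # cloud-name -> shade-like connection

    def ListServers(self, request, context=None):
        # Pick the right cloud-region from the request's Location,
        # then delegate; a real shim would convert the shade results
        # into protobuf reply messages here.
        cloud = self._clouds[request['location']['cloud']]
        return cloud.list_servers()

servicer = OaktreeServicer({'ovh': FakeShadeCloud()})
reply = servicer.ListServers({'location': {'cloud': 'ovh'}})
```

Keeping the handlers this thin is what lets the hairy business logic stay in shade rather than being duplicated in the server.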
> >> Ultra Short-Term TODO/in-progress
> >> =================================
> >> Fix Gate jobs for Zuul v3 (mordred)
> >> -----------------------------------
> >> We have devstack functional gate jobs - but they haven't been updated
> >> since the Zuul v3 migration. Duong Ha-Quang submitted a patch to
> >> migrate the legacy jobs to in-tree. We need to get that fixed up, then
> >> migrate the job to use the new fancy devstack base job.
> >> I'll get this all fixed ASAP so that it's easy for folks to start
> >> hacking on patches.
> >> I'm working on this one.
> >> A patch for oaktreemodel is in flight. We still need a patch to
> >> oaktree to follow up, which I have half-finished. I'll get it up once
> >> the oaktreemodel patch is green.
> >> Short-Term TODOs
> >> ================
> >> Expose more things
> >> ------------------
> >> shade has *way* more capabilities than oaktree; closing that gap is
> >> mostly a matter of writing some proto definitions for resources that
> >> match the 'strict' version of shade's data model. In some cases it
> >> might mean that we need to define a data model contract in shade too...
> >> but by and large picking things and adding them is a great way to get
> >> familiar with all the
> >> pieces and how things flow together.
> >> We should also consider whether or not we can do any meta-programming to
> >> map shade calls into oaktree calls automatically. For now I think we
> >> should be fine with just having copy-pasta boilerplate until we
> >> understand enough about the patterns to abstract them - but we SHOULD be
> >> able to do some work to reduce the boilerplate.
> >> Write better tests
> >> ------------------
> >> There are gate jobs at the moment and a tiny smoke-test script. We
> >> should add some functional tests for python, go and C++ in the
> >> oaktreemodel repo.
> >> I'm not sure a TON of unittests in oaktreemodel will be super useful -
> >> however, some simple tests that verify we haven't borked something in
> >> the protos that causes code to be generated improperly would be great.
> >> We can do those by just making sure we can create the proto objects and
> >> whatnot without needing an actual server running.
> >> Unittests in oaktree itself are likely to have very little value. We
> >> can always add more requests-mock unittests to shade/python-openstacksdk.
> >> I think we should focus more on functional tests and on making sure
> >> those tests can run against more than just devstack.
> >> Shift calling interface from shade to python-openstacksdk
> >> ---------------------------------------------------------
> >> oaktree doesn't need historical compat, so we can go ahead and start
> >> using python-openstacksdk. Our tests will be cross-testing with master
> >> branch commits rather than releases right now anyway.
> >> Add Java and Ruby build plumbing to oaktree model
> >> -------------------------------------------------
> >> Protobuf/gRPC has support for java and ruby as well, so we should
> >> plumb them through too.
> >> Parallel Multicloud APIs
> >> ------------------------
> >> The existing APIs allow for multi-cloud consumption from the same
> >> connection via a Location object used as a parameter to calls.
> >> Additionally, shade adds a Location property to every object returned,
> >> so all shade objects carry the information needed to verify uniqueness.
> >> However, when considering actions like:
> >> "I want a list of all of my servers on all of my clouds"
> >> the answer is currently an end-user for-loop. We should add calls to
> >> shade for each of the list/search/get API calls that fetch from all of
> >> the available cloud regions in parallel and then combine the results
> >> into a single result list.
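The parallel fetch-and-combine described above can be sketched with a thread pool. The `FakeCloud` objects here are illustrative stand-ins for per-region shade connections, not real API:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins mimicking per-region shade connections.
class FakeCloud:
    def __init__(self, name, servers):
        self.name = name
        self._servers = servers

    def list_servers(self):
        return [{'name': s, 'cloud': self.name} for s in self._servers]

clouds = [FakeCloud('ovh', ['a']), FakeCloud('rax', ['b', 'c'])]

# Fan list_servers() out to every cloud-region in parallel; map()
# preserves input order, so results stay deterministic.
with ThreadPoolExecutor(max_workers=len(clouds)) as pool:
    per_cloud = list(pool.map(lambda c: c.list_servers(), clouds))

# Flatten into the single combined list the end user actually wants.
all_servers = [s for batch in per_cloud for s in batch]
```

Threads fit here because the real work is network I/O against each region; the per-region rate limiting would still apply inside each connection.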
> >> We should also think about a design for multi-cloud creates and which
> >> calls they make sense for. Things like image and flavor come immediately
> >> to mind, as having consistent image and flavors across cloud regions is
> >> important.
> >> Both of those are desired features at the shade layer, so designing and
> >> implementing them will work great there ... but working on adding them
> >> to shade and exposing them in oaktree at the same time will help inform
> >> what shape of API at the shade layer serves the oaktree layer the best.
> >> Add REST escape hatch
> >> ---------------------
> >> There are PLENTY of things that will never get added to oaktree
> >> directly - especially things that are deployment/vendor-backend specific.
> >> One of the things discussed in Sydney was adding an API call to oaktree
> >> that would return a Protobuf that contains the root URL for a given
> >> service along with either a token, a list of HTTP headers to be used
> >> for auth, or both. So something like:
> >> conn = oaktreemodel.Connect()
> >> rest_info = conn.get_rest_info(
> >>     location=Location(cloud='example.com', service_type='compute'))
> >> servers = requests.get(
> >>     rest_info.url + '/servers',
> >>     headers=rest_info.headers).json()
> >> or, maybe that's the gRPC call and there is a call in each language's
> >> client lib that returns a properly constructed rest client...
> >> conn = oaktreemodel.Connect()
> >> compute = conn.get_adapter(
> >>     location=Location(cloud='example.com', service_type='compute'))
> >> servers = compute.get('/servers').json()
> >> *waves hands* - needs to be thought about, designed and implemented.
> >> Medium Term TODOs
> >> =================
> >> Authentication
> >> --------------
> >> oaktree is currently not authenticated. It works great on a laptop or in
> >> a location that's locked down through some other means, which should be
> >> fine for the first steps of the telco/edge use case, as well as for the
> >> developer use case getting started with it - but it's obviously not
> >> suitable for a multi-user service. The thinking thus far has been to NOT
> >> use keystone for auth, since that introduces the need for having a gRPC
> >> auth plugin for clients, as well as doing some sort of REST/gRPC dance.
> >> BUT - whether that's the right choice and what the right choice
> >> is is an open question on purpose - getting input from the operators on
> >> what mechanism works best is important. Maybe making a keystone gRPC
> >> auth driver and using keystone is the right choice. Maybe it isn't.
> >> Let's talk about it.
> >> Authorization
> >> -------------
> >> Since it's currently only a single-user service, it operates off of a
> >> pre-existing local clouds.yaml to define which clouds it has access to.
> >> Long-term one can imagine that one would want to authorize an oaktree to
> >> talk to a particular cloud-region in some manner. This needs to be
> >> designed.
> >> Multi-user Caching
> >> ------------------
> >> oaktree currently uses the caching support in shade for its caching.
> >> Although it is based on dogpile.cache which means it has support for
> >> shared backends like redis or memcached, it hasn't really been vetted
> >> for multi-user sharing a single cache. It'll be fine for the next 6-9
> >> months, but once we go multi-user I'd be concerned about it - so we
> >> should consider the caching layer design.
> >> shade oaktreemodel backend
> >> --------------------------
> >> In an ultimate fit of snake eating its own tail, we should add support
> >> to shade for making client connections to an oaktree if one exists. It
> >> should obviously be a pretty direct passthrough. That would mean that an
> >> oaktree talking to another oaktree would be able to do so via the gRPC
> >> layer without any intermediate protobuf-to-dict translation steps.
> >> That leads us to potentially just using the oaktreemodel protobuf
> >> objects as the basis for the in-memory resource objects inside of
> >> sdk/shade - but that's inception-y enough that we should just skip it
> >> for now. If protobuf->json translations are what's killing us, that's a
> >> great problem to have.
> >> Timetable
> >> =========
> >> I think we should aim for having something that's usable/discussable for
> >> the single/trusted-user use case for real work (install an oaktree
> >> yourself pointed at a clouds.yaml file and talk to it locally without
> >> auth) by the Dublin PTG. It doesn't have to do everything, but we should
> >> at least have a sense of whether this will solve the needs of the folks
> >> who were interested in this topic so that we'll know whether figuring
> >> out the auth story is worthwhile or if this is all a terrible idea.
> >> I think it's TOTALLY reasonable that by Vancouver we should have a thing
> >> that's legit usable for folks who have the pain point today (given the
> >> auth constraint).
> >> If that works out, discuss auth in Vancouver and aim to have it figured
> >> out and implemented by Berlin so that we can actually start pushing
> >> clouds to include oaktree in their deployments.
> >> Conclusion
> >> ==========
> >> Ok. That's the braindump from me. Let me know if you wanna dive in,
> >> we'll get a core team fleshed out and an IRC meeting set up and folks
> >> can start cranking.
> >> Thanks!
> >> Monty
> >>  http://git.openstack.org/cgit/openstack/oaktree
> >>  http://git.openstack.org/cgit/openstack/oaktreemodel
> >>  https://storyboard.openstack.org/#!/project/855
> >>  https://storyboard.openstack.org/#!/project/856
> >>  https://review.openstack.org/#/c/512561/
> >>  https://review.openstack.org/#/c/492531/
> >> OpenStack Development Mailing List (not for usage questions)
> >> Unsubscribe:
> >> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev