[openstack-dev] Is the pendulum swinging on PaaS layers?

Monty Taylor mordred at inaugust.com
Sat May 20 16:37:02 UTC 2017


On 05/19/2017 04:27 PM, Matt Riedemann wrote:
> On 5/19/2017 3:03 PM, Monty Taylor wrote:
>> On 05/19/2017 01:04 PM, Sean Dague wrote:
>>> On 05/19/2017 01:38 PM, Dean Troyer wrote:
>>>> On Fri, May 19, 2017 at 11:53 AM, Chris Friesen
>>>> <chris.friesen at windriver.com> wrote:
>>>>> ..., but it seems to me that the logical
>>>>> extension of that is to expose simple orthogonal APIs where the
>>>>> nova boot
>>>>> request should only take neutron port ids and cinder volume ids.
>>>>> The actual
>>>>> setup of those ports/volumes would be done by neutron and cinder.
>>>>>
>>>>> It seems somewhat arbitrary to say "for historical reasons this
>>>>> subset of
>>>>> simple things can be done directly in a nova boot command, but for
>>>>> more
>>>>> complicated stuff you have to go use these other commands".  I
>>>>> think there's
>>>>> an argument to be made that it would be better to be consistent
>>>>> even for the
>>>>> simple things.
>>>>
>>>> cdent mentioned enamel[0] above, and there is also oaktree[1], both of
>>>> which are wrapper/proxy services in front of existing OpenStack APIs.
>>>> I don't know enough about enamel yet, but one of the things I like
>>>> about oaktree is that it is not required to be deployed by the cloud
>>>> operator to be useful, I could set it up and proxy Rax and/or
>>>> CityCloud and/or mtreinish's closet cloud equally well.
>>>>
>>>> The fact that these exist, and things like shade itself, are clues
>>>> that we are not meeting the needs of API consumers.  I don't think
>>>> anyone disagrees with that; let me know if you do and I'll update my
>>>> thoughts.
>>>
>>> It's fine to have other ways to consume things. I feel like "make
>>> OpenStack easier to use by requiring you to install a client-side API
>>> server for your own requests" misses the point of the easier part. It's
>>> cool that you can do it as a power user. It's cool that things like Heat
>>> exist for people who don't want to write API calls (and just do
>>> templates). But it also doesn't reduce the number of pieces of
>>> complexity you have to manage to have a workable OpenStack cloud.
>>
>> Yup. Agree. Making forward progress on that is paramount.
>>
>>> I consider those things duct tape, leading us to the eventually
>>> consistent place where we actually do that work internally. Because, as
>>> we saw with the ec2-api proxy, the moment you get beyond trivial
>>> mapping you end up with a complex state-tracking system that has to be
>>> highly available, has to replicate a bunch of your data to be
>>> performant, and then has inconsistency issues, because a user-deployed
>>> API proxy can't have access to the notification bus, and... boom.
>>
>> You can actually get fairly far (with a few notable exceptions - I'm
>> looking at you, unattached floating ips) without state tracking. It
>> comes at the cost of more API spidering after a failure/restart.
>> Aggressive caching, combined with batching/rate-limiting of requests
>> to the cloud API, lets one do most of this statelessly at fairly
>> massive scale. Caching, batching and rate-limiting are all pretty much
>> required though, or else you wind up crashing public clouds. :)
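>>
>> (Purely as an illustration of the shape of that duct tape - not
>> shade's actual internals - a TTL cache plus a minimum interval
>> between list calls is most of what's needed:
>>
>>   import time
>>
>>   class CachedLister:
>>       """Serve cached results, and never hit the cloud API more
>>       than once per min_interval seconds."""
>>
>>       def __init__(self, fetch, min_interval=5.0):
>>           self._fetch = fetch          # e.g. a bulk "list servers" call
>>           self._min_interval = min_interval
>>           self._last_time = 0.0
>>           self._cached = None
>>
>>       def list(self):
>>           now = time.monotonic()
>>           if self._cached is None or now - self._last_time >= self._min_interval:
>>               self._cached = self._fetch()
>>               self._last_time = now
>>           return self._cached
>>
>> Every caller gets an answer immediately; the cloud only sees one list
>> call per interval no matter how many callers there are.)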
>>
>> I agree that these things are currently duct tape, but I don't think
>> that has to be a bad thing. The duct tape is needed client-side no
>> matter what we do, and will be for some time because of older clouds.
>> What's often missing is closing the loop so that we can, as OpenStack,
>> eventually provide out of the box the consumer experience that people
>> currently get from one of the client-side duct tapes. That part is
>> harder, but it's definitely important.
>>
>>> You end up replicating the Ceilometer issue where there was a breakdown
>>> in getting needs expressed / implemented, and the result was a service
>>> doing heavy polling of other APIs (because that's the only way it could
>>> get the data it needed), literally increasing the load on the API
>>> surfaces by a factor of 10
>>
>> Right. This is why communication is essential. I'm optimistic we can
>> do well on this topic, because we are MUCH better at talking to each
>> other now than we were back when Ceilometer was started.
>>
>> Also, a REST-consuming porcelain like oaktree gets to draw on
>> real-world experience consuming OpenStack's REST APIs at scale. So
>> it's also not the same problem setup, since it's not a from-scratch
>> new thing.
>>
>> This is, incidentally, why experience with caching and batching is
>> important. There is a reason why we do GET /servers/detail once every
>> 5 seconds rather than doing specific GET /server/{id}/detail calls
>> for each booting VM.
>>
>> Look at what we could learn just from that... Users using shade are
>> doing a full detailed server list because it scales better for
>> concurrency. It's obviously more expensive on a single-call basis. BUT
>> - maybe that's a useful signal that optimization work on GET
>> /servers/detail would be beneficial.
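>>
>> (Roughly, and glossing over shade's actual internals, the trade-off
>> looks like this - the session object here is assumed to be a
>> keystoneauth-style adapter already pointed at the compute endpoint:
>>
>>   # One bulk call, amortized across every server we're watching.
>>   servers = session.get('/servers/detail').json()['servers']
>>   by_id = {s['id']: s for s in servers}
>>
>>   # vs. one round trip per booting server - cheaper individually,
>>   # but N calls every poll interval once N servers are in flight.
>>   for server_id in booting:
>>       server = session.get('/servers/%s' % server_id).json()['server']
>>
>> At 5-second polling with a few hundred servers in flight, the second
>> form multiplies the request rate dramatically.)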
>
> This reminds me that I suspect we're lazy-loading server detail
> information in certain cases, i.e. going back to the DB to do a join
> per-instance after we've already pulled all instances in an initial set
> (with some initial joins). I need to pull this thread again...
>
>>
>>> (http://superuser.openstack.org/articles/cern-cloud-architecture-update/
>>> last graph). That was an anti-pattern. We should have gotten to the
>>> bottom of the mismatches and communication issues early on, because the
>>> end state we all inflicted on users to get a totally reasonable set of
>>> features was not good. Please let's not do this again.
>>
>> ++
>>
>>> These should be used as ways to experiment with the kinds of interfaces
>>> we want cheaply, then take them back into services (which is a more
>>> expensive process involving compatibility stories, deeper documentation,
>>> performance implications, and the like), not an end game on their own.
>>
>> Yup. Amongst other things, this is why I'm pushing so hard on service
>> and version discovery right now. I can do it fully and correctly in
>> shade - but then what about non-shade users? We've still got probably
>> a two-year path to the point where the client logic doesn't need to be
>> duplicated everywhere ... but working on that path is super important.
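>>
>> (For the curious, the client side of version discovery is conceptually
>> tiny - hit the unversioned endpoint and pick from the version document
>> it returns. A rough sketch, hedging on the exact document shape since
>> it varies a bit between services:
>>
>>   import requests
>>
>>   def pick_current_version(unversioned_endpoint):
>>       """Return the self link for the current version of a service."""
>>       doc = requests.get(unversioned_endpoint).json()
>>       for version in doc.get('versions', []):
>>           if version.get('status') in ('CURRENT', 'stable'):
>>               for link in version.get('links', []):
>>                   if link.get('rel') == 'self':
>>                       return link['href']
>>       raise RuntimeError('No usable version found')
>>
>> The hard part isn't the code - it's that every client library ends up
>> duplicating its own slightly different copy of it, which is exactly
>> what the discovery work is trying to make unnecessary.)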
>>
>>>> First and foremost, we need to have the primitive operations that get
>>>> composed into the higher-level ones available.  Just picking "POST
>>>> /server" as an example, we do not have that today.  Chris mentions
>>>> above the low-level version should take IDs for all of the associated
>>>> resources and no magic happening behind the scenes.  I think this
>>>> should be our top priority, everything else builds on top of that, via
>>>> either in-service APIs or proxies or library wrappers, whatever a) can
>>>> get implemented and b) makes sense for the use case.
>>>
>>> You can get the behavior. It also has other behaviors. I'm not sure any
>>> user has actually argued for "please make me do more REST calls to
>>> create a server".
>>
>> No. Please no. I'd like to make fewer calls.
>>
>>> Anyway, this gets pretty meta pretty fast. I agree with Zane that "I
>>> want my server to build" or "I'd like Nova to build a volume for me"
>>> are very odd things to call PaaS. I think of PaaS as "here is a ruby on
>>> rails app, provision me a db for it, and make it go". Heroku style.
>>
>> I'd like for POST /servers to support an auto_ip flag like shade gives
>> users now. Then, when it does, I'd love to have shade use it where
>> it's available and fall back to making the 20 billion API calls it
>> takes when it's not. get-me-a-network is similar, although in this
>> case nova and neutron got to it before shade did. So the shade
>> get-me-a-network plan is to add the flag to create_server, but with a
>> fallback implementation that makes all of the neutron calls itself if
>> the nova in question doesn't support get-me-a-network.
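>>
>> (The pattern, boiled down and with made-up helper names rather than
>> shade's real internals, is just feature detection with a fallback:
>>
>>   def create_server_with_network(cloud, name, **kwargs):
>>       if cloud.supports_get_me_a_network():
>>           # New enough nova/neutron: one call, let the cloud do it.
>>           return cloud.create_server(name, networks='auto', **kwargs)
>>       # Older cloud: do the orchestration client-side - create the
>>       # network, subnet and router ourselves, then boot against it.
>>       network = cloud.create_default_network()
>>       return cloud.create_server(name, networks=[network['id']], **kwargs)
>>
>> The user-visible API stays the same either way; the only thing that
>> changes with cloud age is how much work happens on the client.)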
>>
>> I mention those for a couple of reasons:
>>
>> I agree that there are some additional 'orchestration'-seeming things
>> which, if put into nova/neutron/cinder, etc., would make the world a
>> better place.
>>
>> I think it shows a potentially nice pattern - a "time machine" of
>> sorts. When there is a thing an API consumer can use to get up-to-date
>> semantics on _their_ timeframe, it gives us a vehicle to deliver
>> improvements to OpenStack while also empowering consumers of older
>> clouds.
>>
>> The spec on service-types-authority and aliases is an example of how I
>> think this should work moving forward. Rather than just implementing
>> that in shade and being done with it, we're working to define the
>> behavior we need in the future. Then we can implement it in shade,
>> and, because it's well defined, so can other client libs. At a point
>> in the future, once all of the OpenStack clouds out there have adopted
>> the recommendations, all of the clients will be doing less work - but
>> they also get their users to the future today.
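>>
>> (The consumer-side logic the spec enables is small. Assuming we have
>> the published data as a mapping of official type to historical
>> aliases - the block-storage entry below is real, the catalog lookup
>> is a stand-in for whatever a client library actually does:
>>
>>   ALIASES = {
>>       'block-storage': ['volumev3', 'volumev2', 'volume'],
>>   }
>>
>>   def find_service(catalog, official_type):
>>       """Find a catalog entry by official type, falling back to
>>       historical aliases on clouds that still use them."""
>>       for service_type in [official_type] + ALIASES.get(official_type, []):
>>           if service_type in catalog:
>>               return catalog[service_type]
>>       raise KeyError(official_type)
>>
>> Once clouds register their services under the official types, the
>> fallback loop never runs - but the same client keeps working against
>> clouds that haven't caught up yet.)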
>>
>> A porcelain that is only an API consumer (and doesn't touch the
>> message bus) like oaktree just extends that idea to people who are not
>> python programmers. It also explicitly embodies the time-machine idea.
>>
>> As developers working on OpenStack, there is a timeline. It includes
>> things like deprecation notices and removals. From an OpenStack
>> service developer POV the removal of nova-network API support from
>> python-novaclient makes perfect sense, because nova knows that the
>> Pike version of Nova doesn't contain nova-network. Nova can also know
>> that the Pike version of Cinder _definitely_ contains cinder v3. Pike
>> Nova needs to worry about Ocata Cinder for upgrades, but Pike Nova is
>> definitely not worried about Kilo Cinder. Why would it?
>
> nova-network still exists in Pike and will continue to exist until we
> can get rid of cells v1, which isn't until we can claim support for
> multiple cells v2 cells. Just FYI.

OH! Sorry - the thing I really meant was the proxy APIs. (I confuse 
those and some aspects of nova-network in my head a lot)

> As for Cinder, you'd be surprised how many times we've said, "but what
> happens when we're talking to Cinder from Havana?". So that does
> actually come up a lot in discussions about changes involving external
> services. This is why microversions are nice: when nova is the client,
> it can know what it's talking to.

Neat! That's good to know - it means there's more to directly 
collaborate on in this space than I might have otherwise expected. Yay.

>>
>> But as developers writing libraries and tools to _consume_ OpenStack,
>> there is no such timeline. Diablo is as alive and well as Pike and
>> must be taken into account. Our friends at Oracle are running Kilo -
>> we don't want the OpenStack Ansible modules to just stop working for
>> them, do we? Of course not.
>>
>> This is why I think an API-consumer porcelain can benefit our users
>> AND can benefit us.
>>
>> A deployer can deploy one directly and expose it to their users. Since
>> it's explicitly written to work with old OpenStacks, they can even
>> update it frequently without updating the rest of their cloud. There
>> is a benefit to the deployer in running one, because they can ensure
>> the caching and rate-limiting settings are such that they don't get
>> pummeled the way they otherwise might if they had 1000 users each
>> running their own.
>>
>> If a deployer doesn't, the user is empowered to time-machine
>> themselves if they want to. It might be slightly less efficient - and
>> a bit annoying, since they have to run a service - but it'll work if
>> they happen to be in a scenario where they need to use an older cloud
>> and they don't happen to be a Python developer.
>>
>> It benefits us because we can continue to work on business-logic
>> patterns in the time-machine, then continue to learn which of them are
>> things we should push down into the service.
>>
>> Like maybe we wind up driving more Zaqar adoption so that we've got an
>> answer for waiting on a server to be done without polling. Or maybe we
>> realize there is a new abstraction that neutron needs called "An IP
>> Address" - one that a user can request and that gets them a floating
>> IP on FIP clouds and a disconnected port on clouds where that's a
>> possibility. And then a user can have the same workflow for creating a
>> long-lived IP, booting a server and attaching the IP, regardless of
>> whether the implementation of that IP is NAT or not.
>>
>> We also have a vehicle to explore entirely new concepts by using gRPC
>> and not REST. The active return channels gRPC gets from HTTP/2 open up
>> some _great_ possibilities. Registering callbacks or waiting for
>> notifications is just built in to gRPC in the first place. A person
>> writing an application talking to oaktree gets to just say "boot me a
>> server and I'm going to wait until it's done" - and there is no
>> client-side polling. Of course, oaktree will be polling OpenStack -
>> but it'll be doing it the efficient way that almost nobody writing
>> their own polling code is going to bother doing, but that we did once
>> and can now be reused. But maybe we learn that service-to-service
>> communication would be improved by a gRPC layer. Or maybe we don't.
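>>
>> (To make the "no client-side polling" point concrete: with a
>> server-streaming RPC, the caller just iterates. The stub and message
>> names below are hypothetical, not oaktree's actual proto, but the
>> shape is the point:
>>
>>   import grpc
>>   # oaktree_pb2 / oaktree_pb2_grpc would be generated from the protos.
>>   import oaktree_pb2
>>   import oaktree_pb2_grpc
>>
>>   channel = grpc.insecure_channel('localhost:8001')
>>   stub = oaktree_pb2_grpc.OaktreeStub(channel)
>>
>>   # The server streams status updates back over the already-open
>>   # HTTP/2 connection; oaktree does the polling against the cloud.
>>   for update in stub.CreateServer(
>>           oaktree_pb2.ServerSpec(name='web-1', flavor='m1.small')):
>>       if update.status in ('ACTIVE', 'ERROR'):
>>           break
>>
>> The application never writes a sleep-and-poll loop; it just waits on
>> the stream.)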
>>
>> Adding a new API feature at the shade/oaktree layer is trivial - at
>> least compared to the cost of adding an API feature like
>> get-me-a-network to nova and neutron. If it turns out to be a dumb
>> idea, keeping it until the end of time is low cost. But if it turns
>> out to be awesome, then we know that figuring out how to push it down
>> into the REST layer is worthwhile.
>>
>> Anyway - those are reasons I think a porcelain layer isn't necessarily
>> a bad idea. I think there's real potential end-user value. I also
>> don't think it's the solution to all of life's problems. If the
>> underlying REST APIs don't evolve over time to reduce the amount of
>> client logic employed on a fresh new and modern cloud then we're
>> ultimately not progressing, existence of a porcelain layer or not.
>>
>> Monty
>>
>> PS. I spoke of oaktree in the present tense, as it is a thing. But in
>> case you haven't been following it and are now thinking "oh boy,
>> another thing I need to pay attention to!" - here's where it stands.
>>
>> We have made a lot of progress in the last six months, but you will
>> not notice any of it in the oaktree repos. That is because, as we
>> started cranking on oaktree in earnest, we realized there were two
>> issues in shade that needed to be fixed before further work in the
>> oaktree repos would be efficient. The first was that, although shade
>> has an API in terms of the methods it exposes, the data returned from
>> that API was mushy (protobuf likes mushy data much less than Python
>> does). So we implemented data model contracts and docs for them. The
>> other was that the python-*client dependencies in shade hit the
>> tipping point where we needed to get rid of them more urgently. It's
>> not "easy for a user to install locally" if it depends on the entire
>> phone book. We're very close to being done with replacing shade's use
>> of python-*client with direct REST calls via keystoneauth. Once we're
>> done with that, we can once again turn attention to making evident
>> forward progress on oaktree itself. (I've been pointing people who
>> want to help with oaktree at finishing the RESTification.)
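>>
>> (For anyone who hasn't looked at what "direct REST calls via
>> keystoneauth" means in practice, it's pleasantly small - one library
>> instead of a per-service client. A minimal sketch with made-up
>> credentials:
>>
>>   from keystoneauth1 import identity, session
>>
>>   auth = identity.Password(
>>       auth_url='https://keystone.example.com/v3',
>>       username='demo', password='secret', project_name='demo',
>>       user_domain_id='default', project_domain_id='default')
>>   sess = session.Session(auth=auth)
>>
>>   # The service catalog picks the compute endpoint; no novaclient.
>>   resp = sess.get('/servers/detail',
>>                   endpoint_filter={'service_type': 'compute'})
>>   for server in resp.json()['servers']:
>>       print(server['id'], server['status'])
>>
>> That's the whole dependency story for a consumer: keystoneauth plus
>> knowledge of the REST APIs.)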
>>
>> Which is all to say, the project is still very much a thing. Once
>> RESTification is done, I'll follow up with folks on where we are and
>> what's next for it.
>>



