[openstack-dev] Is the pendulum swinging on PaaS layers?

Matt Riedemann mriedemos at gmail.com
Fri May 19 21:27:19 UTC 2017


On 5/19/2017 3:03 PM, Monty Taylor wrote:
> On 05/19/2017 01:04 PM, Sean Dague wrote:
>> On 05/19/2017 01:38 PM, Dean Troyer wrote:
>>> On Fri, May 19, 2017 at 11:53 AM, Chris Friesen
>>> <chris.friesen at windriver.com> wrote:
>>>> ..., but it seems to me that the logical
>>>> extension of that is to expose simple orthogonal APIs where the nova 
>>>> boot
>>>> request should only take neutron port ids and cinder volume ids.  
>>>> The actual
>>>> setup of those ports/volumes would be done by neutron and cinder.
>>>>
>>>> It seems somewhat arbitrary to say "for historical reasons this 
>>>> subset of
>>>> simple things can be done directly in a nova boot command, but for more
>>>> complicated stuff you have to go use these other commands".  I think 
>>>> there's
>>>> an argument to be made that it would be better to be consistent even 
>>>> for the
>>>> simple things.
>>>
>>> cdent mentioned enamel[0] above, and there is also oaktree[1], both of
>>> which are wrapper/proxy services in front of existing OpenStack APIs.
>>> I don't know enough about enamel yet, but one of the things I like
>>> about oaktree is that it is not required to be deployed by the cloud
>>> operator to be useful, I could set it up and proxy Rax and/or
>>> CityCloud and/or mtreinish's closet cloud equally well.
>>>
>>> The fact that these exist, and things like shade itself, are clues
>>> that we are not meeting the needs of API consumers.  I don't think
>>> anyone disagrees with that; let me know if you do and I'll update my
>>> thoughts.
>>
>> It's fine to have other ways to consume things. I feel like "make
>> OpenStack easier to use by requiring you install a client side API
>> server for your own requests" misses the point of the easier part. It's
>> cool you can do it as a power user. It's cool things like Heat exist for
>> people that don't want to write API calls (and just do templates). But
>> it's also not reducing the number of pieces of complexity you have to
>> manage in OpenStack to get a workable cloud.
> 
> Yup. Agree. Making forward progress on that is paramount.
> 
>> I consider those things duct tape, leading us to the eventually
>> consistent place where we actually do that work internally. Because,
>> having seen this with the ec2-api proxy, the moment you get beyond
>> trivial mapping you end up with a complex state-tracking system that's
>> going to need to be highly available, and replicate a bunch of your data
>> to be performant, and then have inconsistency issues, because a user
>> deployed API proxy can't have access to the notification bus, and... 
>> boom.
> 
> You can actually get fairly far (with a few notable exceptions - I'm 
> looking at you unattached floating ips) without state tracking. It comes 
> at the cost of more API spidering after a failure/restart. Being able to 
> cache stuff aggressively combined with batching/rate-limiting of 
> requests to the cloud API allows one to do most of this to a fairly 
> massive scale statelessly. However, caching, batching and rate-limiting 
> are all pretty much required, or else you wind up crashing public clouds. :)
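
The cache/batch/rate-limit combination Monty describes can be sketched roughly like this (the class, its name, and the 5-second window are all illustrative, not shade's actual API):

```python
import time

class BatchedServerLister:
    """Serve cached results inside a rate-limit window so that N
    concurrent callers share one batched list call against the cloud.
    A rough sketch of the pattern, not shade's implementation."""

    def __init__(self, fetch_servers, min_interval=5.0, clock=time.monotonic):
        self._fetch = fetch_servers        # stands in for GET /servers/detail
        self._min_interval = min_interval  # seconds between real API calls
        self._clock = clock
        self._cache = None
        self._last_fetch = None

    def list_servers(self):
        now = self._clock()
        # Only hit the cloud API once per window; everyone else gets
        # the cached batch. This is what keeps public clouds standing.
        if self._last_fetch is None or now - self._last_fetch >= self._min_interval:
            self._cache = self._fetch()
            self._last_fetch = now
        return self._cache
```

Everything rebuilt after a restart comes from re-listing (the "API spidering" above), which is the price of statelessness.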
> 
> I agree that the things are currently duct tape, but I don't think that 
> has to be a bad thing. The duct tape is currently needed client-side no 
> matter what we do, and will be for some time no matter what we do 
> because of older clouds. What's often missing is closing the loop so 
> that we can, as OpenStack, eventually provide out of the box the consumer
> experience that people currently get from using one of the client-side 
> duct tapes. That part is harder, but it's definitely important.
> 
>> You end up replicating the Ceilometer issue where there was a breakdown
>> in getting needs expressed / implemented, and the result was a service
>> doing heavy polling of other APIs (because that's the only way it could
>> get the data it needed). Literally increasing the load on the API
>> surfaces by a factor of 10
> 
> Right. This is why communication is essential. I'm optimistic we can do 
> well on this topic, because we are MUCH better at talking to each other
> now than we were back when ceilometer was started.
> 
> Also, a REST-consuming porcelain like oaktree gets to draw on real-world 
> experience consuming OpenStack's REST APIs at scale. So it's also not 
> the same problem setup, since it's not a from-scratch new thing.
> 
> This is, incidentally, why experience with caching and batching is 
> important. There is a reason why we do GET /servers/detail once every 5 
> seconds rather than doing a specific GET /server/{id}/detail call for
> each booting VM.
> 
> Look at what we could learn just from that... Users using shade are 
> doing a full detailed server list because it scales better for 
> concurrency. It's obviously more expensive on a single-call basis. BUT - 
> maybe it's useful information that doing optimization work on GET 
> /servers/detail could be beneficial.
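
The batched polling loop Monty describes can be sketched like so; `list_servers_detail` stands in for a single GET /servers/detail call, and the function and its name are made up for illustration:

```python
def wait_for_servers_batched(pending_ids, list_servers_detail, max_polls=100):
    """Wait for a set of booting servers using one batched list call
    per poll cycle, instead of one GET per server per cycle. A sketch
    of the pattern, not shade's actual code."""
    done = {}
    polls = 0
    while pending_ids and polls < max_polls:
        polls += 1
        # One detailed list serves every waiter in this cycle.
        # (A real loop would also sleep between cycles.)
        for server in list_servers_detail():
            if server["id"] in pending_ids and server["status"] != "BUILD":
                done[server["id"]] = server
                pending_ids.discard(server["id"])
    return done, polls
```

However many servers are booting concurrently, the number of API calls per cycle stays at one.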

This reminds me that I suspect we're lazy-loading server detail 
information in certain cases, i.e. going back to the DB to do a join 
per-instance after we've already pulled all instances in an initial set 
(with some initial joins). I need to pull this thread again...

> 
>> (http://superuser.openstack.org/articles/cern-cloud-architecture-update/
>> last graph). That was an anti-pattern. We should have gotten to the
>> bottom of the mismatches and communication issues early on, because the
>> end state we all inflicted on users to get a totally reasonable set of
>> features was not good. Please let's not do this again.
> 
> ++
> 
>> These should be used as ways to experiment with the kinds of interfaces
>> we want cheaply, then take them back into services (which is a more
>> expensive process involving compatibility stories, deeper documentation,
>> performance implications, and the like), not an end game on their own.
> 
> Yup. Amongst other things this is why I'm pushing so hard on service and 
> version discovery right now. I can do it fully and correctly in shade -
> but then what about non-shade users? We've still got probably a 2 year 
> path to the point where the client logic doesn't need to be duplicated 
> everywhere ... but working on that path is super important.
> 
>>> First and foremost, we need to have the primitive operations that get
>>> composed into the higher-level ones available.  Just picking "POST
>>> /server" as an example, we do not have that today.  Chris mentions
>>> above the low-level version should take IDs for all of the associated
>>> resources and no magic happening behind the scenes.  I think this
>>> should be our top priority, everything else builds on top of that, via
>>> either in-service APIs or proxies or library wrappers, whatever a) can
>>> get implemented and b) makes sense for the use case.
>>
>> You can get the behavior. It also has other behaviors. I'm not sure any
>> user has actually argued for "please make me do more rest calls to
>> create a server".
> 
> No. Please no. I'd like to make fewer calls.
> 
>> Anyway, this gets pretty meta pretty fast. I agree with Zane saying "I
>> want my server to build", or "I'd like Nova to build a volume for me"
>> are very odd things to call PaaS. I think of PaaS as "here is a ruby on
>> rails app, provision me a db for it, and make it go". Heroku style.
> 
> I'd like for POST /servers to support an auto_ip flag like shade gives 
> users now. Then, when it does, I'd love to have shade use it when it's 
> there, and fall back to making the 20 billion API calls it takes when 
> it's not. get-me-a-network is similar, although in this case nova and 
> neutron got to it before shade did. So the shade get-me-a-network plan 
> is going to be to add the flag to create_server, but have a fallback 
> impl that will make all of the neutron calls itself if the nova in 
> question doesn't support get-me-a-network itself.
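
The fallback pattern Monty describes looks roughly like this; `cloud` and its methods are hypothetical stand-ins for a shade-like adapter, not shade's real interface:

```python
def create_server_with_auto_ip(cloud, **kwargs):
    """Use the server-side feature when the cloud supports it,
    otherwise do the equivalent orchestration client-side.
    A hypothetical sketch of shade's fallback pattern."""
    if cloud.supports("auto_ip"):
        # Modern cloud: one call; nova/neutron do the work server-side.
        return cloud.create_server(auto_ip=True, **kwargs)
    # Older cloud: fall back to making the extra API calls ourselves.
    server = cloud.create_server(**kwargs)
    ip = cloud.create_floating_ip()
    cloud.attach_ip(server, ip)
    return server
```

The caller's experience is identical on both paths; only the number of underlying API calls differs.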
> 
> I mention those for a couple of reasons:
> 
> I agree that there are some additional 'orchestration'-seeming things
> that, put into nova/neutron/cinder, etc., would make the world a better
> place.
> 
> I think it shows a potentially nice pattern - a "time machine"-like
> function. When there is a thing that can be used by an API consumer
> to get up to date semantics on _their_ timeframe, it gives us a vehicle 
> to deliver improvements to OpenStack, but to also empower consumers of 
> older clouds.
> 
> The spec on service-types-authority and aliases is an example of this 
> working as I think it should work moving forward. Rather than just
> implementing that in shade and being done with it - we're working to 
> define the behavior we need in the future. Then we can implement it in 
> shade, and because it's well defined, so can other client libs. At a
> point in the future once all of the OpenStack clouds out there have 
> adopted the recommendations, all of the clients will be doing less work, 
> but they also get their users to the future today.
> 
> A porcelain that is only an API consumer (and doesn't touch the message 
> bus) like oaktree just extends that idea to people who are not python 
> programmers. It also explicitly embodies the time-machine idea.
> 
> As developers working on OpenStack, there is a timeline. It includes 
> things like deprecation notices and removals. From an OpenStack service 
> developer POV the removal of nova-network API support from 
> python-novaclient makes perfect sense, because nova knows that the Pike 
> version of Nova doesn't contain nova-network. Nova can also know that 
> the Pike version of Cinder _definitely_ contains cinder v3. Pike Nova 
> needs to worry about Ocata Cinder for upgrades, but Pike Nova is 
> definitely not worried about Kilo Cinder. Why would it?

nova-network still exists in Pike and will continue to exist until we 
can get rid of cells v1, which isn't until we can claim support for 
multiple cells v2 cells. Just FYI.

As for Cinder, you'd be surprised how many times we've said, "but what 
happens when we're talking to Cinder from Havana?". So that does 
actually come up a lot in discussions about changes involving external 
services. This is where microversions are nice: nova, as the client,
can discover what it's talking to.
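
A minimal sketch of that negotiation: the client computes min(its ceiling, the server's ceiling) and refuses to talk if the server's floor is above its ceiling. (Real negotiation also involves version documents and the OpenStack-API-Version header; this just shows the arithmetic.)

```python
def negotiate_microversion(client_max, server_min, server_max):
    """Pick the highest mutually supported microversion. Versions are
    "X.Y" strings as in nova/cinder microversions; compare numerically,
    not lexically (2.10 > 2.9)."""
    def parse(v):
        major, minor = v.split(".")
        return (int(major), int(minor))
    if parse(client_max) < parse(server_min):
        raise ValueError(
            "server minimum %s exceeds client maximum %s" % (server_min, client_max))
    chosen = min(parse(client_max), parse(server_max))
    return "%d.%d" % chosen
```

So a Pike-era client talking to an older cloud simply lands on the older cloud's ceiling, and both sides know exactly which semantics apply.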

> 
> But as developers writing libraries and tools to _consume_ OpenStack, 
> there is no such timeline. Diablo is as alive and well as Pike and must 
> be taken into account. Our friends at Oracle are running Kilo - we
> don't want the OpenStack Ansible modules to just stop working for them, 
> do we? Of course not.
> 
> This is why I think an API-consumer porcelain can benefit our users AND 
> can benefit us.
> 
> A deployer can deploy one directly and expose it to their users. Since 
> it's explicitly written to work with old OpenStacks, they can even 
> update it frequently without updating the rest of their cloud. There is 
> a benefit to the deployer of running one because they can ensure the 
> caching and rate-limiting settings are such that they don't get pummeled 
> as they might otherwise if they had 1000 users each running
> their own.
> 
> If a deployer doesn't, the user is empowered to time-machine themselves 
> if they want to. It might be slightly less efficient - and a bit 
> annoying since they have to run a service, but it'll work if they happen 
> to be in a scenario where they need to use an older cloud and they don't 
> happen to be a python developer.
> 
> It benefits us because we can continue to work on business-logic 
> patterns in the time-machine, then continue to learn which of them are 
> things we should push down into the service.
> 
> Like maybe we wind up driving more Zaqar adoption so that we've got an 
> answer for waiting on a server to be done without polling. Or maybe we 
> realize there is a new abstraction that neutron needs called "An IP 
> Address" - that a user can request and they'll get a floating IP on FIP 
> clouds and a disconnected port on clouds where that's a possibility. And
> then a user can have the same workflow for creating a long-lived IP, 
> booting a server and attaching the IP regardless of whether the 
> implementation of that IP is NAT or not.
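
As a sketch, such an abstraction might look like this; the capability flag and method names are entirely hypothetical, not an existing neutron API:

```python
def get_long_lived_ip(cloud):
    """One user-facing call that hides whether the cloud does NAT.
    Hypothetical sketch of the "An IP Address" abstraction above."""
    if cloud.has_floating_ips():
        # FIP cloud: the long-lived IP is a floating IP, attached later.
        return {"kind": "floating", "address": cloud.create_floating_ip()}
    # Direct-attach cloud: the long-lived IP is a port, plugged in later.
    return {"kind": "fixed", "address": cloud.create_routable_port()}
```

The user workflow - create the IP, boot a server, attach - stays the same on both kinds of cloud.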
> 
> We also have a vehicle to explore entirely new concepts by using gRPC 
> and not REST. Active return channels in gRPC that come from http/2 open 
> up some _great_ possibilities. Registering callbacks or waiting for 
> notifications is just built-in to gRPC in the first place. A person 
> writing an application talking to oaktree gets to just say "boot me a 
> server and I'm going to wait until it's done" - and there is no 
> client-side polling. Of course, oaktree will be polling OpenStack - but 
> it'll be doing it the efficient way that almost nobody writing their own
> polling code bothers with - but we did it once, and now it can be
> reused. But maybe we learn that service-to-service communication
> would be improved by a gRPC layer. Or maybe we don't.
> 
> Adding a new API feature at the shade/oaktree layer is trivial - at 
> least compared to the cost of adding an API feature like 
> get-me-a-network to nova and neutron. If it turns out to be a dumb idea, 
> keeping it until the end of time is low cost. But if it turns out to be
> awesome, then we know that figuring out how to push it down into the 
> REST layer is worthwhile.
> 
> Anyway - those are reasons I think a porcelain layer isn't necessarily a 
> bad idea. I think there's real potential end-user value. I also don't 
> think it's the solution to all of life's problems. If the underlying 
> REST APIs don't evolve over time to reduce the amount of client logic 
> employed on a fresh new and modern cloud then we're ultimately not 
> progressing, existence of a porcelain layer or not.
> 
> Monty
> 
> PS. I spoke of oaktree in the present tense, as it is a thing. But in 
> case you haven't been following that and now you're like "oh boy, 
> another thing I need to pay attention to!"
> 
> We have made a lot of progress in the last six months, but you will not 
> notice any of it in the oaktree repos. That is because as we started 
> cranking on oaktree in earnest we realized there were two issues in 
> shade that needed to be fixed before further work in the oaktree repos 
> would be efficient. The first was that, although shade has an API in 
> terms of the methods it exposes - the data returned from the API was 
> mushy (protobuf likes mushy data much less than python). So we 
> implemented data model contracts and docs for them. The other was that 
> the python-*client dependencies in shade hit the tipping point where we
> needed to get rid of them more urgently. It's not "easy for a user to
> install locally" if it depends on the entire phone book. We're very 
> close to being done with replacing shade's use of python-*client with 
> direct REST calls via keystoneauth. Once we're done with that, we can 
> once again turn attention to making evident forward progress on oaktree 
> itself. (I've been pointing people who want to help with oaktree to 
> finishing the RESTification)
> 
> Which is all to say, the project is still very much a thing. Once 
> RESTification is done, I'll follow up with folks on where we are and 
> what's next for it.
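
For a flavor of what those data model contracts buy: a normalization step that turns a mushy, cloud-specific dict into one fixed, documented shape (field names here are illustrative, not shade's actual contract) - and a fixed shape is exactly what a protobuf mapping needs:

```python
def normalize_server(raw):
    """Data-model-contract sketch: whatever shape the cloud returns,
    callers always get the same keys with the same types. A hedged
    illustration, not shade's real normalization code."""
    return {
        "id": raw["id"],
        "name": raw.get("name", ""),
        # Clouds report addresses in differently-nested structures;
        # the contract guarantees a flat, sorted list either way.
        "addresses": sorted(
            addr["addr"]
            for addrs in raw.get("addresses", {}).values()
            for addr in addrs
        ),
        "status": raw.get("status", "UNKNOWN").upper(),
    }
```

Once every field has a guaranteed name and type, generating a protobuf message for oaktree from it is mechanical.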
> 


-- 

Thanks,

Matt


