[openstack-dev] Is the pendulum swinging on PaaS layers?

Monty Taylor mordred at inaugust.com
Fri May 19 20:35:17 UTC 2017


On 05/19/2017 03:05 PM, Zane Bitter wrote:
> On 19/05/17 15:06, Kevin Benton wrote:
>>> Don't even get me started on Neutron.[2]
>>
>> It seems to me the conclusion to that thread was that the majority of
>> your issues stemmed from the fact that we had poor documentation at the
>> time.  A major component of the complaints resulted from you
>> misunderstanding the difference between networks/subnets in Neutron.
>
> It's true that I was completely off base as to what the various
> primitives in Neutron actually do. (Thanks for educating me!) The
> implications for orchestration are largely unchanged though. It's a
> giant pain that we have to infer implicit dependencies between stuff to
> get them to create/delete in the right order, pretty much independently
> of what that stuff does.
>
> So knowing now that a Network is a layer-2 network segment and a Subnet
> is... effectively a glorified DHCP address pool, I understand better why
> it probably seemed like a good idea to hook stuff up magically. But at
> the end of the day, I still can't create a Port until a Subnet exists, I
> still don't know what Subnet a Port will be attached to (unless the user
> specifies it explicitly using the --fixed-ip option... regardless of
> whether they actually specify a fixed IP), and I have no way in general
> of telling which Subnets can be deleted before a given Port is and which
> will fail to delete until the Port disappears.
>
>> There are some legitimate issues in there about the extra routes
>> extension being replace-only and the routers API not accepting a list of
>> interfaces in POST.  However, it hardly seems that those are worthy of
>> "Don't even get me started on Neutron."
>
> https://launchpad.net/bugs/1626607
> https://launchpad.net/bugs/1442121
> https://launchpad.net/bugs/1626619
> https://launchpad.net/bugs/1626630
> https://launchpad.net/bugs/1626634
>
>> It would be nice if you could write up something about current gaps that
>> would make Heat's life easier, because a large chunk of that initial
>> email is incorrect and linking to it as a big list of "issues" is
>> counter-productive.

I used to have angst about the Neutron API but have come to like it more
and more over time.

I think the main thing I run into is that Neutron's API models a pile
of data to allow power users to do very flexible things. What it's
missing most of the time is an easy button.

I'll give some examples:

My favorite for-instance, which I mentioned in a different thread this 
week and have mentioned in almost every talk I've given over the last 3 
years - is that there is no way to find out if a given network can 
provide connectivity to a resource from outside of the cloud.

There are _many_ reasons why it's hard to fully express a completely 
accurate answer to this problem: "What does external mean?", "What if
there are multiple external networks?", etc. Those are all valid, and all
speak to real workloads and real user scenarios ...

But there's also:

As a user I want to boot a VM on this cloud and have my users who are 
not necessarily on this cloud be able to connect to a service I'm going to
run on it. (aka, I want to run a wordpress)

and

As a user I want to boot a VM on this cloud and I do not want anyone who 
is not another resource on this cloud to be able to connect to anything 
it's running. (aka, I want to run a mysql)

Unless you already know things about the cloud from somewhere other than
the API, it is impossible to consistently perform those two tasks.

We've done a great job empowering the power users to do a bunch of 
really cool things. But we missed booting a wordpress as a basic use case.

Other things exist but aren't anyone's fault really. We still can't as a 
community agree on a consistent worldview related to fixed ips, neutron 
ports and floating ips. Neutron amazingly supports ALL of the use case 
combinations for those topics ... it just doesn't always do so in all of 
the clouds.

Heck - while I'm on floating ips ... if you have some pre-existing 
floating ips and you want to boot servers on them and you want to do 
that in parallel, you can't. You can boot a server with a floating ip 
that did not pre-exist if you get the port id of the fixed ip of the 
server then pass that id to the floating ip create call. Of course, the 
server doesn't return the port id in the server record, so at the very 
least you need to make a GET /ports.json?device_id={server_id} call. Of 
course what you REALLY need to find is the port_id of the ip of the 
server that came from a subnet that has 'gateway_ip' defined, which is 
even more fun since ips are associated with _networks_ on the server 
record and not with subnets.

Possibly to Zane's point, you basically have to recreate a multi-table 
data model client side and introspect relationships between objects to 
be able to figure out how to correctly get a floating ip on to a server. 
NOW - as opposed to the external network bit - it IS possible to do and
to do correctly and have it work every time.
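
A minimal sketch of that client-side join, assuming the port and subnet
records have already been fetched as plain dicts (for example from
GET /ports.json?device_id={server_id} and a subnet listing). The function
names and the sample ids are hypothetical; the field names (fixed_ips,
subnet_id, gateway_ip, floating_network_id, port_id) are taken to match the
Neutron v2 API. This is an illustration of the lookup, not any library's
actual implementation.

```python
def port_for_floating_ip(ports, subnets):
    """Return the id of the first port that has a fixed IP on a subnet
    with a gateway_ip defined, or None if there is no such port.

    ports   -- port dicts for one server (hypothetical sample shape)
    subnets -- subnet dicts visible to the user
    """
    gateway_subnets = {s["id"] for s in subnets if s.get("gateway_ip")}
    for port in ports:
        for fixed_ip in port.get("fixed_ips", []):
            if fixed_ip["subnet_id"] in gateway_subnets:
                return port["id"]
    return None


def floating_ip_create_body(external_network_id, port_id):
    """Build a request body for creating a floating IP bound to a port."""
    return {
        "floatingip": {
            "floating_network_id": external_network_id,
            "port_id": port_id,
        }
    }


# Made-up sample records, shaped like API responses.
subnets = [
    {"id": "subnet-a", "network_id": "net-1", "gateway_ip": None},
    {"id": "subnet-b", "network_id": "net-2", "gateway_ip": "10.0.0.1"},
]
ports = [
    {"id": "port-1", "device_id": "server-1",
     "fixed_ips": [{"subnet_id": "subnet-a", "ip_address": "192.168.0.5"}]},
    {"id": "port-2", "device_id": "server-1",
     "fixed_ips": [{"subnet_id": "subnet-b", "ip_address": "10.0.0.7"}]},
]

port_id = port_for_floating_ip(ports, subnets)
body = floating_ip_create_body("ext-net", port_id)
```

Note that the server record plays no part in the join: the lookup has to
go through ports and subnets because the server only reports addresses per
network.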

But if you want to re-use an existing floating ip you either have to 
keep a client-side database of them where you can allocate one to a 
server, or you just have to do a try/fail/try/fail loop because the only 
way you can claim one is to just try to attach it.
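
The try/fail loop looks roughly like the sketch below. The attach callable
and the AttachConflict exception are hypothetical stand-ins: how a lost race
is actually reported depends on the client and the cloud, so that part is an
assumption. The only API detail relied on is that an unattached floating ip
record has port_id set to null.

```python
class AttachConflict(Exception):
    """Hypothetical error: someone else claimed the floating IP first."""


def claim_floating_ip(floating_ips, port_id, attach):
    """Try each unattached floating IP in turn until one attach succeeds.

    floating_ips -- floating IP dicts from a listing call
    port_id      -- the port to attach to
    attach       -- callable(floating_ip_id, port_id); raises AttachConflict
                    when the attach loses a race (assumed behaviour)
    Returns the id of the claimed floating IP, or None if all attempts fail.
    """
    for fip in floating_ips:
        if fip.get("port_id") is not None:
            continue  # already attached according to our (possibly stale) list
        try:
            attach(fip["id"], port_id)
        except AttachConflict:
            continue  # lost the race, try the next candidate
        return fip["id"]
    return None


def fake_attach(floating_ip_id, port_id):
    """Stand-in for a real API call; pretends fip-2 was just taken."""
    if floating_ip_id == "fip-2":
        raise AttachConflict(floating_ip_id)


floating_ips = [
    {"id": "fip-1", "port_id": "port-9"},
    {"id": "fip-2", "port_id": None},
    {"id": "fip-3", "port_id": None},
]
claimed = claim_floating_ip(floating_ips, "port-2", fake_attach)
```

Running several of these loops in parallel is exactly where it hurts: every
worker starts from the same stale list and they all go after the same
candidates first.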

In any case - I apologize that I have not been able to more crisply 
describe these issues such that people can deal with them. I truly 
believe there is a point-of-view issue and the conversations can fail
because the consumer side and the producer side have different context. I
think we made several big steps forward related to keystone at the
Boston Summit. Maybe next time we should try to do a similar thing for
nova/neutron?

> Yes, agreed. I wish I had a clean thread to link to. It's a huge amount
> of work to research it all though.
>
> cheers,
> Zane.
>
>> On Fri, May 19, 2017 at 7:36 AM, Zane Bitter <zbitter at redhat.com> wrote:
>>
>>     On 18/05/17 20:19, Matt Riedemann wrote:
>>
>>         I just wanted to blurt this out since it hit me a few times at
>>         the summit, and see if I'm misreading the rooms.
>>
>>         For the last few years, Nova has pushed back on adding
>>         orchestration to the compute API, and even defined a policy for
>>         it since it comes up so much [1]. The stance is that the compute
>>         API should expose capabilities that a higher-level orchestration
>>         service can stitch together for a more fluid end user experience.
>>
>>
>>     I think this is a wise policy.
>>
>>         One simple example that comes up time and again is allowing a
>>         user to pass volume type to the compute API when booting from
>>         volume such that when nova creates the backing volume in Cinder,
>>         it passes through the volume type. If you need a non-default
>>         volume type for boot from volume, the way you do this today is
>>         first create the volume with said type in Cinder and then provide
>>         that volume to the compute API when creating the server. However,
>>         people claim that is bad UX or hard for users to understand,
>>         something like that (at least from a command line, I assume
>>         Horizon hides this, and basic users should probably be using
>>         Horizon anyway right?).
>>
>>
>>     As always, there's a trade-off between simplicity and flexibility. I
>>     can certainly understand the logic in wanting to make the simple
>>     stuff simple. But users also need to be able to progress from simple
>>     stuff to more complex stuff without having to give up and start
>>     over. There's a danger of leading them down the garden path.
>>
>>         While talking about claims in the scheduler and a top-level
>>         conductor for cells v2 deployments, we've talked about the desire
>>         to eliminate "up-calls" from the compute service to the top-level
>>         controller services (nova-api, nova-conductor and nova-scheduler).
>>         Build retries is one such up-call. CERN disables build retries,
>>         but others rely on them, because of how racy claims in the
>>         computes are (that's another story and why we're working on
>>         fixing it). While talking about this, we asked, "why not just do
>>         away with build retries in nova altogether? If the scheduler
>>         picks a host and the build fails, it fails, and you have to
>>         retry/rebuild/delete/recreate from a top-level service."
>>
>>
>>     (FWIW Heat does this for you already.)
>>
>>         But during several different Forum sessions, like user API
>>         improvements [2] but also the cells v2 and claims in the
>>         scheduler sessions, I was hearing about how operators only wanted
>>         to expose the base IaaS services and APIs and end API users
>>         wanted to only use those, which means any improvements in those
>>         APIs would have to be in the base APIs (nova, cinder, etc). To
>>         me, that generally means any orchestration would have to be baked
>>         into the compute API if you're not using Heat or something
>>         similar.
>>
>>
>>     The problem is that orchestration done inside APIs is very easy to
>>     do badly in ways that cause lots of downstream pain for users and
>>     external orchestrators. For example, Nova already does some
>>     orchestration: it creates a Neutron port for a server if you don't
>>     specify one. (And then promptly forgets that it has done so.) There
>>     is literally an entire inner platform, an orchestrator within an
>>     orchestrator, inside Heat to try to manage the fallout from this.
>>     And the inner platform shares none of the elegance, such as it is,
>>     of Heat itself, but is rather a collection of cobbled-together hacks
>>     to deal with the seemingly infinite explosion of edge cases that we
>>     kept running into over a period of at least 5 releases.
>>
>>     The get-me-a-network thing is... better, but there's no provision
>>     for changes after the server is created, which means we have to
>>     copy-paste the Nova implementation into Heat to deal with update.[1]
>>     Which sounds like a maintenance nightmare in the making. That seems
>>     to be a common mistake: to assume that once users create something
>>     they'll never need to touch it again, except to delete it when
>>     they're done.
>>
>>     Don't even get me started on Neutron.[2]
>>
>>     Any orchestration that is done behind-the-scenes needs to be done
>>     superbly well, provide transparency for external orchestration tools
>>     that need to hook in to the data flow, and should be developed in
>>     consultation with potential consumers like Shade and Heat.
>>
>>         Am I missing the point, or is the pendulum really swinging away
>>         from PaaS layer services which abstract the dirty details of the
>>         lower-level IaaS APIs? Or was this always something people wanted
>>         and I've just never made the connection until now?
>>
>>
>>     (Aside: can we stop using the term 'PaaS' to refer to "everything
>>     that Nova doesn't do"? This habit is not helping us to communicate
>>     clearly.)
>>
>>     cheers,
>>     Zane.
>>
>>     [1] https://review.openstack.org/#/c/407328/
>>     [2] http://lists.openstack.org/pipermail/openstack-dev/2014-April/032098.html
>>
>>
>>
>>
>> __________________________________________________________________________
>>
>>     OpenStack Development Mailing List (not for usage questions)
>>     Unsubscribe:
>>     OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>>
>> <http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe>
>>     http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>     <http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev>



