[openstack-dev] [neutron][heat] - making Neutron more friendly for orchestration

Zane Bitter zbitter at redhat.com
Tue May 23 18:48:23 UTC 2017


On 19/05/17 19:53, Kevin Benton wrote:
> So making a subnet ID mandatory for a port creation and
> a RouterInterface ID mandatory for a Floating IP creation are both
> possible in Heat without Neutron changes. Presumably you haven't done
> that because it's backwards-incompatible, but you would need to
> implement the change anyway if the Neutron API was changed to require it.
>
> It seems like Heat has a backwards-compatibility requirement for
> supporting old templates that aren't explicit. That will be the real
> blocker to actually making any of these changes, no? i.e. Neutron isn't
> preventing Heat from being more strict, it's the legacy Heat modeling
> that is preventing it.

We have a translation mechanism for resource properties (much improved 
in Pike - thanks prazumovsky!) that could in theory help us to make such 
a change (with or without a corresponding change in the Neutron API) 
without breaking existing users (although it would probably require a 
bunch of expensive API calls at inopportune times). That would likely be 
just as much of a pain to maintain as the workarounds we have now, so 
tbh we're likely to stick with reflecting the Neutron API directly, 
whatever it does.
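For illustration only, a property translation of that kind might look roughly like the minimal Python sketch below. It is not Heat's translation mechanism: the function name, the `list_subnets_on_network` callable and the dict layout (loosely modelled on the `network` and `fixed_ips`/`subnet` properties of OS::Neutron::Port) are hypothetical. The callable stands in for the Neutron lookup, which is where the expensive API calls would come from.

```python
from typing import Callable, Dict, List

# Hypothetical property layout for a Port resource: a dict with a
# "network" key and an optional "fixed_ips" list of {"subnet": ...} entries.
PortProps = Dict[str, object]


def translate_port_properties(
    props: PortProps,
    list_subnets_on_network: Callable[[str], List[str]],
) -> PortProps:
    """Make the Port -> Subnet relationship explicit in a legacy template.

    list_subnets_on_network is a hypothetical stand-in for a Neutron API
    lookup; every call is a REST round trip made while the template is
    being processed.
    """
    if props.get("fixed_ips"):
        # The user already said something explicit; leave it alone.
        return dict(props)

    subnets = list_subnets_on_network(str(props["network"]))
    if not subnets:
        # No subnets at all: a port with no IP addresses is legitimate.
        return dict(props)
    if len(subnets) > 1:
        # Several candidates: refuse to guess which one would be used.
        raise ValueError(
            f"ambiguous subnet for network {props['network']!r}: {subnets}")

    translated = dict(props)
    translated["fixed_ips"] = [{"subnet": subnets[0]}]
    return translated
```

Only the unambiguous case gets translated; a network with several subnets still has no safe answer, and every legacy Port costs an extra lookup.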

I've long since chalked this one up to 'lessons learned'; if I keep 
harping on it, it's because I want to make sure that everyone really 
does learn the lessons.
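The "workarounds we have now" amount to inferring dependencies that a template never states. A minimal Python sketch of the idea, using the standard library's `graphlib.TopologicalSorter` (Python 3.9+), follows. The resource dicts are a hypothetical, heavily simplified model (networks are assumed to exist already, and the FloatingIP rule is a deliberate over-approximation); this is not Heat's actual code. A Port that does not name a subnet is made to depend on every Subnet on its network, and a FloatingIP on every RouterInterface attached to a subnet its port might use.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical, simplified resource model: name -> {"type": ..., plus
# "network", "subnet" or "port" as appropriate}. Networks are assumed
# to exist already, so they are referenced by name only.
Resources = Dict[str, Dict[str, str]]


def implicit_dependencies(resources: Resources) -> Dict[str, Set[str]]:
    """Map each resource name to the resources it must be created after."""
    deps: Dict[str, Set[str]] = {name: set() for name in resources}

    def subnets_on(network: str) -> Set[str]:
        return {n for n, r in resources.items()
                if r["type"] == "Subnet" and r["network"] == network}

    for name, res in resources.items():
        if res["type"] == "Port":
            if "subnet" in res:
                # Explicit subnet: exactly one known dependency.
                deps[name].add(res["subnet"])
            else:
                # No subnet given: an IP may come from any subnet on the
                # network, so wait for all of them.
                deps[name] |= subnets_on(res["network"])
        elif res["type"] == "RouterInterface":
            deps[name].add(res["subnet"])
        elif res["type"] == "FloatingIP":
            deps[name].add(res["port"])
            port = resources[res["port"]]
            port_subnets = ({port["subnet"]} if "subnet" in port
                            else subnets_on(port["network"]))
            # Over-approximation: wait for every RouterInterface on any
            # subnet the port might have an address on.
            deps[name] |= {n for n, r in resources.items()
                           if r["type"] == "RouterInterface"
                           and r["subnet"] in port_subnets}
    return deps


def creation_order(resources: Resources) -> List[str]:
    """Topological creation order; reverse it to get a deletion order."""
    graph = implicit_dependencies(resources)
    return list(TopologicalSorter(graph).static_order())
```

Reversing the creation order gives a deletion order that never removes a Subnet while a Port might still hold an address on it, at the cost of serialising resources that may not actually depend on each other. Naming the subnet explicitly on the Port shrinks both sets of edges.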

>> (a) drop the requirement that the Network has to be connected to the
>> external network with the FloatingIPs with a RouterInterface prior to
>> creating the FloatingIP. IIUC only *some* Neutron backends require this.
>
> This can produce difficult-to-debug situations when multiple routers
> attached to different external networks are attached to different
> subnets of the same network and the user associates a floating IP to the
> wrong fixed IP of the instance. Right now the interface check will
> prevent that, but if we remove it the floating IP would just sit in the
> DOWN state.
>
> If a backend supports floating IPs without router interfaces entirely,
> it's likely making assumptions that prevent it from supporting
> multi-router scenarios. A single fixed IP on a port can have multiple
> floating IPs associated with it from different external networks. So the
> only way to distinguish which floating IP to translate to is which
> router the traffic is being directed to by the instance, which requires
> router interfaces.
>
> Cheers
>
> On Fri, May 19, 2017 at 3:29 PM, Zane Bitter <zbitter at redhat.com> wrote:
>
>     On 19/05/17 17:03, Kevin Benton wrote:
>
>         I split this conversation off of the "Is the pendulum swinging on
>         PaaS layers?" thread [1] to discuss some improvements we can make
>         to Neutron to make orchestration easier.
>
>         There are some pain points that heat has when working with the
>         Neutron API. I would like to get them converted into requests for
>         enhancements in Neutron so the wider community is aware of them.
>
>         Starting with the port/subnet/network relationship - it's
>         important to understand that IP addresses are not required on a
>         port.
>
>             So knowing now that a Network is a layer-2 network segment
>             and a Subnet is... effectively a glorified DHCP address pool
>
>         Yes, a Subnet controls IP address allocation as well as setting up
>         routing for routers, which is why routers reference subnets instead
>         of networks (different routers can route for different subnets on
>         the same network). It essentially dictates things related to L3
>         addressing and provides information for L3 reachability.
>
>             But at the end of the day, I still can't create a Port until
>             a Subnet exists
>
>
>         This is only true if you want an IP address on the port. This
>         sounds silly for most use cases, but there is a non-trivial portion
>         of NFV workloads that do not want IP addresses at all, so they
>         create a network and just attach ports without creating any
>         subnets.
>
>
>     Fair. A more precise statement of the problem would be that given a
>     template containing both a Port and a Subnet that it will be
>     attached to, there is a specific order in which those need to be
>     created that is _not_ reflected in the data flow between them.
>
>             I still don't know what Subnet a Port will be attached to
>             (unless the user specifies it explicitly using the --fixed-ip
>             option... regardless of whether they actually specify a fixed
>             IP),
>
>         So what would you like Neutron to do differently here? Always
>         force a user to pick which subnet they want an allocation from
>
>
>     That would work.
>
>         if there are multiple?
>
>
>     Ideally even if there aren't.
>
>         If so, can't you just force that explicitness in Heat?
>
>
>     I think the answer here is exactly the same as for Neutron: yes, we
>     totally could have if we'd realised that it was a problem at the time.
>
>             and I have no way in general of telling which Subnets can be
>             deleted before a given Port is and which will fail to delete
>             until the Port disappears.
>
>
>         A given port will only block subnet deletions from subnets it is
>         attached to. Conversely, you can see all ports with allocations
>         from a subnet with
>         'neutron port-list --fixed-ips subnet_id=<subnet-UUID>'. So is the
>         issue here that the dependency wasn't made explicit in the heat
>         modeling (leading to the problem above and this one)?
>
>
>     Yes, that's exactly the issue. The Heat modelling was based on 1:1
>     with the Neutron API to minimise user confusion.
>
>         For the individual bugs you highlighted, it would be good if you can
>         provide some details about what changes we could make to help.
>
>
>         https://bugs.launchpad.net/heat/+bug/1442121 - This looks like a
>         result of partially specified floating IPs (no fixed_ip). What can
>         we add/change here to help? Or can heat just always force the user
>         to specify a fixed IP for the case where disambiguation on multiple
>         fixed_ip ports is needed?
>
>
>     This is the issue from which all the others on that list were
>     spawned (see https://bugs.launchpad.net/heat/+bug/1442121/comments/10),
>     so the only thing we're planning to actually do for this one is to
>     catch any exceptions closer to where they occur than we're doing in
>     the fix for https://bugs.launchpad.net/heat/+bug/1554625
>
>         https://launchpad.net/bugs/1626607
>
>
>     Note that this one is fixed.
>
>         - I see this is about a dependency between RouterGateways and
>         RouterInterfaces, but it's not clear to me why that dependency
>         exists. Is it to solve a lack of visibility into the interfaces
>         required for a floating IP?
>
>
>     Yes, exactly.
>
>     We essentially solved the RouterGateway/RouterInterface half of the
>     problem in Heat back in Juno, by deprecating the
>     OS::Neutron::RouterGateway resource and replacing it with an
>     "external_gateway_info" property in OS::Neutron::Router. Old
>     templates never die though.
>
>         https://bugs.launchpad.net/heat/+bug/1626619,
>         https://bugs.launchpad.net/heat/+bug/1626630, and
>         https://bugs.launchpad.net/heat/+bug/1626634 - These seem similar
>         to 1626607.
>
>
>     The first and third are the RouterInterface/FloatingIP half of the
>     problem. And to work around that we also have to work around the
>     Subnet/Port problem (that's the third bug). The second bug is the
>     RouterGateway/RouterInterface equivalent of the third.
>
>         Can we just expose the interfaces/router a floating IP is
>         depending on explicitly in the API for you to fix these?
>
>
>     Not really. We need to know before any of them are actually created.
>     Preferably without making any REST calls, because REST calls are
>     slow and tend to raise exceptions at unfortunate times.
>
>         If not, what can we do to help here?
>
>
>     In principle, either:
>
>     (a) drop the requirement that the Network has to be connected to the
>     external network with the FloatingIPs with a RouterInterface prior
>     to creating the FloatingIP. IIUC only *some* Neutron backends
>     require this.
>
>     or
>
>     (b) require the user to provide the UUID of the RouterInterface
>     through which they wish to connect when they create the FloatingIP.
>
>     cheers,
>     Zane.
>
>         [1] http://lists.openstack.org/pipermail/openstack-dev/2017-May/117106.html
>
>         Cheers,
>         Kevin Benton
>
>         On Fri, May 19, 2017 at 1:05 PM, Zane Bitter <zbitter at redhat.com> wrote:
>
>             On 19/05/17 15:06, Kevin Benton wrote:
>
>                     Don't even get me started on Neutron.[2]
>
>
>                 It seems to me the conclusion to that thread was that the
>                 majority of your issues stemmed from the fact that we had poor
>                 documentation at the time. A major component of the complaints
>                 resulted from you misunderstanding the difference between
>                 networks/subnets in Neutron.
>
>
>             It's true that I was completely off base as to what the various
>             primitives in Neutron actually do. (Thanks for educating me!) The
>             implications for orchestration are largely unchanged though. It's
>             a giant pain that we have to infer implicit dependencies between
>             stuff to get them to create/delete in the right order, pretty much
>             independently of what that stuff does.
>
>             So knowing now that a Network is a layer-2 network segment and a
>             Subnet is... effectively a glorified DHCP address pool, I
>             understand better why it probably seemed like a good idea to hook
>             stuff up magically. But at the end of the day, I still can't
>             create a Port until a Subnet exists, I still don't know what
>             Subnet a Port will be attached to (unless the user specifies it
>             explicitly using the --fixed-ip option... regardless of whether
>             they actually specify a fixed IP), and I have no way in general
>             of telling which Subnets can be deleted before a given Port is
>             and which will fail to delete until the Port disappears.
>
>                 There are some legitimate issues in there about the extra
>                 routes extension being replace-only and the routers API not
>                 accepting a list of interfaces in POST. However, it hardly
>                 seems that those are worthy of "Don't even get me started on
>                 Neutron."
>
>
>             https://launchpad.net/bugs/1626607
>             https://launchpad.net/bugs/1442121
>             https://launchpad.net/bugs/1626619
>             https://launchpad.net/bugs/1626630
>             https://launchpad.net/bugs/1626634
>
>                 It would be nice if you could write up something about
>                 current gaps that would make Heat's life easier, because a
>                 large chunk of that initial email is incorrect and linking to
>                 it as a big list of "issues" is counter-productive.
>
>
>             Yes, agreed. I wish I had a clean thread to link to. It's a huge
>             amount of work to research it all though.
>
>             cheers,
>             Zane.
>
>                 On Fri, May 19, 2017 at 7:36 AM, Zane Bitter <zbitter at redhat.com> wrote:
>
>                     On 18/05/17 20:19, Matt Riedemann wrote:
>
>                         I just wanted to blurt this out since it hit me a few
>                         times at the summit, and see if I'm misreading the
>                         rooms.
>
>                         For the last few years, Nova has pushed back on adding
>                         orchestration to the compute API, and even defined a
>                         policy for it since it comes up so much [1]. The stance
>                         is that the compute API should expose capabilities that
>                         a higher-level orchestration service can stitch
>                         together for a more fluid end user experience.
>
>
>                     I think this is a wise policy.
>
>                         One simple example that comes up time and again is
>                         allowing a user to pass volume type to the compute API
>                         when booting from volume such that when nova creates
>                         the backing volume in Cinder, it passes through the
>                         volume type. If you need a non-default volume type for
>                         boot from volume, the way you do this today is first
>                         create the volume with said type in Cinder and then
>                         provide that volume to the compute API when creating
>                         the server. However, people claim that is bad UX or
>                         hard for users to understand, something like that (at
>                         least from a command line, I assume Horizon hides
>                         this, and basic users should probably be using Horizon
>                         anyway right?).
>
>
>                     As always, there's a trade-off between simplicity and
>                     flexibility. I can certainly understand the logic in
>                     wanting to make the simple stuff simple. But users also
>                     need to be able to progress from simple stuff to more
>                     complex stuff without having to give up and start over.
>                     There's a danger of leading them down the garden path.
>
>                         While talking about claims in the scheduler and a
>                         top-level conductor for cells v2 deployments, we've
>                         talked about the desire to eliminate "up-calls" from
>                         the compute service to the top-level controller
>                         services (nova-api, nova-conductor and
>                         nova-scheduler). Build retries is one such up-call.
>                         CERN disables build retries, but others rely on them,
>                         because of how racy claims in the computes are (that's
>                         another story and why we're working on fixing it).
>                         While talking about this, we asked, "why not just do
>                         away with build retries in nova altogether? If the
>                         scheduler picks a host and the build fails, it fails,
>                         and you have to retry/rebuild/delete/recreate from a
>                         top-level service."
>
>
>                     (FWIW Heat does this for you already.)
>
>                         But during several different Forum sessions, like user
>                         API improvements [2] but also the cells v2 and claims
>                         in the scheduler sessions, I was hearing about how
>                         operators only wanted to expose the base IaaS services
>                         and APIs and end API users wanted to only use those,
>                         which means any improvements in those APIs would have
>                         to be in the base APIs (nova, cinder, etc). To me, that
>                         generally means any orchestration would have to be
>                         baked into the compute API if you're not using Heat or
>                         something similar.
>
>
>                     The problem is that orchestration done inside APIs is very
>                     easy to do badly in ways that cause lots of downstream pain
>                     for users and external orchestrators. For example, Nova
>                     already does some orchestration: it creates a Neutron port
>                     for a server if you don't specify one. (And then promptly
>                     forgets that it has done so.) There is literally an entire
>                     inner platform, an orchestrator within an orchestrator,
>                     inside Heat to try to manage the fallout from this. And the
>                     inner platform shares none of the elegance, such as it is,
>                     of Heat itself, but is rather a collection of
>                     cobbled-together hacks to deal with the seemingly infinite
>                     explosion of edge cases that we kept running into over a
>                     period of at least 5 releases.
>
>                     The get-me-a-network thing is... better, but there's no
>                     provision for changes after the server is created, which
>                     means we have to copy-paste the Nova implementation into
>                     Heat to deal with update.[1] Which sounds like a
>                     maintenance nightmare in the making. That seems to be a
>                     common mistake: to assume that once users create
>                     something they'll never need to touch it again, except to
>                     delete it when they're done.
>
>                     Don't even get me started on Neutron.[2]
>
>                     Any orchestration that is done behind-the-scenes needs to
>                     be done superbly well, provide transparency for external
>                     orchestration tools that need to hook in to the data flow,
>                     and should be developed in consultation with potential
>                     consumers like Shade and Heat.
>
>                         Am I missing the point, or is the pendulum really
>                         swinging away from PaaS layer services which abstract
>                         the dirty details of the lower-level IaaS APIs? Or was
>                         this always something people wanted and I've just
>                         never made the connection until now?
>
>
>                     (Aside: can we stop using the term 'PaaS' to refer to
>                     "everything that Nova doesn't do"? This habit is not
>                     helping us to communicate clearly.)
>
>                     cheers,
>                     Zane.
>
>                     [1] https://review.openstack.org/#/c/407328/
>                     [2] http://lists.openstack.org/pipermail/openstack-dev/2014-April/032098.html
>
>                     __________________________________________________________________________
>                     OpenStack Development Mailing List (not for usage questions)
>                     Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>                     http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



