[openstack-dev] [tripleo] [neutron] Current containerized neutron agents introduce a significant regression in the dataplane

Bogdan Dobrelya bdobreli at redhat.com
Wed Feb 14 13:01:12 UTC 2018


On 2/14/18 11:58 AM, Daniel Alvarez Sanchez wrote:
> 
> 
> On Wed, Feb 14, 2018 at 5:40 AM, Brian Haley <haleyb.dev at gmail.com> wrote:
> 
>     On 02/13/2018 05:08 PM, Armando M. wrote:
> 
> 
> 
>         On 13 February 2018 at 14:02, Brent Eagles <beagles at redhat.com> wrote:
> 
>              Hi,
> 
>              The neutron agents are implemented in such a way that key
>              functionality is implemented in terms of haproxy, dnsmasq,
>              keepalived and radvd configuration. The agents manage
>              instances of these services but, by design, the parent is
>              the top-most process (pid 1).
> 
>              On baremetal this has the advantage that, although control
>              plane changes cannot be made while the agents are down, the
>              configuration in place at the time the agents were stopped
>              keeps working (for example, VMs that are restarted can
>              still request their IPs). In short, the dataplane is not
>              affected by shutting down the agents.
> 
>              In the TripleO containerized version of these agents, the
>              supporting processes (haproxy, dnsmasq, etc.) run within
>              the agent's container, so when the container is stopped,
>              the supporting processes are also stopped. That is, the
>              behavior of the current containers differs significantly
>              from baremetal, and stopping/restarting containers
>              effectively breaks the dataplane. At the moment this is
>              being considered a blocker and, unless we can find a
>              resolution, we may need to recommend running the L3, DHCP
>              and metadata agents on baremetal.
> 
> 
>     I didn't think the neutron metadata agent was affected but just the
>     ovn-metadata agent?  Or is there a problem with the UNIX domain
>     sockets the haproxy instances use to connect to it when the
>     container is restarted?
> 
> 
> That's right. In ovn-metadata-agent we spawn haproxy inside the
> q-ovnmeta namespace, and this is where we'll find a problem if the
> process goes away. As you said, the neutron metadata agent is basically
> receiving the proxied requests from the haproxies residing in either
> q-router or q-dhcp namespaces on its UNIX socket and sending them on to
> Nova.
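Just to illustrate that path: each of those proxies is a plain haproxy
instance bound inside the namespace and forwarding to the agent's UNIX
socket, roughly along these lines (a minimal sketch only; the socket
path is illustrative):

    listen metadata
        mode http
        bind 169.254.169.254:80
        server metadata-agent unix@/var/lib/neutron/metadata_proxy

So the proxies themselves carry no interesting state, but they do have
to stay running for the metadata path in the dataplane to keep working.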
> 
> 
> 
>         There's quite a bit to unpack here: are you suggesting that
>         running these services in an HA configuration doesn't help
>         either with the data plane being gone after a stop/restart?
>         Ultimately this boils down to where the state is persisted, and
>         while certain agents rely on namespaces and processes whose
>         ephemeral nature is hard to persist, enough could be done to
>         allow for a non-disruptive bumping of the aforementioned
>         services.
> 
> 
>     Armando - https://review.openstack.org/#/c/542858/ (if accepted)
>     should help with dataplane downtime, as sharing the namespaces lets
>     them persist, which eases what the agent has to configure on the
>     restart of a container (think of what the l3-agent needs to create
>     for 1000 routers).
> 
>     But it doesn't address dnsmasq being unavailable when the
>     dhcp-agent container is restarted, as happens today.  Maybe one way
>     around that is to run 2+ agents per network, but that still leaves
>     a regression from how it works today.  Even with l3-ha I'm not sure
>     things are perfect; we might wind up with two masters sometimes.
> 
>     I've seen one suggestion of putting all these processes in their
>     own containers, instead of the agent container, so they continue to
>     run, but that might be invasive to the neutron code.  Maybe there
>     is another option?
> 
> 
> I had an idea based on that one to reduce the impact on neutron code
> and its dependency on containers. Basically, we would be running
> dnsmasq, haproxy, keepalived, radvd, etc. in separate containers (it
> makes sense as they have independent lifecycles) and we would drive

+1 for that separation

> those through the docker socket from neutron agents. In order to reduce
> this dependency, I thought of having some sort of
> 'rootwrap-daemon-docker' which takes the

Let's please avoid using 'docker' in names; could it be rootwrap-cri or
rootwrap-engine-moby or something?

> commands and checks whether it has to spawn the process in a separate
> container (iptables, for example, wouldn't need one) and, if so, uses
> the docker socket to do it. We'll also have to monitor the PID files
> of those containers to respawn them in case they die.
> 
> IMHO, this is far from the containers philosophy since we're using host
> networking, privileged access, sharing namespaces, relying on 'sidecar'
> containers... but I can't think of a better way to do it.
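To make that a bit more concrete, a very rough sketch of such a
dispatcher is below. All names are made up and it assumes the docker-py
SDK is available; a real implementation would sit behind the usual
rootwrap/privsep machinery and be driven by the agents:

    # Rough sketch only: dispatch rootwrap-style commands either locally
    # or into a long-lived sidecar container via the docker socket.
    import subprocess

    import docker

    # Illustrative: binaries whose processes must outlive the agent container.
    SIDECAR_BINARIES = {'dnsmasq', 'haproxy', 'keepalived', 'radvd'}

    # Hypothetical image; a real version would map each binary to an image.
    SIDECAR_IMAGE = 'neutron-sidecars:latest'


    def run_command(cmd):
        """Run a command list such as ['dnsmasq', '--no-hosts', ...]."""
        if cmd[0] not in SIDECAR_BINARIES:
            # e.g. iptables: execute in place, no separate container needed.
            return subprocess.check_output(cmd)

        client = docker.from_env()  # talks to /var/run/docker.sock
        # Host networking, host pid namespace and privileges, mirroring how
        # these helpers run on baremetal today; a real version would also
        # need e.g. /run/netns shared to enter qdhcp-*/qrouter-* namespaces.
        return client.containers.run(
            SIDECAR_IMAGE,
            command=cmd,
            detach=True,
            privileged=True,
            network_mode='host',
            pid_mode='host',
        )

Whatever drives the sidecars would then also have to watch them (or
their PID files) and respawn them when they die, as noted above.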

This still looks like it fits well into the k8s pods concept [0], with
healthchecks, shared namespaces and logical coupling of sidecars, which
here means the agents and their helper daemons running in namespaces. I
hope it does.

[0] https://kubernetes.io/docs/concepts/workloads/pods/pod/
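For illustration only, a pod along these lines (the image names and the
healthcheck command are made up) would keep dnsmasq as a sidecar that
survives restarts of the agent container, since containers in a pod are
restarted independently while sharing the pod's namespaces:

    # Illustrative sketch only, not a working TripleO/k8s template.
    apiVersion: v1
    kind: Pod
    metadata:
      name: neutron-dhcp
    spec:
      hostNetwork: true          # agents and helpers need the host netns
      containers:
      - name: dhcp-agent
        image: example/neutron-dhcp-agent:latest   # hypothetical image
        securityContext:
          privileged: true
        livenessProbe:
          exec:
            command: ["/openstack/healthcheck"]    # hypothetical check
      - name: dnsmasq
        image: example/neutron-dnsmasq:latest      # hypothetical sidecar
        securityContext:
          privileged: true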


> 
> 
> 
>     -Brian
> 
> 
> 


-- 
Best regards,
Bogdan Dobrelya,
Irc #bogdando


