[neutron] How to ease congestion at neutron server nodes?
Hi Stackers,
What solution(s) other than DVR could I use to avoid north-south traffic congestion at the neutron server nodes? Basically, I wish to let VMs with floating IPs route directly from their respective hypervisor hosts to the Internet.
Thank you very much.
Regards,
Cody
Cody writes:
What solution(s) other than DVR could I use to avoid north-south traffic congestion at the neutron server nodes? Basically, I wish to let VMs with floating IPs route directly from their respective hypervisor hosts to the Internet.
Isn't that the DEFINITION of what DVR does? :-) (Not using DVR myself, so I may be wrong.) We push everything through those central nodes - we call them "network nodes". We try to alleviate the congestion by distributing routers across multiple nodes, and within each node we make sure that the forwarding plane (Open vSwitch in our case) is capable of using the "multi-queue" feature of the underlying network cards, so that packet forwarding is distributed across the multiple cores of those servers. That helped us a lot at the time. -- Simon.
On Fri, 2019-01-18 at 15:42 +0100, Simon Leinen wrote:
Cody writes:
What solution(s) other than DVR could I use to avoid north-south traffic congestion at the neutron server nodes?
Well, the north-south traffic is processed by the nodes running the L3 agent, so if you run the neutron server/API on the controller nodes and have dedicated networking nodes for the L3 and DHCP agents, then that would achieve what you desire without DVR, in terms of not overloading the node that is running the neutron server.
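For reference, a minimal sketch of how you could check which hosts are currently carrying the L3 agents (and therefore the north-south load), assuming python-openstackclient is installed and the usual OS_* credentials are exported; the JSON column names may differ slightly between releases:

    import json
    import subprocess

    # List neutron L3 agents and the hosts they run on, so you can see
    # where north-south routing is currently being handled.
    out = subprocess.check_output(
        ["openstack", "network", "agent", "list",
         "--agent-type", "l3", "--format", "json"]
    )
    for agent in json.loads(out):
        # Column names ("Host", "Alive", "State") as reported by recent
        # openstackclient releases; adjust if yours differ.
        print(agent["Host"], agent["Alive"], agent["State"])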
Basically, I wish to let VMs with floating IPs route directly from their respective hypervisor hosts to the Internet.
It would not, however, achieve the above.
Isn't that the DEFINITION of what DVR does? :-)
Yes, that is the use case that DVR with centralized SNAT tried to solve. DVR with distributed SNAT would obviously load-balance the SNAT traffic away from the network nodes, but I'm not sure we ever got that to work.
(Not using DVR myself, so I may be wrong.)
We push everything through those central nodes - we call them "network nodes". We try to alleviate the congestion by distributing routers across multiple nodes, and within each node we make sure that the forwarding plane (Open vSwitch in our case) is capable of using the "multi-queue" feature of the underlying network cards, so that packet forwarding is distributed across the multiple cores of those servers. That helped us a lot at the time.
Hi Simon and Sean,
What amount of throughput are you able to get through your network nodes? At what point do you bottleneck north-south traffic and where is the bottleneck? Can you elaborate more on the multi-queue configuration?
We have 3 controllers where the neutron server/API/L3 agents run, and the active L3 agent for a particular neutron router will drive the controller's CPU interrupts through the roof (400k), at which point it begins to cause instability (failovers) amongst all L3 agents on that controller node well before we come close to saturating the 40Gbps (4x10Gbps LACP bond) of available bandwidth to it.
On Fri, 2019-01-18 at 13:16 -0500, shubjero wrote:
Hi Simon and Sean,
What amount of throughput are you able to get through your network nodes?
That will vary wildly depending on your NICs and the networking solution you deployed. OVS? With or without DPDK? With or without an SDN controller? With or without hardware offload? VPP? Calico? Linux bridge? Maybe some operators can share their experience.
At what point do you bottleneck north-south traffic and where is the bottleneck?
I'll assume you are using kernel OVS. With kernel OVS your bottleneck will likely be OVS itself and possibly the kernel routing stack. Kernel OVS can only handle about 1.8 Mpps in L2 phy-to-phy switching, maybe a bit more depending on your CPU frequency and kernel version. I have not been following this metric for OVS that closely, but it is somewhere in that neighbourhood.
That is enough to switch about 1.2 Gbps of 64-byte packets, but it can saturate a 10G link at ~512-byte packets. Unless you are using jumbo frames or some NIC offloads you cannot, to my knowledge, saturate a 40G link with kernel OVS. Note that when I say NIC offloads I am not referring to hardware-offloaded OVS; I am referring to GSO, LRO and the other offloads you enable via ethtool. Kernel OVS can switch in excess of 10 Gbps of throughput quite easily with standard-MTU packets. The 1500-MTU packet rate is calculated as: (10*10^9) bits/sec / (1538 bytes * 8) = 812,744 pps. For 40 Gbps that goes up to about 3.3 Mpps, which is more than kernel OVS can forward at standard server CPU frequencies in the 2.4-3.6 GHz range. If you are using 9K jumbo frames, the required forwarding rate drops to roughly 553,000 pps, which again is well within kernel OVS's ability to forward.
The packet classification and header extraction are much more costly than copying the packet payload, so pps matters more than packet size, but they are related. So depending on your traffic profile, kernel OVS may or may not be a bottleneck. The next bottleneck is the kernel routing speed. The Linux kernel is pretty good at routing, but in the Neutron case it not only has to do routing but also NAT, and the iptables SNAT and DNAT actions are likely to be the next bottleneck after OVS.
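To make those figures easy to reproduce, here is the same wire-rate arithmetic as a small Python sketch; it only re-derives the formula above, assuming 38 bytes of per-packet Ethernet framing overhead (14 header + 4 FCS + 8 preamble + 12 inter-frame gap):

    # Wire-rate packet rates from the formula above:
    #   pps = link_bits_per_sec / (bytes_on_wire * 8)
    # where bytes_on_wire = MTU + 38 (eth header, FCS, preamble, IFG).
    PER_PACKET_OVERHEAD = 38

    def wire_pps(link_gbps, mtu):
        bytes_on_wire = mtu + PER_PACKET_OVERHEAD
        return link_gbps * 10**9 / (bytes_on_wire * 8)

    for link_gbps in (10, 40):
        for mtu in (1500, 9000):
            print(f"{link_gbps}G, MTU {mtu}: {wire_pps(link_gbps, mtu):,.0f} pps")

Comparing the printed rates against the ~1.8 Mpps kernel-OVS ceiling mentioned above shows which link-speed/MTU combinations kernel OVS can realistically keep up with.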
Can you elaborate more on the multi-queue configuration?
I believe Simon was referring to enabling multiple rx/tx queues on the NIC attached to OVS, so that the NIC's receive-side scaling (RSS) feature can be used to hash packets into a set of hardware receive queues, which can then be processed by OVS using multiple kernel threads across several CPUs.
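As an illustration of what enabling that looks like (a sketch only: the interface name and queue count are placeholders, ethtool must run as root, and the supported maximum depends on the NIC and driver):

    import subprocess

    # NIC that feeds OVS; placeholder name, adjust for your environment.
    IFACE = "eth0"

    # Show the current vs. maximum ("pre-set") channel/queue counts.
    print(subprocess.check_output(["ethtool", "-l", IFACE], text=True))

    # Raise the number of combined rx/tx hardware queues so RSS can hash
    # flows across them and several CPUs share the forwarding work.
    subprocess.check_call(["ethtool", "-L", IFACE, "combined", "8"])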
We have 3 controllers where the neutron server/API/L3 agents run, and the active L3 agent for a particular neutron router will drive the controller's CPU interrupts through the roof (400k), at which point it begins to cause instability (failovers) amongst all L3 agents on that controller node well before we come close to saturating the 40Gbps (4x10Gbps LACP bond) of available bandwidth to it.
One way to scale interrupt handling is irqbalance, assuming your NIC supports interrupt steering, but most if not all 10G NICs do.
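A quick way to see whether those interrupts are actually being spread across cores is to sum the per-CPU counters in /proc/interrupts; a minimal sketch, where the "eth0" match is a placeholder since the IRQ names for your NIC's queues depend on the driver:

    NIC_PATTERN = "eth0"  # placeholder; match your NIC's IRQ names

    with open("/proc/interrupts") as f:
        cpus = f.readline().split()          # header row: CPU0 CPU1 ...
        totals = [0] * len(cpus)
        for line in f:
            if NIC_PATTERN not in line:
                continue
            # Per-CPU interrupt counts follow the "IRQ:" column.
            counts = line.split()[1:1 + len(cpus)]
            for i, c in enumerate(counts):
                totals[i] += int(c)

    # If one CPU's total dwarfs the others, RSS/irqbalance is not helping.
    for cpu, total in zip(cpus, totals):
        print(cpu, total)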
On Fri, Jan 18, 2019 at 1:21 PM shubjero <shubjero@gmail.com> wrote:
Hi Simon and Sean,
What amount of throughput are you able to get through your network nodes? At what point do you bottleneck north-south traffic and where is the bottleneck? Can you elaborate more on the multi-queue configuration?
We have 3 controllers where the neutron server/API/L3 agents run and the active L3 agent for a particular neutron router will drive the controller's CPU interrupts through the roof (400k) at which point it begins to cause instability (failovers) amongst all L3 agents on that controller node well before we come close to saturating the 40Gbps (4x10Gbps LACP bond) of available bandwidth to it.
I'd look into either:
1) Splitting out the L3/DHCP/metadata/OVS agents from the controller nodes to dedicated network nodes
2) Enabling DVR (config sketch below)
3) Moving over to OVN with DVR
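If option 2 (DVR with ML2/OVS) is the direction taken, here is a minimal sketch for checking the core options involved; the paths and option names (router_distributed, agent_mode, enable_distributed_routing) are the standard upstream ones, but verify them against your own deployment tooling:

    import configparser

    # (file, section, option, value you'd expect with DVR enabled).
    # agent_mode would be "dvr" on compute nodes and "dvr_snat" on the
    # nodes that keep hosting centralized SNAT.
    CHECKS = [
        ("/etc/neutron/neutron.conf",
         "DEFAULT", "router_distributed", "True"),
        ("/etc/neutron/l3_agent.ini",
         "DEFAULT", "agent_mode", "dvr"),
        ("/etc/neutron/plugins/ml2/openvswitch_agent.ini",
         "agent", "enable_distributed_routing", "True"),
    ]

    for path, section, option, expected in CHECKS:
        cfg = configparser.ConfigParser(strict=False)
        cfg.read(path)
        current = cfg.get(section, option, fallback="<unset>")
        print(f"{path}: [{section}] {option} = {current} (expected {expected})")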
participants (5)
- Assaf Muller
- Cody
- Sean Mooney
- shubjero
- Simon Leinen