Not getting full bandwidth VXLAN + DVR
Hello,

In our OpenStack environment (Newton) we are using 10G networking on all our nodes, with OVS bridging, VXLAN tunneling, and DVR. We have enabled jumbo frames on the NICs and on the physical switches, as well as VXLAN offloading on the NICs. irqbalance is running, which should distribute the network IRQs across all CPU cores. Unfortunately we only get below 1G of bandwidth when communicating with our VMs via floating IPs from the compute hosts. We tested with iperf and the results are:

Host to VM using floating IP - less than 1 Gbit/s
VM to VM using internal IP - ~2.5 Gbit/s

Any ideas or solutions to this issue would be much appreciated.

--
With kind regards,
Yedhu Sastri
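[Editor's note: a minimal sketch of how such measurements are typically taken with iperf3; the addresses below are placeholder examples, not from the original post, and the commands need a live environment to run.]

```shell
# On the VM under test: start an iperf3 server (placeholder setup)
iperf3 -s

# From the compute host, measure via the floating IP (placeholder address)
iperf3 -c 203.0.113.10 -t 30

# From a second VM, measure via the fixed/internal IP (placeholder address)
iperf3 -c 10.0.0.5 -t 30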
On Fri, 2019-01-25 at 10:14 +0100, Yedhu Sastri wrote:
> Hello,
>
> In our OpenStack environment (Newton) we are using 10G networking on all our nodes, with OVS bridging, VXLAN tunneling, and DVR. We have enabled jumbo frames on the NICs and on the physical switches, as well as VXLAN offloading on the NICs. irqbalance is running, which should distribute the network IRQs across all CPU cores. Unfortunately we only get below 1G of bandwidth when communicating with our VMs via floating IPs from the compute hosts. We tested with iperf and the results are:
> Host to VM using floating IP - less than 1 Gbit/s
> VM to VM using internal IP - ~2.5 Gbit/s

sorry for the complexity of the diagram, but with dvr your networking will look something like this:
https://docs.openstack.org/ocata/networking-guide/_images/deploy-ovs-ha-dvr-...
the diagram is actually incorrect in that there should not be a line between port 3 and interface 3, as that would imply adding a physical nic to the br-tun, which is not correct.

when iperf connects via the internet, the ingress flow is described here
https://docs.openstack.org/ocata/networking-guide/deploy-ovs-ha-dvr.html
but i will summarise below. looking at the simplified diagram
https://docs.openstack.org/ocata/networking-guide/_images/deploy-ovs-ha-dvr-...

1. the packet arrives on the datacenter wan uplink and is switched to your wan router.
2. the wan router, having connectivity to the subnet of your floating ip, generates an arp request to discover the mac of the floating ip.
3. the arp request ingresses the compute node on interface 2, enters the br-provider bridge (usually called br-ex), crosses the patch port to the br-int, where it exits via the fg port labelled 6 and enters the fip namespace via the tap device created by ovs, labelled 7 in the diagram.
4. this tap has the floating ip assigned, so it responds to the arp. when your wan router receives the reply it learns the destination mac and routes the tcp stream from your iperf client to that mac.
5. the iperf traffic takes the same path to the fip namespace, where it is intercepted by an iptables dnat rule which rewrites the destination ip to the private ip, and it is sent to the dvr namespace over a veth pair labelled 8 and 9 in the diagram.
6. once the packet is received in the dvr namespace, it is routed to ovs via interface 10, after a similar arp request to learn the destination mac of the private ip.
7. the packet enters the br-int and, if you are using the iptables firewall driver, exits via another veth pair into the linux bridge that the vm's tap device is connected to, and finally reaches the vm. note: if you are using the conntrack or noop firewall driver, the vm tap is added to the br-int directly, so the qbr linux bridge and the veth pair shown as 12 and 13 will not exist.
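the flow above can be cross-checked on a compute node with something like the following (a diagnostic sketch, not from the original thread; the namespace names contain uuids, shown here as placeholders):

```shell
# List the DVR-related namespaces on the compute node;
# expect fip-<net-uuid> and qrouter-<router-uuid> entries
ip netns list

# The fg port in the fip namespace carries the floating-ip subnet
ip netns exec fip-<net-uuid> ip addr show

# The DNAT rule that rewrites floating ip -> fixed ip lives in the qrouter namespace
ip netns exec qrouter-<router-uuid> iptables -t nat -S | grep DNAT
```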
finally the vm receives the iperf connection and the response packets are sent back along the same path.

i went through that flow for two reasons. first, on the north-south path the network encapsulation used for the tenant network (e.g. vxlan) is irrelevant, as the packet is never vxlan encapsulated. second, there are several places where there could be bottlenecks.

first, are you using 10G nics for the br-ex/br-provider bridge? second, is the local tunnel endpoint ip assigned to this bridge? the answer should be yes to both, and i will proceed as if it is. if you are not using a 10G nic for br-ex, then that is why you are seeing sub-1G speeds.

next, you mention you are using jumbo frames. assuming you are using a 9000 byte mtu, i would expect the mtu of the neutron vxlan network to be 8950. in this case, looking at
https://docs.openstack.org/ocata/networking-guide/_images/deploy-ovs-ha-dvr-...
again, you should check the mtus are set correctly at the following locations:

- interface 2 should be set to your physical network mtu, which i am assuming is 9000 in this example.
- interfaces 15 (vm tap), 14 (qbr bridge), 13 (qvb veth interface) and 10 (qr port in the dvr namespace) should all have their mtu set to 8950.
- interfaces 9 (rfp), 8 (fpr) and 7 (fg) should be set to 9000.

when you do your testing with iperf you should set your mtu/packet size to 8950. if you use 9000 it will force the tcp packets to be fragmented when routed from the rfp interface to the qr interface in the dvr namespace, which requires the vm to reassemble them later. this is the first bottleneck you need to rule out. when you do the vm-to-vm testing it will use an mtu of 8950, as that is the mtu of the neutron network and is included in the dhcp reply.
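as a quick sanity check of the 8950 figure above, the vxlan-over-ipv4 overhead works out like this (a sketch; the 50-byte figure matches neutron's default mtu calculation):

```shell
# vxlan-over-ipv4 overhead per packet:
# 20 (outer ipv4) + 8 (udp) + 8 (vxlan) + 14 (inner ethernet) = 50 bytes
PHYS_MTU=9000
OVERHEAD=$((20 + 8 + 8 + 14))
TENANT_MTU=$((PHYS_MTU - OVERHEAD))
echo "$TENANT_MTU"    # prints 8950
```

the per-interface mtus listed above can be verified with `ip link show` on the host and with `ip netns exec <ns> ip link show` inside the fip and qrouter namespaces.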
if you have validated that the mtus are set correctly, the next step is to determine if packets are being dropped. to do this you need to check interface 16 (the vm interface inside the vm), 15 (the vm tap on the host), 13/12 (the veth between ovs and the linux bridge), 10 (the dvr interface), 9/8 (the veth between the fip and dvr namespaces), 7 (the floating ip gateway port on ovs) and finally 2 (the uplink to the physical network).

- if you see packet loss on either port 16 or 15, you can try enabling multiqueue for the virtio interface: set hw_vif_multiqueue_enabled=true in the image metadata and then enable multiqueue in the guest with ethtool -L <NIC> combined #num_of_queues.
- if the packet loss is observed on the veth between the linux bridge and ovs (13/12), you could change from the iptables firewall driver to the conntrack or noop firewall driver.
- if the bottleneck is in the dvr router namespace between ports 10 and 9, and it is not caused by ip fragmentation, then you are hitting a kernel limitation and will need to tune the kernel to improve routing performance.
- if the packet loss is between 8 and 7, you are hitting a linux kernel dnat bottleneck. again, some kernel options may help, but there is not much you can do.
- if the packet loss is in RX on interface 2, you need to ensure receive side scaling is enabled and the nic is configured to use multiple queues: ethtool -L <NIC> combined #num_of_queues. you should also ensure that offloads such as LRO are enabled, if available on your nic.

if none of the above helps, your only recourse is to evaluate other neutron networking solutions such as OVN, which implements dvr/fip/nat using openflow rules in ovs.

i hope this helps.
regards
sean
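[Editor's note: the multiqueue suggestion above can be applied roughly as follows; the image and nic names are placeholders, and the commands need a live deployment to run.]

```shell
# On the controller: tag the image so nova creates one virtio queue per vcpu
# (placeholder image name)
openstack image set --property hw_vif_multiqueue_enabled=true my-image

# Inside a guest booted from that image:
ethtool -l eth0                  # show supported and current queue counts
ethtool -L eth0 combined 4       # e.g. match the number of vcpus
```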
participants (2)
- Sean Mooney
- Yedhu Sastri