[openstack][neutron][openvswitch] Openvswitch Packet loss when high throughput (pps)

smooney at redhat.com smooney at redhat.com
Fri Sep 8 14:20:01 UTC 2023


On Thu, 2023-09-07 at 22:05 -0400, Satish Patel wrote:
> Do one thing: use a test-pmd based benchmark and see, because the test-pmd
> application is DPDK aware. With test-pmd you will have 1000% better
> performance :)

actually test-pmd is not "DPDK aware" in that sense.
it is a dpdk application, so it is faster because it removes the overhead of kernel networking in the guest,
not because it has any dpdk awareness of the host. testpmd cannot tell that ovs-dpdk is in use:
from a guest perspective you cannot tell whether you are using ovs-dpdk or kernel ovs, as there is no visible difference
in the virtio-net-pci device which is presented to the guest kernel by qemu.
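for example, a quick guest-side check (the interface name eth0 is just an example) shows
nothing backend specific either way:

    # inside the guest the NIC is a generic virtio device regardless of the host datapath
    lspci | grep -i virtio        # lists a "Virtio network device"
    ethtool -i eth0               # reports driver: virtio_net (assuming eth0 is that NIC)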

iperf3 with a single core can't actually saturate a virtio-net interface when it's backed
by vhost-user/dpdk or something like a macvtap sriov port.
you can reach line rate with larger packet sizes or multiple cores, but
if you want to test small-packet io then testpmd, dpdk packetgen or tgen
are better tools in that regard. they can easily saturate a link into the 10s of gigabits per second using
64 byte packets.
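as a rough sketch of the multi-core iperf3 approach (core count, ports and the address are just
examples), you can run one pinned client per core against one server per port:

    # on the receiver (e.g. 10.20.1.206): one iperf3 server per port
    for i in 0 1 2 3; do iperf3 -s -D -p $((5201 + i)); done

    # on the sender: one client per core, small UDP packets, unrestricted rate
    for i in 0 1 2 3; do
        taskset -c $i iperf3 -u -b 0 -l 64 -t 60 -p $((5201 + i)) -c 10.20.1.206 &
    done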

> 
> On Thu, Sep 7, 2023 at 9:59 PM Ha Noi <hanoi952022 at gmail.com> wrote:
> 
> > I ran the performance test using iperf3, but the performance did not
> > increase as theory predicts. I don't know which configuration is incorrect.
> > 
> > On Fri, Sep 8, 2023 at 8:57 AM Satish Patel <satish.txt at gmail.com> wrote:
> > 
> > > I would say let's run your same benchmark with OVS-DPDK and tell me if
> > > you see better performance. I doubt you will see a significant performance
> > > boost but let's see. Please prove me wrong :)
> > > 
> > > On Thu, Sep 7, 2023 at 9:45 PM Ha Noi <hanoi952022 at gmail.com> wrote:
> > > 
> > > > Hi Satish,
> > > > 
> > > > Actually, the guest interface is not using a tap anymore.
> > > > 
> > > >     <interface type='vhostuser'>
> > > >       <mac address='fa:16:3e:76:77:dd'/>
> > > >       <source type='unix' path='/var/run/openvswitch/vhu3766ee8a-86'
> > > > mode='server'/>
> > > >       <target dev='vhu3766ee8a-86'/>
> > > >       <model type='virtio'/>
> > > >       <alias name='net0'/>
> > > >       <address type='pci' domain='0x0000' bus='0x00' slot='0x03'
> > > > function='0x0'/>
> > > >     </interface>
> > > > 
> > > > So it totally bypasses the kernel stack?
yep, dpdk is userspace networking and it gets its performance boost from that.
so the data is "transported" by doing a direct mmap of the virtio ring buffers
between the DPDK poll mode driver and the qemu process.
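a quick way to confirm that on the compute node (using the port name from the xml above) is to ask
ovs what type the interface is; a dpdk vhost-user port reports a dpdk type rather than a kernel tap:

    # on the compute node; vhu3766ee8a-86 is the port from the domain xml above
    ovs-vsctl get Interface vhu3766ee8a-86 type
    # expect something like "dpdkvhostuserclient" (or "dpdkvhostuser") rather than a system/tap type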

> > > > 
> > > > 
> > > > 
> > > > 
> > > > On Fri, Sep 8, 2023 at 5:02 AM Satish Patel <satish.txt at gmail.com>
> > > > wrote:
> > > > 
> > > > > I did test OVS-DPDK and it helps offload packet processing on compute
> > > > > nodes, but what about VMs? It will still use a tap interface to attach from
> > > > > compute to VM and the bottleneck will be in the VM. I strongly believe that we have
> > > > > to run a DPDK-based guest to bypass the kernel stack.
> > > > > 
> > > > > I love to hear from other people if I am missing something here.
> > > > > 
> > > > > On Thu, Sep 7, 2023 at 5:27 PM Ha Noi <hanoi952022 at gmail.com> wrote:
> > > > > 
> > > > > > Oh. I heard from someone on reddit that Ovs-dpdk is
> > > > > > transparent to the user?
> > > > > > 
> > > > > > So It’s not correct?
> > > > > > 
> > > > > > On Thu, 7 Sep 2023 at 22:13 Satish Patel <satish.txt at gmail.com> wrote:
> > > > > > 
> > > > > > > Because DPDK requires DPDK support inside the guest VM. It's not
> > > > > > > suitable for general purpose workloads. You need your guest VM network to
> > > > > > > support DPDK to get 100% throughput.
> > > > > > > 
> > > > > > > On Thu, Sep 7, 2023 at 8:06 AM Ha Noi <hanoi952022 at gmail.com> wrote:
> > > > > > > 
> > > > > > > > Hi Satish,
> > > > > > > > 
> > > > > > > > Why don't you use DPDK?
> > > > > > > > 
> > > > > > > > Thanks
> > > > > > > > 
> > > > > > > > On Thu, 7 Sep 2023 at 19:03 Satish Patel <satish.txt at gmail.com>
> > > > > > > > wrote:
> > > > > > > > 
> > > > > > > > > I totally agree with Sean on all his points but trust me, I have
> > > > > > > > > tried everything possible to tune the OS, network stack, multi-queue, NUMA, CPU
> > > > > > > > > pinning, you name it.. but I didn't get any significant improvement. You may
> > > > > > > > > gain 2 to 5% with all those tweaks. I am running the entire workload on
> > > > > > > > > sriov and life is happy except for no LACP bonding.
> > > > > > > > > 
> > > > > > > > > I am very interested in this project
> > > > > > > > > https://docs.openvswitch.org/en/latest/intro/install/afxdp/
> > > > > > > > > 
> > > > > > > > > On Thu, Sep 7, 2023 at 6:07 AM Ha Noi <hanoi952022 at gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > 
> > > > > > > > > > Dear Smoney,
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > On Thu, Sep 7, 2023 at 12:41 AM <smooney at redhat.com> wrote:
> > > > > > > > > > 
> > > > > > > > > > > On Wed, 2023-09-06 at 11:43 -0400, Satish Patel wrote:
> > > > > > > > > > > > Damn! We have noticed the same issue around 40k to 55k PPS. Trust me,
> > > > > > > > > > > > nothing is wrong in your config. This is just a limitation of the
> > > > > > > > > > > > software stack and kernel itself.
> > > > > > > > > > > it's partly determined by your cpu frequency.
> > > > > > > > > > > kernel ovs of yesteryear could handle about 1mpps total on a ~4GHz
> > > > > > > > > > > cpu, with per-port throughput being lower depending on what qos/firewall
> > > > > > > > > > > rules were applied.
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > My CPU frequency is 3GHz, using a 2nd generation Intel Gold CPU.
> > > > > > > > > > I think the problem is tuning inside the compute node, but I cannot find
> > > > > > > > > > any guide or best practices for it.
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > > moving from the iptables firewall to the ovs firewall can help to some degree,
> > > > > > > > > > > but you are partly trading connection setup time for steady-state throughput
> > > > > > > > > > > with the overhead of the connection tracker in ovs.
> > > > > > > > > > > 
> > > > > > > > > > > using stateless security groups can help
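(as a rough illustration only, and assuming your neutron release and firewall driver actually
support stateless security groups, creating one looks roughly like this; the group name is made up:)

    # hypothetical sketch; requires stateless security group support in neutron
    openstack security group create --stateless perf-sg
    openstack security group rule create --ingress --protocol udp perf-sg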
> > > > > > > > > > > 
> > > > > > > > > > > we also recently fixed a regression caused by changes in newer
> > > > > > > > > > > versions of ovs.
> > > > > > > > > > > this was notable in going from rhel 8 to rhel 9, where it literally reduced
> > > > > > > > > > > small packet performance to 1/10th and jumbo frames to about 1/2.
> > > > > > > > > > > on master we have a config option that will set the default qos
> > > > > > > > > > > on a port to linux-noop
> > > > > > > > > > > 
> > > > > > > > > > > https://github.com/openstack/os-vif/blob/master/vif_plug_ovs/ovs.py#L106-L125
> > > > > > > > > > > 
> > > > > > > > > > > the backports are proposed upstream
> > > > > > > > > > > https://review.opendev.org/q/Id9ef7074634a0f23d67a4401fa8fca363b51bb43
> > > > > > > > > > > and we have backported this downstream to address that performance
> > > > > > > > > > > regression.
> > > > > > > > > > > the upstream backport is semi stalled just because we wanted to
> > > > > > > > > > > discuss if we should make it opt-in
> > > > > > > > > > > by default upstream while backporting, but it might be helpful for
> > > > > > > > > > > you if this is related to your current
> > > > > > > > > > > issues.
> > > > > > > > > > > 
> > > > > > > > > > > 40-55 kpps is kind of low for kernel ovs, but if you have a low
> > > > > > > > > > > clock-rate cpu, hybrid_plug + an incorrect qos
> > > > > > > > > > > then i could see you hitting such a bottleneck.
> > > > > > > > > > > 
> > > > > > > > > > > one workaround, by the way, without the os-vif fix
> > > > > > > > > > > backported, is to set
> > > > > > > > > > > /proc/sys/net/core/default_qdisc to not apply any qos or to a low
> > > > > > > > > > > overhead qdisc type,
> > > > > > > > > > > i.e. sudo sysctl -w net.core.default_qdisc=pfifo_fast
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > > that may or may not help, but i would ensure that you are not
> > > > > > > > > > > using something like fq_codel or cake
> > > > > > > > > > > for net.core.default_qdisc, and if you are, try changing it to
> > > > > > > > > > > pfifo_fast and see if that helps.
> > > > > > > > > > > 
> > > > > > > > > > > there isn't much you can do about the cpu clock rate, but ^ is
> > > > > > > > > > > something you can try for free.
> > > > > > > > > > > note it won't actually take effect on an existing vm if you just
> > > > > > > > > > > change the default, but you can use
> > > > > > > > > > > tc to also change the qdisc for testing. hard rebooting the vm
> > > > > > > > > > > should also make the default take effect.
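(purely illustrative, assuming a kernel-ovs/tap backed port; the device name below is made up:)

    # swap the qdisc on an existing backend device for testing
    tc qdisc replace dev tap3766ee8a-86 root pfifo_fast
    tc qdisc show dev tap3766ee8a-86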
> > > > > > > > > > > 
> > > > > > > > > > > the only other advice i can give, assuming kernel ovs is the only
> > > > > > > > > > > option you have, is
> > > > > > > > > > > 
> > > > > > > > > > > to look at
> > > > > > > > > > > 
> > > > > > > > > > > https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.rx_queue_size
> > > > > > > > > > > 
> > > > > > > > > > > https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.tx_queue_size
> > > > > > > > > > > and
> > > > > > > > > > > 
> > > > > > > > > > > https://docs.openstack.org/nova/latest/configuration/extra-specs.html#hw:vif_multiqueue_enabled
> > > > > > > > > > > 
> > > > > > > > > > > if the bottleneck is actually in qemu or the guest kernel rather
> > > > > > > > > > > than ovs, adjusting the rx/tx queue size and
> > > > > > > > > > > using multi-queue can help. it will have no effect if ovs is the
> > > > > > > > > > > bottleneck.
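(concretely, and just as a sketch based on the options linked above; the flavor name is a placeholder:)

    # nova.conf on the compute node (values are examples)
    [libvirt]
    rx_queue_size = 1024
    tx_queue_size = 1024

    # enable multiqueue via the flavor extra spec (or the hw_vif_multiqueue_enabled image property)
    openstack flavor set --property hw:vif_multiqueue_enabled=true <flavor-name>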
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > I have set this option to 1024 and enabled multiqueue as well, but
> > > > > > > > > > it did not help.
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > On Wed, Sep 6, 2023 at 9:21 AM Ha Noi <hanoi952022 at gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > > > 
> > > > > > > > > > > > > Hi Satish,
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Actually, our customer gets this issue when the tx/rx is above
> > > > > > > > > > > > > only 40k pps.
> > > > > > > > > > > > > So what is the throughput threshold for OvS?
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Thanks and regards
> > > > > > > > > > > > > 
> > > > > > > > > > > > > On Wed, 6 Sep 2023 at 20:19 Satish Patel <satish.txt at gmail.com> wrote:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > This is normal because OVS or LinuxBridge wire up VMs using a TAP interface
> > > > > > > > > > > > > > which runs in kernel space, and that drives higher interrupts and makes
> > > > > > > > > > > > > > the kernel so busy working on handling packets. Standard OVS/LinuxBridge
> > > > > > > > > > > > > > are not meant for higher PPS.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > If you want to handle higher PPS then look at DPDK or SRIOV deployment.
> > > > > > > > > > > > > > (We are running everything in SRIOV because of high PPS requirements)
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > On Tue, Sep 5, 2023 at 11:11 AM Ha Noi <hanoi952022 at gmail.com> wrote:
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > I'm using Openstack Train with Openvswitch as the ML2 driver and GRE as
> > > > > > > > > > > > > > > the tunnel type. I tested our network performance between two VMs and
> > > > > > > > > > > > > > > suffered packet loss as below.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > VM1: IP: 10.20.1.206
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > VM2: IP: 10.20.1.154
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > VM3: IP: 10.20.1.72
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Using iperf3 to test performance between VM1 and VM2.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Run iperf3 client and server on both VMs.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > On VM2: iperf3 -t 10000 -b 130M -l 442 -P 6 -u -c 10.20.1.206
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > On VM1: iperf3 -t 10000 -b 130M -l 442 -P 6 -u -c 10.20.1.154
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Using VM3 to ping VM1, packets are lost and the latency is
> > > > > > > > > > > > > > > quite high.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > ping -i 0.1 10.20.1.206
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > PING 10.20.1.206 (10.20.1.206) 56(84) bytes of data.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 64 bytes from 10.20.1.206: icmp_seq=1 ttl=64 time=7.70 ms
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 64 bytes from 10.20.1.206: icmp_seq=2 ttl=64 time=6.90 ms
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 64 bytes from 10.20.1.206: icmp_seq=3 ttl=64 time=7.71 ms
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 64 bytes from 10.20.1.206: icmp_seq=4 ttl=64 time=7.98 ms
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 64 bytes from 10.20.1.206: icmp_seq=6 ttl=64 time=8.58 ms
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 64 bytes from 10.20.1.206: icmp_seq=7 ttl=64 time=8.34 ms
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 64 bytes from 10.20.1.206: icmp_seq=8 ttl=64 time=8.09 ms
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 64 bytes from 10.20.1.206: icmp_seq=10 ttl=64 time=4.57 ms
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 64 bytes from 10.20.1.206: icmp_seq=11 ttl=64 time=8.74 ms
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 64 bytes from 10.20.1.206: icmp_seq=12 ttl=64 time=9.37 ms
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 64 bytes from 10.20.1.206: icmp_seq=14 ttl=64 time=9.59 ms
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 64 bytes from 10.20.1.206: icmp_seq=15 ttl=64 time=7.97 ms
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 64 bytes from 10.20.1.206: icmp_seq=16 ttl=64 time=8.72 ms
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 64 bytes from 10.20.1.206: icmp_seq=17 ttl=64 time=9.23 ms
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > ^C
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > --- 10.20.1.206 ping statistics ---
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 34 packets transmitted, 28 received, 17.6471% packet loss, time 3328ms
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > rtt min/avg/max/mdev = 1.396/6.266/9.590/2.805 ms
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Does anyone get this issue?
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Please help me. Thanks
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 



