[openstack][neutron][openvswitch] Openvswitch Packet loss when high throughput (pps)

Satish Patel satish.txt at gmail.com
Fri Sep 8 02:05:54 UTC 2023


Do one thing: run a testpmd-based benchmark and see, because the testpmd
application is DPDK-aware. With testpmd you will get 1000% better
performance :)
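
For illustration, a minimal testpmd run might look like the sketch below
(assumptions: DPDK is installed inside the guest, the NIC is bound to a
DPDK driver such as vfio-pci, the core list is a placeholder, and on older
DPDK releases the binary is called testpmd rather than dpdk-testpmd):

  dpdk-testpmd -l 0-1 -n 4 -- -i --forward-mode=macswap
  testpmd> start
  testpmd> show port stats all

"show port stats all" prints the per-port RX/TX packet rates, which gives
a cleaner PPS number than an iperf3 run that goes through the guest kernel.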

On Thu, Sep 7, 2023 at 9:59 PM Ha Noi <hanoi952022 at gmail.com> wrote:

> I ran the performance test using iperf3, but the performance did not
> increase as the theory predicts. I don't know which configuration is
> incorrect.
>
> On Fri, Sep 8, 2023 at 8:57 AM Satish Patel <satish.txt at gmail.com> wrote:
>
>> I would say let's run your same benchmark with OVS-DPDK and tell me if
>> you see better performance. I doubt you will see a significant
>> performance boost, but let's see. Please prove me wrong :)
>>
>> On Thu, Sep 7, 2023 at 9:45 PM Ha Noi <hanoi952022 at gmail.com> wrote:
>>
>>> Hi Satish,
>>>
>>> Actually, the guest interface is not using a tap device anymore.
>>>
>>>     <interface type='vhostuser'>
>>>       <mac address='fa:16:3e:76:77:dd'/>
>>>       <source type='unix' path='/var/run/openvswitch/vhu3766ee8a-86' mode='server'/>
>>>       <target dev='vhu3766ee8a-86'/>
>>>       <model type='virtio'/>
>>>       <alias name='net0'/>
>>>       <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
>>>     </interface>
>>>
>>> Does it totally bypass the kernel stack?
>>>
>>>
>>>
>>>
>>> On Fri, Sep 8, 2023 at 5:02 AM Satish Patel <satish.txt at gmail.com>
>>> wrote:
>>>
>>>> I did test OVS-DPDK and it helps offload packet processing on the
>>>> compute nodes. But what about the VMs? They will still use a tap
>>>> interface to attach from compute to VM, and the bottleneck will be in
>>>> the VM. I strongly believe that we have to run a DPDK-based guest to
>>>> bypass the kernel stack.
>>>>
>>>> I would love to hear from other people if I am missing something here.
>>>>
>>>> On Thu, Sep 7, 2023 at 5:27 PM Ha Noi <hanoi952022 at gmail.com> wrote:
>>>>
>>>>> Oh. I heard from someone on Reddit that OVS-DPDK is transparent to
>>>>> the user?
>>>>>
>>>>> So it's not correct?
>>>>>
>>>>> On Thu, 7 Sep 2023 at 22:13 Satish Patel <satish.txt at gmail.com> wrote:
>>>>>
>>>>>> Because DPDK requires DPDK support inside the guest VM. It's not
>>>>>> suitable for general-purpose workloads. You need your guest VM
>>>>>> network stack to support DPDK to get 100% of the throughput.
>>>>>>
>>>>>> On Thu, Sep 7, 2023 at 8:06 AM Ha Noi <hanoi952022 at gmail.com> wrote:
>>>>>>
>>>>>>> Hi Satish,
>>>>>>>
>>>>>>> Why don't you use DPDK?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> On Thu, 7 Sep 2023 at 19:03 Satish Patel <satish.txt at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I totally agree with Sean on all his points, but trust me, I have
>>>>>>>> tried everything possible to tune the OS, network stack,
>>>>>>>> multi-queue, NUMA, CPU pinning, you name it, and I didn't get any
>>>>>>>> significant improvement. You may gain 2 to 5% with all those
>>>>>>>> tweaks. I am running the entire workload on SR-IOV and life is
>>>>>>>> happy, except there is no LACP bonding.
>>>>>>>>
>>>>>>>> I am very interested in this project:
>>>>>>>> https://docs.openvswitch.org/en/latest/intro/install/afxdp/
>>>>>>>>
>>>>>>>> On Thu, Sep 7, 2023 at 6:07 AM Ha Noi <hanoi952022 at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Dear Smoney,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Sep 7, 2023 at 12:41 AM <smooney at redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>> On Wed, 2023-09-06 at 11:43 -0400, Satish Patel wrote:
>>>>>>>>>> > Damn! We have noticed the same issue around 40k to 55k PPS.
>>>>>>>>>> > Trust me, nothing is wrong with your config. This is just a
>>>>>>>>>> > limitation of the software stack and the kernel itself.
>>>>>>>>>> It's partly determined by your CPU frequency.
>>>>>>>>>> Kernel OVS of yesteryear could handle about 1 Mpps total on a
>>>>>>>>>> ~4 GHz CPU, with per-port throughput being lower depending on
>>>>>>>>>> what qos/firewall rules were applied.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> My CPU frequency is 3 GHz, on a 2nd-generation Intel Gold CPU. I
>>>>>>>>> think the problem is tuning inside the compute node, but I cannot
>>>>>>>>> find any guide or best practices for it.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Moving from the iptables firewall to the OVS firewall can help
>>>>>>>>>> to some degree, but you are partly trading connection setup time
>>>>>>>>>> for steady-state throughput, given the overhead of the connection
>>>>>>>>>> tracker in OVS.
>>>>>>>>>>
>>>>>>>>>> Using stateless security groups can also help; see the sketch below.
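>>>>>>>>>>
>>>>>>>>>> For illustration, a rough sketch of both knobs (verify the
>>>>>>>>>> option names against your release; the security group name is a
>>>>>>>>>> placeholder, and stateless security groups only exist in newer
>>>>>>>>>> Neutron releases, not in Train):
>>>>>>>>>>
>>>>>>>>>>   # /etc/neutron/plugins/ml2/openvswitch_agent.ini
>>>>>>>>>>   [securitygroup]
>>>>>>>>>>   firewall_driver = openvswitch
>>>>>>>>>>
>>>>>>>>>>   # on releases that support it:
>>>>>>>>>>   openstack security group create --stateless nofw-sg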
>>>>>>>>>>
>>>>>>>>>> We also recently fixed a regression caused by changes in newer
>>>>>>>>>> versions of OVS. This was notable in going from RHEL 8 to RHEL 9,
>>>>>>>>>> where it literally reduced small-packet performance to 1/10th and
>>>>>>>>>> jumbo-frame performance to about 1/2. On master we have a config
>>>>>>>>>> option that will set the default qos on a port to linux-noop:
>>>>>>>>>>
>>>>>>>>>> https://github.com/openstack/os-vif/blob/master/vif_plug_ovs/ovs.py#L106-L125
>>>>>>>>>>
>>>>>>>>>> The backports are proposed upstream
>>>>>>>>>> (https://review.opendev.org/q/Id9ef7074634a0f23d67a4401fa8fca363b51bb43)
>>>>>>>>>> and we have backported this downstream to address that
>>>>>>>>>> performance regression. The upstream backport is semi-stalled
>>>>>>>>>> just because we wanted to discuss whether we should make it
>>>>>>>>>> opt-in by default upstream while backporting, but it might be
>>>>>>>>>> helpful for you if this is related to your current issues.
>>>>>>>>>>
>>>>>>>>>> 40-55 kpps is kind of low for kernel OVS, but if you have a
>>>>>>>>>> low-clock-rate CPU, hybrid_plug + an incorrect qos setup, then I
>>>>>>>>>> could see you hitting such a bottleneck.
>>>>>>>>>>
>>>>>>>>>> One workaround, by the way, without the os-vif fix backported, is
>>>>>>>>>> to set /proc/sys/net/core/default_qdisc so that no qos, or a
>>>>>>>>>> low-overhead qos type, is applied, i.e.:
>>>>>>>>>>
>>>>>>>>>>   sudo sysctl -w net.core.default_qdisc=pfifo_fast
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> That may or may not help, but I would make sure that you are not
>>>>>>>>>> using something like fq_codel or cake for net.core.default_qdisc,
>>>>>>>>>> and if you are, try changing it to pfifo_fast and see if that
>>>>>>>>>> helps.
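>>>>>>>>>>
>>>>>>>>>> For example, a quick check and change on the compute node (a
>>>>>>>>>> sketch; the sysctl.d file name is arbitrary):
>>>>>>>>>>
>>>>>>>>>>   sysctl net.core.default_qdisc
>>>>>>>>>>   sudo sysctl -w net.core.default_qdisc=pfifo_fast
>>>>>>>>>>   echo 'net.core.default_qdisc = pfifo_fast' | \
>>>>>>>>>>     sudo tee /etc/sysctl.d/99-default-qdisc.conf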
>>>>>>>>>>
>>>>>>>>>> There isn't much you can do about the CPU clock rate, but the
>>>>>>>>>> above is something you can try for free. Note it won't actually
>>>>>>>>>> take effect on an existing VM if you just change the default, but
>>>>>>>>>> you can also use tc to change the qdisc for testing (see the
>>>>>>>>>> sketch below). Hard rebooting the VM should also make the default
>>>>>>>>>> take effect.
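>>>>>>>>>>
>>>>>>>>>> A minimal tc sketch for an existing VM (the tap name below is a
>>>>>>>>>> placeholder; find the real one from the instance XML or with
>>>>>>>>>> "ovs-vsctl list-ports br-int"):
>>>>>>>>>>
>>>>>>>>>>   tc qdisc show dev tap3766ee8a-86
>>>>>>>>>>   sudo tc qdisc replace dev tap3766ee8a-86 root pfifo_fast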
>>>>>>>>>>
>>>>>>>>>> The only other advice I can give, assuming kernel OVS is the
>>>>>>>>>> only option you have, is to look at
>>>>>>>>>>
>>>>>>>>>> https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.rx_queue_size
>>>>>>>>>> https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.tx_queue_size
>>>>>>>>>>
>>>>>>>>>> and
>>>>>>>>>>
>>>>>>>>>> https://docs.openstack.org/nova/latest/configuration/extra-specs.html#hw:vif_multiqueue_enabled
>>>>>>>>>>
>>>>>>>>>> If the bottleneck is actually in QEMU or the guest kernel rather
>>>>>>>>>> than OVS, adjusting the rx/tx queue size and using multiqueue can
>>>>>>>>>> help (a sketch follows below). It will have no effect if OVS is
>>>>>>>>>> the bottleneck.
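>>>>>>>>>>
>>>>>>>>>> For illustration, roughly what that looks like (values and the
>>>>>>>>>> flavor name are placeholders, not a recommendation):
>>>>>>>>>>
>>>>>>>>>>   # nova.conf on the compute node
>>>>>>>>>>   [libvirt]
>>>>>>>>>>   rx_queue_size = 1024
>>>>>>>>>>   tx_queue_size = 1024
>>>>>>>>>>
>>>>>>>>>>   # flavor extra spec to enable virtio multiqueue
>>>>>>>>>>   openstack flavor set myflavor --property hw:vif_multiqueue_enabled=true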
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> I have set this option to 1024 and enabled multiqueue as well,
>>>>>>>>> but it did not help.
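>>>>>>>>>
>>>>>>>>> (A quick way to double-check inside the guest that those settings
>>>>>>>>> took effect, assuming the interface is eth0:
>>>>>>>>>
>>>>>>>>>   ethtool -g eth0   # rx/tx ring size should report 1024
>>>>>>>>>   ethtool -l eth0   # combined channels should match the vCPU count
>>>>>>>>>   sudo ethtool -L eth0 combined <num_vcpus>   # enable all queues
>>>>>>>>>
>>>>>>>>> The last step can be needed because the guest does not always
>>>>>>>>> enable every virtio queue by default.)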
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> >
>>>>>>>>>> > On Wed, Sep 6, 2023 at 9:21 AM Ha Noi
>>>>>>>>>> > <hanoi952022 at gmail.com> wrote:
>>>>>>>>>> >
>>>>>>>>>> > > Hi Satish,
>>>>>>>>>> > >
>>>>>>>>>> > > Actually, our customer gets this issue when tx/rx is above
>>>>>>>>>> > > only 40k pps. So what is the throughput threshold for OVS?
>>>>>>>>>> > >
>>>>>>>>>> > >
>>>>>>>>>> > > Thanks and regards
>>>>>>>>>> > >
>>>>>>>>>> > > On Wed, 6 Sep 2023 at 20:19 Satish Patel
>>>>>>>>>> > > <satish.txt at gmail.com> wrote:
>>>>>>>>>> > >
>>>>>>>>>> > > > Hi,
>>>>>>>>>> > > >
>>>>>>>>>> > > > This is normal, because OVS or Linux bridge wires up VMs
>>>>>>>>>> > > > using a TAP interface, which runs in kernel space; that
>>>>>>>>>> > > > drives a higher interrupt load and keeps the kernel busy
>>>>>>>>>> > > > handling packets. Standard OVS/Linux bridge is not meant
>>>>>>>>>> > > > for higher PPS.
>>>>>>>>>> > > >
>>>>>>>>>> > > > If you want to handle higher PPS then look at a DPDK or
>>>>>>>>>> > > > SR-IOV deployment. (We are running everything on SR-IOV
>>>>>>>>>> > > > because of our high PPS requirement.)
>>>>>>>>>> > > >
>>>>>>>>>> > > > On Tue, Sep 5, 2023 at 11:11 AM Ha Noi
>>>>>>>>>> > > > <hanoi952022 at gmail.com> wrote:
>>>>>>>>>> > > >
>>>>>>>>>> > > > > Hi everyone,
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > I'm using OpenStack Train with Open vSwitch as the ML2
>>>>>>>>>> > > > > driver and GRE as the tunnel type. I tested our network
>>>>>>>>>> > > > > performance between two VMs and suffered packet loss as
>>>>>>>>>> > > > > shown below.
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > VM1: IP: 10.20.1.206
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > VM2: IP: 10.20.1.154
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > VM3: IP: 10.20.1.72
>>>>>>>>>> > > > >
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > Using iperf3 to test performance between VM1 and VM2, I
>>>>>>>>>> > > > > ran the iperf3 client and server on both VMs.
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > On VM2: iperf3 -t 10000 -b 130M -l 442 -P 6 -u -c 10.20.1.206
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > On VM1: iperf3 -t 10000 -b 130M -l 442 -P 6 -u -c 10.20.1.154
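>>>>>>>>>> > > > >
>>>>>>>>>> > > > > (The server side is not shown; presumably it was started
>>>>>>>>>> > > > > on each VM with something like "iperf3 -s".)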
>>>>>>>>>> > > > >
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > Then, pinging VM1 from VM3, packets are lost and the
>>>>>>>>>> > > > > latency is quite high.
>>>>>>>>>> > > > >
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > ping -i 0.1 10.20.1.206
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > PING 10.20.1.206 (10.20.1.206) 56(84) bytes of data.
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > 64 bytes from 10.20.1.206: icmp_seq=1 ttl=64 time=7.70 ms
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > 64 bytes from 10.20.1.206: icmp_seq=2 ttl=64 time=6.90 ms
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > 64 bytes from 10.20.1.206: icmp_seq=3 ttl=64 time=7.71 ms
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > 64 bytes from 10.20.1.206: icmp_seq=4 ttl=64 time=7.98 ms
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > 64 bytes from 10.20.1.206: icmp_seq=6 ttl=64 time=8.58 ms
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > 64 bytes from 10.20.1.206: icmp_seq=7 ttl=64 time=8.34 ms
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > 64 bytes from 10.20.1.206: icmp_seq=8 ttl=64 time=8.09 ms
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > 64 bytes from 10.20.1.206: icmp_seq=10 ttl=64 time=4.57 ms
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > 64 bytes from 10.20.1.206: icmp_seq=11 ttl=64 time=8.74 ms
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > 64 bytes from 10.20.1.206: icmp_seq=12 ttl=64 time=9.37 ms
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > 64 bytes from 10.20.1.206: icmp_seq=14 ttl=64 time=9.59 ms
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > 64 bytes from 10.20.1.206: icmp_seq=15 ttl=64 time=7.97 ms
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > 64 bytes from 10.20.1.206: icmp_seq=16 ttl=64 time=8.72 ms
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > 64 bytes from 10.20.1.206: icmp_seq=17 ttl=64 time=9.23 ms
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > ^C
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > --- 10.20.1.206 ping statistics ---
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > 34 packets transmitted, 28 received, 17.6471% packet loss, time 3328ms
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > rtt min/avg/max/mdev = 1.396/6.266/9.590/2.805 ms
>>>>>>>>>> > > > >
>>>>>>>>>> > > > >
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > Does anyone else get this issue?
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > Please help me. Thanks
>>>>>>>>>> > > > >
>>>>>>>>>> > > >
>>>>>>>>>>
>>>>>>>>>>