I would say let's run your same benchmark with OVS-DPDK and tell me if you see better performance. I doubt you will see a significant performance boost, but let's see. Please prove me wrong :)

On Thu, Sep 7, 2023 at 9:45 PM Ha Noi <hanoi952022@gmail.com> wrote:
Hi Satish,

Actually, the guest interface is not using a tap anymore.

    <interface type='vhostuser'>
      <mac address='fa:16:3e:76:77:dd'/>
      <source type='unix' path='/var/run/openvswitch/vhu3766ee8a-86' mode='server'/>
      <target dev='vhu3766ee8a-86'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
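Assuming the port really is backed by OVS-DPDK, a quick way to check from the compute node (a sketch; the port name comes from the XML above, and `br-int` is the usual Neutron integration bridge, which may differ in your deployment):

```shell
# On the compute node: check how OVS classifies this port.
# A type of dpdkvhostuser / dpdkvhostuserclient means guest traffic is
# handled by the userspace (DPDK) datapath, not the kernel.
ovs-vsctl get Interface vhu3766ee8a-86 type

# Confirm the bridge datapath is userspace (netdev) rather than kernel (system).
ovs-vsctl get Bridge br-int datapath_type
```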

Does it totally bypass the kernel stack?




On Fri, Sep 8, 2023 at 5:02 AM Satish Patel <satish.txt@gmail.com> wrote:
I did test OVS-DPDK and it helps offload packet processing on the compute nodes, but the VMs will still use a tap interface to attach from compute to VM, and the bottleneck will be in the VM. I strongly believe that we have to run a DPDK-based guest to bypass the kernel stack.

I'd love to hear from other people if I am missing something here.

On Thu, Sep 7, 2023 at 5:27 PM Ha Noi <hanoi952022@gmail.com> wrote:
Oh. I heard from someone on Reddit that OVS-DPDK is transparent to the user?

So that's not correct?

On Thu, 7 Sep 2023 at 22:13 Satish Patel <satish.txt@gmail.com> wrote:
Because DPDK requires DPDK support inside the guest VM. It's not suitable for general-purpose workloads. Your guest VM network needs to support DPDK to get 100% of the throughput.

On Thu, Sep 7, 2023 at 8:06 AM Ha Noi <hanoi952022@gmail.com> wrote:
Hi Satish,

Why don't you use DPDK?

Thanks 

On Thu, 7 Sep 2023 at 19:03 Satish Patel <satish.txt@gmail.com> wrote:
I totally agree with Sean on all his points, but trust me, I have tried everything possible to tune the OS, network stack, multi-queue, NUMA, CPU pinning, you name it, and I didn't get any significant improvement. You may gain 2 to 5% with all those tweaks. I am running the entire workload on SR-IOV and life is happy, except there's no LACP bonding.

I am very interested in this project: https://docs.openvswitch.org/en/latest/intro/install/afxdp/

On Thu, Sep 7, 2023 at 6:07 AM Ha Noi <hanoi952022@gmail.com> wrote:
Dear Smoney, 



On Thu, Sep 7, 2023 at 12:41 AM <smooney@redhat.com> wrote:
On Wed, 2023-09-06 at 11:43 -0400, Satish Patel wrote:
> Damn! We have noticed the same issue around 40k to 55k PPS. Trust me
> nothing is wrong in your config. This is just a limitation of the software
> stack and kernel itself.
It's partly determined by your cpu frequency.
Kernel OVS of yesteryear could handle about 1 Mpps total on a ~4GHz
cpu, with per-port throughput being lower depending on what qos/firewall
rules were applied.



My CPU frequency is 3GHz, on a 2nd-generation Intel Gold CPU. I think the problem is tuning inside the compute node, but I cannot find any guide or best practices for it.

 
Moving from the iptables firewall to the OVS firewall can help to some degree,
but you are partly trading connection setup time for steady-state throughput,
given the overhead of the connection tracker in OVS.

Using stateless security groups can help.
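For reference, stateless security groups landed well after Train (Yoga onward), but a sketch of what using them looks like with the OpenStack CLI (the group name is an example):

```shell
# Create a stateless security group; its rules skip conntrack entirely.
# Requires Neutron and python-openstackclient new enough to support it (Yoga+).
openstack security group create --stateless sg-nostate

# Return traffic is no longer tracked automatically, so rules are needed
# for both directions explicitly, e.g.:
openstack security group rule create --ingress --protocol tcp --dst-port 22 sg-nostate
openstack security group rule create --egress --protocol tcp sg-nostate
```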

We also recently fixed a regression caused by changes in newer versions of OVS.
This was notable going from RHEL 8 to RHEL 9, where it literally reduced
small-packet performance to 1/10th and jumbo frames to about 1/2.
On master we have a config option that will set the default qos on a port to linux-noop:
https://github.com/openstack/os-vif/blob/master/vif_plug_ovs/ovs.py#L106-L125

The backports are proposed upstream (https://review.opendev.org/q/Id9ef7074634a0f23d67a4401fa8fca363b51bb43)
and we have backported this downstream to address that performance regression.
The upstream backport is semi-stalled just because we wanted to discuss whether we should make it opt-in
by default upstream while backporting, but it might be helpful for you if this is related to your current
issues.

40-55 kpps is kind of low for kernel OVS, but with a low clock-rate cpu, hybrid_plug, and an incorrect qos,
I could see you hitting such a bottleneck.

One workaround, by the way, without the os-vif fix backported, is to set
/proc/sys/net/core/default_qdisc to not apply any qos, or to a low-overhead qdisc type,
i.e. sudo sysctl -w net.core.default_qdisc=pfifo_fast

 
That may or may not help, but I would make sure you are not using something like fq_codel or cake
for net.core.default_qdisc, and if you are, try changing it to pfifo_fast and see if that helps.

There isn't much you can do about the cpu clock rate, but ^ is something you can try for free.
Note it won't actually take effect on an existing vm if you just change the default, but you can use
tc to change the qdisc for testing. Hard rebooting the vm should also make the default take effect.
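To make that concrete, a sketch of checking and changing the qdisc on the compute node (run as root; the tap device name is a placeholder for the VM's actual tap):

```shell
# Check the current default qdisc; fq_codel or cake adds per-packet overhead.
sysctl net.core.default_qdisc

# Switch the default to the low-overhead pfifo_fast. Only newly created
# interfaces (or hard-rebooted VMs) pick this up.
sudo sysctl -w net.core.default_qdisc=pfifo_fast

# For an existing VM, change the qdisc on its tap device directly with tc:
tc qdisc show dev tap0abc123
sudo tc qdisc replace dev tap0abc123 root pfifo_fast
```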

The only other advice I can give, assuming kernel OVS is the only option you have, is to look at
https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.rx_queue_size
https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.tx_queue_size
and
https://docs.openstack.org/nova/latest/configuration/extra-specs.html#hw:vif_multiqueue_enabled

If the bottleneck is actually in qemu or the guest kernel rather than OVS, adjusting the rx/tx queue sizes and
using multiqueue can help. It will have no effect if OVS is the bottleneck.



I have set this option to 1024 and enabled multiqueue as well, but it did not help.
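If those settings genuinely took effect, they should be visible inside the guest; a quick way to verify (the interface name is an example):

```shell
# Inside the guest: check the rx/tx ring sizes virtio-net actually negotiated.
ethtool -g eth0

# Check how many combined queues are available and enabled (multiqueue);
# with hw:vif_multiqueue_enabled the maximum should match the vCPU count.
ethtool -l eth0

# Enable all available queues if they are not already in use:
sudo ethtool -L eth0 combined 4   # 4 = number of vCPUs, adjust accordingly
```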
 
>
> On Wed, Sep 6, 2023 at 9:21 AM Ha Noi <hanoi952022@gmail.com> wrote:
>
> > Hi Satish,
> >
> > Actually, our customer gets this issue when tx/rx is only about 40k pps.
> > So what is the throughput threshold for OVS?
> >
> >
> > Thanks and regards
> >
> > On Wed, 6 Sep 2023 at 20:19 Satish Patel <satish.txt@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > This is normal, because OVS and LinuxBridge wire up VMs using a TAP interface,
> > > which runs in kernel space; that drives higher interrupts and keeps the kernel
> > > busy handling packets. Standard OVS/LinuxBridge is not meant for high PPS.
> > >
> > > If you want to handle higher PPS, look at a DPDK or SR-IOV deployment.
> > > (We are running everything on SR-IOV because of our high PPS requirement.)
> > >
> > > On Tue, Sep 5, 2023 at 11:11 AM Ha Noi <hanoi952022@gmail.com> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > I'm using OpenStack Train with Open vSwitch as the ML2 driver and GRE as the
> > > > tunnel type. I tested network performance between two VMs and observed
> > > > packet loss as below.
> > > >
> > > > VM1: IP: 10.20.1.206
> > > >
> > > > VM2: IP: 10.20.1.154
> > > >
> > > > VM3: IP: 10.20.1.72
> > > >
> > > >
> > > > Using iperf3 to test performance between VM1 and VM2.
> > > >
> > > > Run iperf3 client and server on both VMs.
> > > >
> > > > On VM2: iperf3 -t 10000 -b 130M -l 442 -P 6 -u -c 10.20.1.206
> > > >
> > > > On VM1: iperf3 -t 10000 -b 130M -l 442 -P 6 -u -c 10.20.1.154
> > > >
> > > >
> > > > Using VM3 to ping VM1, packets are lost and the latency is quite high.
> > > >
> > > >
> > > > ping -i 0.1 10.20.1.206
> > > >
> > > > PING 10.20.1.206 (10.20.1.206) 56(84) bytes of data.
> > > >
> > > > 64 bytes from 10.20.1.206: icmp_seq=1 ttl=64 time=7.70 ms
> > > >
> > > > 64 bytes from 10.20.1.206: icmp_seq=2 ttl=64 time=6.90 ms
> > > >
> > > > 64 bytes from 10.20.1.206: icmp_seq=3 ttl=64 time=7.71 ms
> > > >
> > > > 64 bytes from 10.20.1.206: icmp_seq=4 ttl=64 time=7.98 ms
> > > >
> > > > 64 bytes from 10.20.1.206: icmp_seq=6 ttl=64 time=8.58 ms
> > > >
> > > > 64 bytes from 10.20.1.206: icmp_seq=7 ttl=64 time=8.34 ms
> > > >
> > > > 64 bytes from 10.20.1.206: icmp_seq=8 ttl=64 time=8.09 ms
> > > >
> > > > 64 bytes from 10.20.1.206: icmp_seq=10 ttl=64 time=4.57 ms
> > > >
> > > > 64 bytes from 10.20.1.206: icmp_seq=11 ttl=64 time=8.74 ms
> > > >
> > > > 64 bytes from 10.20.1.206: icmp_seq=12 ttl=64 time=9.37 ms
> > > >
> > > > 64 bytes from 10.20.1.206: icmp_seq=14 ttl=64 time=9.59 ms
> > > >
> > > > 64 bytes from 10.20.1.206: icmp_seq=15 ttl=64 time=7.97 ms
> > > >
> > > > 64 bytes from 10.20.1.206: icmp_seq=16 ttl=64 time=8.72 ms
> > > >
> > > > 64 bytes from 10.20.1.206: icmp_seq=17 ttl=64 time=9.23 ms
> > > >
> > > > ^C
> > > >
> > > > --- 10.20.1.206 ping statistics ---
> > > >
> > > > 34 packets transmitted, 28 received, 17.6471% packet loss, time 3328ms
> > > >
> > > > rtt min/avg/max/mdev = 1.396/6.266/9.590/2.805 ms
> > > >
> > > >
> > > >
> > > > Does anyone else hit this issue?
> > > >
> > > > Please help me. Thanks
> > > >
> > >