Sean,

Let me say "Happy Thanksgiving to you and your family". Thank you for taking the time to reply; for the last two days I was trying to find you on IRC to discuss this issue. Let me explain what I have done so far.

* First I ran a load test on my bare-metal compute node to see how far my Trex could go, and it hit 2 million packets per second. (Not sure if that is a good result or not, but it proves I can hit at least 1 million pps.)
* Then I created an SR-IOV VM on that compute node (8 vCPU / 8 GB memory) and re-ran Trex; my maximum was 323 kpps without dropping packets. (I found that an Intel 82599 VF only supports 2 rx/tx queues, and that could be the bottleneck.)
* Finally I built a DPDK VM on it to see how Trex behaved there, and it hit a maximum of ~400 kpps with 4 PMD cores. (A little better than SR-IOV, because now I have 4 rx/tx queues thanks to the 4 PMD cores.)

For the Trex load test I statically assigned ARP entries, because using static ARP is part of the Trex process.

You are saying it should hit 11 million pps, but my question is: what tools do you use to hit that number? I did not see anyone using Trex for DPDK testing; most people use testpmd. What kind of VM (vCPU/memory) are people using to reach 11 million pps? I am sticking to 8 vCPU because the majority of my servers use an 8-core VM size, so I am trying to get the most performance out of that.

If you have your load-test scenario or tools available, please share some information so I can try to mimic it in my environment.

Thank you for the reply.

~S

On Thu, Nov 26, 2020 at 8:14 PM Sean Mooney <smooney@redhat.com> wrote:
Folks,
I am playing with DPDK on my OpenStack deployment with an Intel 82599 NIC and seeing poor performance. I may be wrong with my numbers, so I want to see what the community thinks about these results.
Compute node hardware:
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Memory: 64G
NIC: Intel 82599 (dual 10G port)
[root@compute-lxb-3 ~]# ovs-vswitchd --version
ovs-vswitchd (Open vSwitch) 2.13.2
DPDK 19.11.3
DPDK VM (DUT): 8 vCPU / 8 GB memory
I have configured my compute node with all of the best practices available on the internet to get more performance out of it:
1. Used isolcpus to isolate CPUs
2. 4 dedicated cores for PMD threads
3. echo isolated_cores=1,9,25,33 >> /etc/tuned/cpu-partitioning-variables.conf
4. Huge pages
5. CPU pinning for the VM
6. Increased rx queues ( ovs-vsctl set interface dpdk-1 options:n_rxq=4 )
7. VM virtio ring size = 1024
(the relevant commands are sketched below)
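For reference, these are roughly the commands behind items 2, 4, 5 and 6 on my node; the PMD mask is what I believe corresponds to cores 1,9,25,33, and the flavor name is just a placeholder:

# 4 dedicated PMD cores (bits 1, 9, 25 and 33 set in the mask)
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x202000202
# 4 rx queues on the physical DPDK port
ovs-vsctl set interface dpdk-1 options:n_rxq=4
# hugepage-backed, CPU-pinned guest (flavor name is made up here)
openstack flavor set dpdk.8c.8g --property hw:cpu_policy=dedicated --property hw:mem_page_size=large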
After doing all of the above, I am getting the following result with the Trex packet generator using a 64B UDP stream (Total-PPS: 391.93 Kpps). Do you think that is an acceptable result, or should it be higher on this NIC model?
On Thu, 2020-11-26 at 16:56 -0500, Satish Patel wrote:

That is one of Intel's oldest generations of 10G NICs that is supported by DPDK, but it should still get to about 11 million packets per second with 1-2 cores.

My guess would be that the VM or traffic generator is not sending and receiving MAC-learning frames like ARP properly, and as a result the packets are flooding, which will severely reduce performance.
On the internet, folks say it should be a million packets per second, so I am not sure how those people reached that number or whether I am missing something in my load-test profile.
Even kernel OVS will break a million packets per second, so 400 Kpps is far too low; there is something misconfigured, but I am not sure what specifically from what you have shared. As I said, my best guess would be that the packets are flooding because the VM is not responding to ARP, and the NORMAL action is not learning the MAC address.
You could rule that out by adding hardcoded rules, but you could also check the flow tables to confirm; see the example below.
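For example, something along these lines should show whether traffic is being flooded; the MAC address and output port are just placeholders for your VM's values:

# has the VM's MAC been learned on the bridge?
ovs-appctl fdb/show br-vlan
# dump the datapath flows; flooded traffic shows up as flows with many output actions
ovs-appctl dpctl/dump-flows -m
# hardcoded rule to bypass MAC learning (placeholder MAC and ofport)
ovs-ofctl add-flow br-vlan "priority=100,dl_dst=fa:16:3e:11:22:33,actions=output:2"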
Notes: I am using 8 vCPU cores on the VM. Do you think adding more cores will help, or should I add more PMD cores?
Cpu Utilization : 2.2 %  1.8 Gb/core
Platform_factor : 1.0
Total-Tx  : 200.67 Mbps
Total-Rx  : 200.67 Mbps
Total-PPS : 391.93 Kpps
Total-CPS : 391.89 Kcps
Expected-PPS : 700.00 Kpps
Expected-CPS : 700.00 Kcps
Expected-BPS : 358.40 Mbps
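As a sanity check on those numbers (my own back-of-the-envelope math, not Trex output): the expected rate matches a 64B stream at 700 Kpps, and both it and the achieved rate are only a few percent of 10G line rate for 64B frames (~14.88 Mpps):

# expected bits/s = 700,000 pps * 64 bytes * 8 bits = 358,400,000 = 358.40 Mbps
echo $((700000 * 64 * 8))
# achieved 391,930 pps out of ~14,880,000 pps line rate, i.e. under 3%
echo "scale=4; 391930 / 14880000 * 100" | bc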
This is all of my configuration:
grub.conf:
GRUB_CMDLINE_LINUX="vmalloc=384M crashkernel=auto rd.lvm.lv=rootvg01/lv01 console=ttyS1,118200 rhgb quiet intel_iommu=on iommu=pt spectre_v2=off nopti pti=off nospec_store_bypass_disable spec_store_bypass_disable=off l1tf=off default_hugepagesz=1GB hugepagesz=1G hugepages=60 transparent_hugepage=never selinux=0 isolcpus=2,3,4,5,6,7,10,11,12,13,14,15,26,27,28,29,30,31,34,35,36,37,38,39"
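To double-check that the isolation and hugepages from this command line actually took effect at boot, I believe they can be verified through the standard sysfs paths:

# CPUs actually isolated by isolcpus
cat /sys/devices/system/cpu/isolated
# 1G hugepages allocated on NUMA node 0
cat /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages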
[root@compute-lxb-3 ~]# ovs-appctl dpif/show
netdev@ovs-netdev: hit:605860720 missed:2129
  br-int:
    br-int 65534/3: (tap)
    int-br-vlan 1/none: (patch: peer=phy-br-vlan)
    patch-tun 2/none: (patch: peer=patch-int)
    vhu1d64ea7d-d9 5/6: (dpdkvhostuserclient: configured_rx_queues=8, configured_tx_queues=8, mtu=1500, requested_rx_queues=8, requested_tx_queues=8)
    vhu9c32faf6-ac 6/7: (dpdkvhostuserclient: configured_rx_queues=8, configured_tx_queues=8, mtu=1500, requested_rx_queues=8, requested_tx_queues=8)
  br-tun:
    br-tun 65534/4: (tap)
    patch-int 1/none: (patch: peer=patch-tun)
    vxlan-0a410071 2/5: (vxlan: egress_pkt_mark=0, key=flow, local_ip=10.65.0.114, remote_ip=10.65.0.113)
  br-vlan:
    br-vlan 65534/1: (tap)
    dpdk-1 2/2: (dpdk: configured_rx_queues=4, configured_rxq_descriptors=2048, configured_tx_queues=5, configured_txq_descriptors=2048, lsc_interrupt_mode=false, mtu=1500, requested_rx_queues=4, requested_rxq_descriptors=2048, requested_tx_queues=5, requested_txq_descriptors=2048, rx_csum_offload=true, tx_tso_offload=false)
    phy-br-vlan 1/none: (patch: peer=int-br-vlan)
[root@compute-lxb-3 ~]# ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 0 core_id 1:
  isolated : false
  port: dpdk-1          queue-id: 0 (enabled)  pmd usage: 0 %
  port: vhu1d64ea7d-d9  queue-id: 3 (enabled)  pmd usage: 0 %
  port: vhu1d64ea7d-d9  queue-id: 4 (enabled)  pmd usage: 0 %
  port: vhu9c32faf6-ac  queue-id: 3 (enabled)  pmd usage: 0 %
  port: vhu9c32faf6-ac  queue-id: 4 (enabled)  pmd usage: 0 %
pmd thread numa_id 0 core_id 9:
  isolated : false
  port: dpdk-1          queue-id: 1 (enabled)  pmd usage: 0 %
  port: vhu1d64ea7d-d9  queue-id: 2 (enabled)  pmd usage: 0 %
  port: vhu1d64ea7d-d9  queue-id: 5 (enabled)  pmd usage: 0 %
  port: vhu9c32faf6-ac  queue-id: 2 (enabled)  pmd usage: 0 %
  port: vhu9c32faf6-ac  queue-id: 5 (enabled)  pmd usage: 0 %
pmd thread numa_id 0 core_id 25:
  isolated : false
  port: dpdk-1          queue-id: 3 (enabled)  pmd usage: 0 %
  port: vhu1d64ea7d-d9  queue-id: 0 (enabled)  pmd usage: 0 %
  port: vhu1d64ea7d-d9  queue-id: 7 (enabled)  pmd usage: 0 %
  port: vhu9c32faf6-ac  queue-id: 0 (enabled)  pmd usage: 0 %
  port: vhu9c32faf6-ac  queue-id: 7 (enabled)  pmd usage: 0 %
pmd thread numa_id 0 core_id 33:
  isolated : false
  port: dpdk-1          queue-id: 2 (enabled)  pmd usage: 0 %
  port: vhu1d64ea7d-d9  queue-id: 1 (enabled)  pmd usage: 0 %
  port: vhu1d64ea7d-d9  queue-id: 6 (enabled)  pmd usage: 0 %
  port: vhu9c32faf6-ac  queue-id: 1 (enabled)  pmd usage: 0 %
  port: vhu9c32faf6-ac  queue-id: 6 (enabled)  pmd usage: 0 %
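(The 0% pmd usage above suggests this was captured while the ports were idle; while the Trex test is running, I believe the per-PMD load can also be checked with the two commands below.)

# reset and then dump per-PMD cycle statistics while traffic is flowing
ovs-appctl dpif-netdev/pmd-stats-clear
ovs-appctl dpif-netdev/pmd-stats-show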