[octavia] [lbaas] CPU pinning & multiqueue
Hi, OpenStack team, has anyone tested CPU pinning together with multiqueue enabled for the VM? I'm seeing a huge performance degradation (750k RPS => 200k RPS) on a flavor with 16 vCPUs and 16 GB RAM when using an image built with CPU pinning vs an image without it. We're using Zed (I've backported the commits to check, but I think newer versions can also be affected), haproxy 2.4, Ubuntu 22.04.
Hi,

You're talking about CPU pinning in the amphora VM, right? Or on the hypervisor? IMHO it doesn't make sense to enable both CPU pinning _in_ the amphora and multiqueue. If I'm not mistaken, CPU pinning in the amphora dedicates n-1 vCPUs to haproxy, while the remaining vCPU is used for I/O and the system, so enabling multiqueue doesn't help because the queues will all be serviced by that one vCPU.

Greg

On Mon, Nov 18, 2024 at 3:07 AM <yardalgedal@gmail.com> wrote:
> Hi, OpenStack team, has anyone tested CPU pinning together with multiqueue enabled for the VM?
> I'm seeing a huge performance degradation (750k RPS => 200k RPS) on a flavor with 16 vCPUs and 16 GB RAM when using an image built with CPU pinning vs an image without it. We're using Zed (I've backported the commits to check, but I think newer versions can also be affected), haproxy 2.4, Ubuntu 22.04.
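(Side note: assuming the pinning behaves as described above, the split is visible from inside the amphora with something like the following; the PID is a placeholder and the expected CPU list depends on how the image actually pins things:

    taskset -cp <haproxy-pid>    # CPU affinity of haproxy; expected 1-15 on a 16 vCPU flavor
    cat /proc/interrupts         # shows which vCPU is servicing the virtio-net interrupts
)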
Yes, inside the amphora. I just think multiqueue can still work for the amphora, but it would have to be done another way: moving interrupts to some other CPU in the multiqueue case is not advisable, BUT haproxy nbproc + mapping haproxy processes to cores and (maybe, in addition) cores to queues could still be good ...
Also to clarify, nbproc has been deprecated[1] and should not be used. It had many limitations, which is why Octavia has never supported it. The modern threading models in HAProxy, as used by Octavia, have significant performance advantages.

Michael

[1] https://www.haproxy.com/blog/multithreading-in-haproxy

On Mon, Nov 18, 2024 at 3:46 AM <yardalgedal@gmail.com> wrote:
> Yes, inside the amphora. I just think multiqueue can still work for the amphora, but it would have to be done another way: moving interrupts to some other CPU in the multiqueue case is not advisable, BUT haproxy nbproc + mapping haproxy processes to cores and (maybe, in addition) cores to queues could still be good ...
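(For illustration only: with the threaded model, the kind of core mapping discussed above looks roughly like the snippet below in haproxy.cfg. The values are made up for a 16 vCPU amphora and are not the exact configuration Octavia renders:

    global
        nbthread 15
        cpu-map auto:1/1-15 1-15    # pin threads 1-15 of the single process to vCPUs 1-15, leaving vCPU 0 free

nbthread and cpu-map are standard HAProxy 2.4 directives; nbproc was the deprecated per-process variant.)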
On 18/11/2024 07:25, Gregory Thiemonge wrote:
> Hi,
> You're talking about CPU pinning in the amphora VM, right? Or on the hypervisor? IMHO it doesn't make sense to enable both CPU pinning _in_ the amphora and multiqueue.
Why would you say that? Multiqueue was added to Nova explicitly for NFV workloads like Octavia, so that if your network backend supports multiple queues (e.g. ovs-dpdk), multiple host cores can be used to service the VM's traffic.
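For example, one way to turn it on is the hw_vif_multiqueue_enabled image property on the amphora image (the image name below is a placeholder):

    openstack image set --property hw_vif_multiqueue_enabled=true amphora-x64-haproxy

With that set, the guest gets one virtio queue pair per vCPU, up to what the backend supports.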
> If I'm not mistaken, CPU pinning in the amphora dedicates n-1 vCPUs to haproxy, while the remaining vCPU is used for I/O and the system, so enabling multiqueue doesn't help because the queues will all be serviced by that one vCPU.
That's not really how it works. Multiqueue enables the guest and host to allocate one queue per logical core, i.e. if the guest has 10 vCPUs and only 1 queue is enabled, then all 10 logical guest cores will contend to read/write packets to it. Enabling multiqueue on the guest allows the guest to map/configure the workload to use one queue per worker thread, removing the contention on the queue.

This will only improve performance if the host networking stack can scale to service each of the queues. When using kernel vhost (kernel OVS), a separate kernel vhost thread is spawned for each queue, allowing higher throughput. When using ovs-dpdk, each virtio queue can be scheduled to a different DPDK PMD to service the traffic. Without multiqueue, at most one PMD/vhost kernel thread will service the virtual network interface's traffic.

Note that on older kernels you actually have to use ethtool to enable multiqueue on the guest interface from within the guest OS, or it will reduce performance.
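For example (interface name illustrative; run inside the guest):

    ethtool -l eth0                 # show how many combined channels are available/enabled
    ethtool -L eth0 combined 16     # enable all 16 queue pairs on a 16 vCPU guest

As noted, newer guest kernels typically do this automatically; the manual step matters on older guests.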
participants (4)
- Gregory Thiemonge
- Michael Johnson
- Sean Mooney
- yardalgedal@gmail.com