On Mon, 2023-08-14 at 17:29 +0200, Sven Kieske wrote:
Hi,
On Monday, 2023-08-14 at 14:37 +0200, Jan Wasilewski wrote:
[2] fio results of OpenStack managed instance with "vdb" attached: https://paste.openstack.org/show/bViUpJTf7UYpsRyGCAt9/
[3] dumpxml of Libvirt managed instance with "vdb" attached: https://paste.openstack.org/show/bGv8dT1l2QaTiAybYrJi/
Looking at this XML, you attach the qcow2 file via IDE and pass the NVMe device through directly via virtio-blk:

  <disk type='file' device='disk'>
    <driver name='qemu' type='qcow2' cache='none' io='native' discard='unmap'/>
    <source file='/var/lib/nova/instances/test/disk' index='1'/>
    <backingStore type='file' index='2'>
      <format type='raw'/>
      <source file='/var/lib/nova/instances/_base/78f03ab8f57b6e53f615f89f7ca212c729cb2f29'/>
      <backingStore/>
    </backingStore>
    <target dev='hda' bus='ide'/>
    <alias name='ide0-0-0'/>
    <address type='drive' controller='0' bus='0' target='0' unit='0'/>
  </disk>
  <disk type='block' device='disk'>
    <driver name='qemu' type='raw'/>
    <source dev='/dev/nvme1n1p1' index='4'/>
    <backingStore/>
    <target dev='vdb' bus='virtio'/>
    <alias name='virtio-disk1'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
  </disk>

That is not a fair comparison, as IDE will also bottleneck the performance; you should use the same bus for both disks.
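If you want a like-for-like run, the qcow2 disk could be put on virtio as well. A minimal sketch of what that disk element might look like (untested; the target dev is a placeholder and the address element would be regenerated by libvirt):

  <disk type='file' device='disk'>
    <driver name='qemu' type='qcow2' cache='none' io='native' discard='unmap'/>
    <source file='/var/lib/nova/instances/test/disk'/>
    <target dev='vda' bus='virtio'/>
  </disk>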
[4] fio results of Libvirt managed instance with "vdb" attached: https://paste.openstack.org/show/bOzYXkbco0oDfgaD0co8/
[5] xml configuration of vdb drive: https://paste.openstack.org/show/bAJ9MyEWEGOteeJnH5D8/
One difference I can see in the fio results is that the OpenStack-provided VM does a lot more context switches and has a different CPU usage profile in general:
OpenStack instance:
cpu : usr=27.16%, sys=62.24%, ctx=3246653, majf=0, minf=14
plain libvirt instance:
cpu : usr=15.75%, sys=56.31%, ctx=2860657, majf=0, minf=15
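If you want to see where those extra context switches happen while fio is running, a rough sketch on the hypervisor (the qemu PID is a placeholder you have to look up first; pidstat comes from the sysstat package):

  pgrep -af qemu                   # find the qemu process of the instance, note its PID
  pidstat -w -t -p <qemu-pid> 1    # cswch/s = voluntary, nvcswch/s = involuntary switches per second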
One thing this might be related to is that the libvirt-created VM does not have the virtual performance monitoring unit (vPMU) enabled. I added the ability to turn that off a few releases ago (https://specs.openstack.org/openstack/nova-specs/specs/train/implemented/lib...) via a boolean image metadata key hw_pmu=True|False and a corresponding flavor extra spec hw:pmu=True|False, so you could try disabling that and see if it helps with the context switching.
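Untested, but roughly like this on the flavor or image (the flavor and image names are placeholders); once it takes effect you should see <pmu state='off'/> under <features> in the instance's domain XML:

  openstack flavor set --property hw:pmu=false <flavor-name>
  # or per image:
  openstack image set --property hw_pmu=false <image-name>
  # resize/rebuild the instance afterwards so the new property is applied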
This indicates that some other workload is running there, or that work is at least scheduled differently than on the plain libvirt machine. One thing to check might be the IRQ balancing across cores, but I can't remember at the moment whether this is already handled by this kernel release (IIRC, in the past you had to run the irqbalance daemon, which became largely obsolete after kernel 4.19 according to https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=926967).
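A quick look on the hypervisor (just a sketch, adjust for your distro):

  systemctl status irqbalance               # is the daemon running at all?
  grep -E 'nvme|virtio' /proc/interrupts    # how the device IRQs are spread across the cores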
how many other vms are running on that openstack hypervisor?
I hope the hypervisor is not oversubscribed? You can easily see this in a modern variant of "top", which reports stolen CPU cycles; if you see CPU steal, your CPU is oversubscribed.
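Inside the guest that is easy to spot, for example (just a sketch):

  vmstat 1 5                    # the last column, "st", is steal time
  top -b -n 1 | grep 'Cpu(s)'   # the "st" field at the end of the Cpu(s) line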
depending on the deployment, you will of course also incur additional overhead from other openstack services - beginning with nova, which might account for the additional context switches on the hypervisor.
In general, 3 million context switches is not that many and should not hurt performance much, but it's still a noticeable difference between the two systems.
Are the CPU models on the hypervisors exactly the same? I can't tell from the libvirt dumps, but I notice that certain CPU flags are explicitly set for the libvirt-managed instance, which might affect the end result.
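A quick way to compare, run on both hypervisors (sketch; the domain name is a placeholder):

  lscpu | grep -E 'Model name|Flags'
  virsh dumpxml <domain> | grep -A 10 '<cpu '    # cpu mode/model/features the guest actually gets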
What's more bothersome is that the libvirt-provided VM has a total CPU usage of roughly 70%, whereas the OpenStack-provided one is closer to 90%.
This leads me to believe that one of the following is true:
- the hypervisor CPUs differ in a meaningful way, performance-wise.
- the hypervisor is somehow oversubscribed / has more work to do for the OpenStack-deployed server, which results in worse benchmarks / more CPU being burnt by constantly evicting the task from the lower-level L1/L2 CPU caches.
- the context switches eat up significant CPU performance on the OpenStack instance (least likely, imho).
What would also be interesting to know is whether mq-deadline and multi-queue are enabled in the plain libvirt machine (are the libvirt and qemu versions the same as in the OpenStack deployment?).
You can check this as described here:
https://bugzilla.redhat.com/show_bug.cgi?id=1827722
But I don't see "num_queues" or "queues" mentioned anywhere, so I assume it's turned off. Enabling it could also boost your performance by a lot.
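A quick sketch of how to check both sides (the domain name is a placeholder); IIRC multi-queue shows up as a queues='N' attribute on the <driver> element in the libvirt XML, and inside the guest you can look at the block layer directly:

  # on the hypervisor
  virsh dumpxml <domain> | grep -i queues
  # inside the guest
  cat /sys/block/vdb/queue/scheduler   # should show [mq-deadline] or similar
  ls /sys/block/vdb/mq/                # one directory per blk-mq hardware queue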
We do not support multi-queue for virtio-blk or virtio-scsi in nova; it's on our todo list but not available in any current release (https://review.opendev.org/c/openstack/nova-specs/+/878066). The person that was proposing this is no longer working on OpenStack, so if people are interested, feel free to get involved. Otherwise it will likely get enabled in a release or two when we find time to work on it.
Another thing to check - especially since I noticed the CPU differences - would be the NUMA layout of the hypervisor and how the VM is affected by it.
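For a first look at that, something like this (sketch; the domain name is a placeholder):

  numactl --hardware        # node/CPU/memory layout of the hypervisor
  virsh vcpupin <domain>    # current vCPU pinning of the guest, if any
  virsh numatune <domain>   # memory placement policy of the guest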