On Mon, 2023-08-14 at 17:29 +0200, Sven Kieske wrote:
Hi,
On Monday, 2023-08-14 at 14:37 +0200, Jan Wasilewski wrote:
[2] fio results of OpenStack managed instance with "vdb" attached: https://paste.openstack.org/show/bViUpJTf7UYpsRyGCAt9/
[3] dumpxml of Libvirt managed instance with "vdb" attached: https://paste.openstack.org/show/bGv8dT1l2QaTiAybYrJi/
Looking at this XML, you attach the qcow2 file via IDE and pass the NVMe device through directly via virtio-blk:

  <disk type='file' device='disk'>
    <driver name='qemu' type='qcow2' cache='none' io='native' discard='unmap'/>
    <source file='/var/lib/nova/instances/test/disk' index='1'/>
    <backingStore type='file' index='2'>
      <format type='raw'/>
      <source file='/var/lib/nova/instances/_base/78f03ab8f57b6e53f615f89f7ca212c729cb2f29'/>
      <backingStore/>
    </backingStore>
    <target dev='hda' bus='ide'/>
    <alias name='ide0-0-0'/>
    <address type='drive' controller='0' bus='0' target='0' unit='0'/>
  </disk>
  <disk type='block' device='disk'>
    <driver name='qemu' type='raw'/>
    <source dev='/dev/nvme1n1p1' index='4'/>
    <backingStore/>
    <target dev='vdb' bus='virtio'/>
    <alias name='virtio-disk1'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
  </disk>

That is not a fair comparison, as IDE will also bottleneck the performance; you should use the same bus for both disks.
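If you want a like-for-like run, the qcow2 disk could be put on virtio as well. A minimal sketch of what that disk element might look like (untested; the target dev is a placeholder and the address element would be regenerated by libvirt):

  <disk type='file' device='disk'>
    <driver name='qemu' type='qcow2' cache='none' io='native' discard='unmap'/>
    <source file='/var/lib/nova/instances/test/disk'/>
    <target dev='vda' bus='virtio'/>
  </disk>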
[4] fio results of Libvirt managed instance with "vdb" attached: https://paste.openstack.org/show/bOzYXkbco0oDfgaD0co8/
[5] xml configuration of vdb drive: https://paste.openstack.org/show/bAJ9MyEWEGOteeJnH5D8/
One difference I can see in the fio results is that the OpenStack-provided VM does a lot more context switches and has a different CPU usage profile in general:
OpenStack instance:
cpu : usr=27.16%, sys=62.24%, ctx=3246653, majf=0, minf=14
plain libvirt instance:
cpu : usr=15.75%, sys=56.31%, ctx=2860657, majf=0, minf=15
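If you want to see where those extra context switches happen while fio is running, a rough sketch on the hypervisor (the qemu PID is a placeholder you have to look up first; pidstat comes from the sysstat package):

  pgrep -af qemu                   # find the qemu process of the instance, note its PID
  pidstat -w -t -p <qemu-pid> 1    # cswch/s = voluntary, nvcswch/s = involuntary switches per second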
One thing this might be related to is that the libvirt-created VM does not have the virtual performance monitoring unit (vPMU) enabled. I added the ability to turn that off a few releases ago (https://specs.openstack.org/openstack/nova-specs/specs/train/implemented/lib...) via a boolean image metadata key hw_pmu=True|False and a corresponding flavor extra spec hw:pmu=True|False, so you could try disabling that and see if it helps with the context switching.
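Untested, but roughly like this on the flavor or image (the flavor and image names are placeholders); once it takes effect you should see <pmu state='off'/> under <features> in the instance's domain XML:

  openstack flavor set --property hw:pmu=false <flavor-name>
  # or per image:
  openstack image set --property hw_pmu=false <image-name>
  # resize/rebuild the instance afterwards so the new property is applied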
This indicates that some other workload is running there, or that work is at least scheduled differently than on the plain libvirt machine. One thing to check might be the IRQ balancing across cores, but I can't remember at the moment whether this is already handled by this kernel release (IIRC, in the past you had to run the irqbalance daemon, which became largely obsolete after kernel 4.19 according to https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=926967).
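A quick look on the hypervisor (just a sketch, adjust for your distro):

  systemctl status irqbalance               # is the daemon running at all?
  grep -E 'nvme|virtio' /proc/interrupts    # how the device IRQs are spread across the cores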
how many other vms are running on that openstack hypervisor?
I hope the hypervisor is not oversubscribed? You can easily see this in a modern variant of "top", which reports stolen CPU cycles; if you see CPU steal, your CPU is oversubscribed.
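Inside the guest that is easy to spot, for example (just a sketch):

  vmstat 1 5                    # the last column, "st", is steal time
  top -b -n 1 | grep 'Cpu(s)'   # the "st" field at the end of the Cpu(s) line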
depending on the deployment, you will of course also incur additional overhead from other openstack services - beginning with nova, which might account for the additional context switches on the hypervisor.
In general, 3 million context switches is not that many and should not hurt performance much, but it's still a noticeable difference between the two systems.
Are the CPU models on the hypervisors exactly the same? I can't tell from the libvirt dumps, but I notice that certain CPU flags are explicitly set for the libvirt-managed instance, which might affect the end result.
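A quick way to compare, run on both hypervisors (sketch; the domain name is a placeholder):

  lscpu | grep -E 'Model name|Flags'
  virsh dumpxml <domain> | grep -A 10 '<cpu '    # cpu mode/model/features the guest actually gets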
What's more bothersome is that the libvirt-provided VM has a total CPU usage of roughly 70%, whereas the OpenStack-provided one is closer to 90%.
This leads me to believe that one of the following is true:
- the hypervisor CPUs differ in a meaningful way, performance-wise.
- the hypervisor is somehow oversubscribed / has more work to do for the OpenStack-deployed server, which results in worse benchmarks / more CPU being burnt by constantly evicting the task from the lower-level L1/L2 CPU caches.
- the context switches eat up significant CPU performance on the OpenStack instance (least likely, imho).
What would also be interesting to know is whether mq-deadline and multi-queue are enabled in the plain libvirt machine (are the libvirt and qemu versions the same as in the OpenStack deployment?).
You can check this as described here:
https://bugzilla.redhat.com/show_bug.cgi?id=1827722
But I don't see "num_queues" or "queues" mentioned anywhere, so I assume it's turned off. Enabling it could also boost your performance by a lot.
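A quick sketch of how to check both sides (the domain name is a placeholder); IIRC multi-queue shows up as a queues='N' attribute on the <driver> element in the libvirt XML, and inside the guest you can look at the block layer directly:

  # on the hypervisor
  virsh dumpxml <domain> | grep -i queues
  # inside the guest
  cat /sys/block/vdb/queue/scheduler   # should show [mq-deadline] or similar
  ls /sys/block/vdb/mq/                # one directory per blk-mq hardware queue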
We do not support multi-queue for virtio-blk or virtio-scsi in nova; it's on our todo list but not available in any current release (https://review.opendev.org/c/openstack/nova-specs/+/878066). The person that was proposing this is no longer working on OpenStack, so if people are interested, feel free to get involved. Otherwise it will likely get enabled in a release or two when we find time to work on it.
Another thing to check - especially since I noticed the CPU differences - would be the NUMA layout of the hypervisor and how the VM is affected by it.
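For a first look at that, something like this (sketch; the domain name is a placeholder):

  numactl --hardware        # node/CPU/memory layout of the hypervisor
  virsh vcpupin <domain>    # current vCPU pinning of the guest, if any
  virsh numatune <domain>   # memory placement policy of the guest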