Super interesting, thank you. The random IO/throughput performance degradation is pretty obvious :(

Are these NVMe/SSD drives in hardware RAID?

On Thu, Jan 6, 2022 at 10:54 PM Eric K. Miller <emiller@genesishosting.com> wrote:
Hi Laurent,
I thought I might have already run some benchmarks, and it looks like I did, for the discussion that I started a couple of years ago (on August 6, 2020, to be exact).
I copied the results from that email below. You can see that the latency difference between bare metal and a VM is pretty significant (13.75x with random 4KiB reads), which is about the same as the difference in IOPS. The difference for writes is not quite as bad, at 8.4x.
Eric
Some numbers from fio, just to get an idea for how good/bad the IOPS will be:
Configuration:
32 core EPYC 7502P with 512GiB of RAM - CentOS 7 latest updates - Kolla Ansible (Stein) deployment
32 vCPU VM with 64GiB of RAM
32 x 10GiB test files (I'm using file tests, not raw device tests, so not optimal, but easiest when the VM root disk is the test disk)
iodepth=10
numofjobs=32
time=30 (seconds)
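For anyone wanting to run something comparable, here is a minimal sketch of how these parameters might map onto a fio command line, built in Python. The exact command used for the tests is not given in this email, so several things are assumptions: the test directory `/mnt/fio-test` is hypothetical, `numofjobs` and `time` are taken to mean fio's `numjobs` and `runtime` options, and `--ioengine=libaio` and `--direct=1` are guesses (an asynchronous engine is needed for `iodepth=10` to have any effect).

```python
def fio_command(rw, block_size, directory="/mnt/fio-test"):
    """Build a fio argument list for one test run.

    rw: fio access pattern, e.g. "randread", "randwrite", "read", "write".
    block_size: fio block size string, e.g. "4k" or "1M".
    directory: hypothetical location for the test files (assumption).
    """
    return [
        "fio",
        "--name=bench",
        f"--directory={directory}",
        f"--rw={rw}",
        f"--bs={block_size}",
        "--size=10G",          # one 10GiB file per job, so 32 files in total
        "--numjobs=32",        # "numofjobs=32" in the configuration above
        "--iodepth=10",
        "--runtime=30",        # "time=30 (seconds)" in the configuration above
        "--time_based",
        "--ioengine=libaio",   # assumption: engine not stated in the email
        "--direct=1",          # assumption: page-cache bypass not stated
        "--group_reporting",
    ]


cmd = fio_command("randread", "4k")
print(" ".join(cmd))
```

Swapping the arguments to `fio_command("read", "1M")` would give the sequential 1MiB read variant under the same assumptions.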
The VM was deployed using a qcow2 image, then deployed as a raw image, to see the difference in performance. There was none, which makes sense, since I'm pretty sure the qcow2 image was decompressed and stored in the LVM logical volume - so both tests were measuring the same thing.
Bare metal (random 4KiB reads): 8066MiB/sec 154.34 microsecond avg latency 2.065 million IOPS
VM qcow2 (random 4KiB reads): 589MiB/sec 2122.10 microsecond avg latency 151k IOPS
Bare metal (random 4KiB writes): 4940MiB/sec 252.44 microsecond avg latency 1.265 million IOPS
VM qcow2 (random 4KiB writes): 589MiB/sec 2119.16 microsecond avg latency 151k IOPS
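The slowdown factors implied by these four random 4KiB results can be recomputed with a few lines of Python. This is only an arithmetic sketch over the numbers printed above; the variable names are made up for illustration.

```python
# Average latencies (microseconds) and IOPS copied from the results above.
bare_read_lat_us = 154.34
vm_read_lat_us = 2122.10
bare_write_lat_us = 252.44
vm_write_lat_us = 2119.16
bare_read_iops = 2_065_000
vm_read_iops = 151_000

# How many times slower the VM is than bare metal.
read_lat_ratio = vm_read_lat_us / bare_read_lat_us
write_lat_ratio = vm_write_lat_us / bare_write_lat_us
read_iops_ratio = bare_read_iops / vm_read_iops

print(f"read latency:  {read_lat_ratio:.2f}x")
print(f"write latency: {write_lat_ratio:.2f}x")
print(f"read IOPS:     {read_iops_ratio:.2f}x")
```

The read latency ratio comes out at roughly 13.75x and the write latency ratio at roughly 8.4x, with the read IOPS ratio close to the read latency ratio.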
Since the read and write VM results are nearly identical, my assumption is that the emulation layer is the bottleneck. CPUs in the VM were all at 55% utilization (all kernel usage). The qemu process on the bare metal machine indicated 1600% (or so) CPU utilization.
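As a rough cross-check on the random 4KiB numbers, IOPS and average latency should be tied together by the number of I/Os in flight (Little's law). The sketch below assumes that all 32 jobs keep their full queue depth of 10 outstanding for the whole run, which is an idealization; the function name is hypothetical.

```python
def expected_iops(jobs, iodepth, avg_latency_us):
    """Estimate IOPS from concurrency and average latency.

    Assumes every job keeps `iodepth` requests in flight at all times.
    """
    outstanding = jobs * iodepth
    return outstanding / (avg_latency_us * 1e-6)


bare_metal_reads = expected_iops(32, 10, 154.34)   # reported: 2.065 million
vm_reads = expected_iops(32, 10, 2122.10)          # reported: 151k

print(f"bare metal: {bare_metal_reads:,.0f} IOPS")
print(f"VM:         {vm_reads:,.0f} IOPS")
```

Both estimates land within about 1% of the reported IOPS, so the VM's lower IOPS is fully accounted for by its higher per-request latency at the same queue depth.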
Below are runs with sequential 1MiB block tests:
Bare metal (sequential 1MiB reads): 13.3GiB/sec 23446.43 microsecond avg latency 13.7k IOPS
VM qcow2 (sequential 1MiB reads): 8378MiB/sec 38164.52 microsecond avg latency 8377 IOPS
Bare metal (sequential 1MiB writes): 8098MiB/sec 39488.00 microsecond avg latency 8097 IOPS
VM qcow2 (sequential 1MiB writes): 8087MiB/sec 39534.96 microsecond avg latency 8087 IOPS
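Throughput and IOPS are related by the block size, which gives a quick way to sanity-check any row: a 1MiB-block run moving about 8000MiB/sec should report roughly 8000 IOPS, and a 4KiB-block run should report 256 I/Os for every MiB moved. A minimal sketch, with a hypothetical helper name:

```python
KIB = 1024
MIB = 1024 * KIB


def iops_from_throughput(mib_per_sec, block_bytes):
    """Convert throughput in MiB/sec to I/O operations per second."""
    return mib_per_sec * MIB / block_bytes


# Bare metal random 4KiB reads: 8066MiB/sec, reported as 2.065 million IOPS.
rand_read_bare = iops_from_throughput(8066, 4 * KIB)

# VM sequential 1MiB reads: 8378MiB/sec, reported as 8377 IOPS.
seq_read_vm = iops_from_throughput(8378, MIB)

print(f"{rand_read_bare:,.0f} IOPS, {seq_read_vm:,.0f} IOPS")
```

Small differences from the reported figures are expected, since fio rounds throughput and IOPS independently.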