<div dir="ltr">Super interesting. Thank you.<div><br></div><div>The random I/O and throughput degradation is pretty obvious :(</div><div><br></div><div>Are these NVMe/SSD drives in hardware RAID?</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jan 6, 2022 at 10:54 PM Eric K. Miller <<a href="mailto:emiller@genesishosting.com">emiller@genesishosting.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Laurent,<br>

<br>

I thought I might have already run some benchmarks, and it looks like I did, for the discussion I started a couple of years ago (on August 6, 2020, to be exact).<br>

<br>

I copied the results from that email below.  You can see that the latency difference between bare metal and a VM is pretty significant (13.75x with random 4KiB reads), which is about the same as the difference in IOPS.  The difference for writes is not quite as bad, at 8.4x.<br>
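The ratios quoted here can be recomputed directly from the random 4KiB figures listed further down. A minimal sketch; the numbers are copied from the results, and `ratio` is just a hypothetical helper name:

```python
# Random 4KiB figures copied from the fio results quoted in this email.
BARE_READ_LAT_US = 154.34
VM_READ_LAT_US = 2122.10
BARE_WRITE_LAT_US = 252.44
VM_WRITE_LAT_US = 2119.16
BARE_READ_IOPS = 2_065_000
VM_READ_IOPS = 151_000


def ratio(larger: float, smaller: float) -> float:
    """How many times larger one figure is than the other."""
    return larger / smaller


# Latency: the VM has the larger (worse) figure.
read_lat_ratio = ratio(VM_READ_LAT_US, BARE_READ_LAT_US)
write_lat_ratio = ratio(VM_WRITE_LAT_US, BARE_WRITE_LAT_US)
# IOPS: bare metal has the larger (better) figure.
read_iops_ratio = ratio(BARE_READ_IOPS, VM_READ_IOPS)

print(f"random read latency:  {read_lat_ratio:.2f}x")
print(f"random write latency: {write_lat_ratio:.2f}x")
print(f"random read IOPS:     {read_iops_ratio:.2f}x")
```

The read latency ratio (about 13.75x) and the read IOPS ratio (about 13.68x) line up closely, and the write latency ratio comes out at about 8.4x.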

<br>

Eric<br>

<br>

<br>

Some numbers from fio, just to get an idea of how good or bad the IOPS will be:<br>

<br>

Configuration:<br>

32-core AMD EPYC 7502P with 512GiB of RAM, CentOS 7 with the latest updates, Kolla Ansible (Stein) deployment<br>

32 vCPU VM with 64GiB of RAM<br>

32 x 10GiB test files (I'm using file tests rather than raw device tests; not optimal, but easiest when the VM root disk is the test disk)<br>

iodepth=10<br>

numjobs=32<br>

runtime=30 (seconds)<br>
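The parameters above correspond to fio's `iodepth`, `numjobs`, and `runtime` options. The sketch below rebuilds a plausible command line for one of the random-read runs. The email does not give the full job definition, so `--ioengine=libaio`, `--direct=1`, `--time_based`, `--group_reporting`, and the `/mnt/fiotest` directory are assumptions (the directory is a hypothetical placeholder), not the settings actually used:

```python
from __future__ import annotations

import shlex

# Parameters stated in the email; everything else below is an ASSUMPTION.
FIO_PARAMS = {
    "iodepth": 10,
    "numjobs": 32,
    "runtime": 30,  # seconds
    "size": "10G",  # per job, so 32 jobs -> 32 x 10GiB files
}


def fio_command(rw: str, bs: str, directory: str = "/mnt/fiotest") -> list[str]:
    """Build a fio command line for one test (rw e.g. 'randread', bs e.g. '4k').

    ioengine, direct, time_based and group_reporting are assumptions; the
    email does not say which engine or caching mode was used.
    /mnt/fiotest is a hypothetical placeholder directory.
    """
    return [
        "fio",
        f"--name={rw}-{bs}",
        f"--directory={directory}",
        f"--rw={rw}",
        f"--bs={bs}",
        f"--size={FIO_PARAMS['size']}",
        f"--iodepth={FIO_PARAMS['iodepth']}",
        f"--numjobs={FIO_PARAMS['numjobs']}",
        f"--runtime={FIO_PARAMS['runtime']}",
        "--time_based",
        "--ioengine=libaio",  # assumption
        "--direct=1",  # assumption: bypass the guest page cache
        "--group_reporting",  # report all 32 jobs as one aggregate
    ]


# With 32 jobs each keeping 10 I/Os queued, up to 320 I/Os are in flight.
outstanding = FIO_PARAMS["iodepth"] * FIO_PARAMS["numjobs"]
print(shlex.join(fio_command("randread", "4k")))
print("outstanding I/Os:", outstanding)
```

Swapping `rw` for `randwrite`, `read`, or `write` and `bs` for `1m` would give the other three test types.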

<br>

The VM was first deployed from a qcow2 image and then from a raw image, to see the difference in performance.  There was none, which makes sense: I'm pretty sure the qcow2 image was converted to raw and stored in the LVM logical volume, so both tests were measuring the same thing.<br>

<br>

Bare metal (random 4KiB reads):<br>

8066MiB/sec<br>

154.34 microsecond avg latency<br>

2.065 million IOPS<br>

<br>

VM qcow2 (random 4KiB reads):<br>

589MiB/sec<br>

2122.10 microsecond avg latency<br>

151k IOPS<br>

<br>

Bare metal (random 4KiB writes):<br>

4940MiB/sec<br>

252.44 microsecond avg latency<br>

1.265 million IOPS<br>

<br>

VM qcow2 (random 4KiB writes):<br>

589MiB/sec<br>

2119.16 microsecond avg latency<br>

151k IOPS<br>

<br>

Since the VM's read and write results are nearly identical, my assumption is that the emulation layer is the bottleneck.  The vCPUs in the VM were all at 55% utilization (all kernel time).  The qemu process on the bare-metal host showed roughly 1600% CPU utilization.<br>
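Because fio keeps a fixed number of I/Os queued, latency and IOPS are tied together by Little's law (in-flight I/Os = IOPS x average latency). Assuming all 320 I/Os (iodepth=10 x 32 jobs) stayed in flight for the whole run, this sketch checks that the random-I/O figures above are self-consistent. It also explains why the latency ratio matches the IOPS ratio, and why identical VM read and write latency means identical VM read and write IOPS:

```python
# Little's law: outstanding I/Os = IOPS x average latency.
# ASSUMPTION: all iodepth x numjobs I/Os stayed in flight for the whole run.
OUTSTANDING = 10 * 32

RESULTS = {
    # test: (avg latency in microseconds, reported IOPS), copied from the email
    "bare metal randread": (154.34, 2_065_000),
    "VM randread": (2122.10, 151_000),
    "bare metal randwrite": (252.44, 1_265_000),
    "VM randwrite": (2119.16, 151_000),
}


def predicted_iops(latency_us: float, outstanding: int = OUTSTANDING) -> float:
    """IOPS implied by Little's law for a fixed number of in-flight I/Os."""
    return outstanding / (latency_us / 1_000_000)


for name, (lat_us, reported) in RESULTS.items():
    pred = predicted_iops(lat_us)
    print(f"{name:22s} predicted {pred:>12,.0f}  reported {reported:>12,}  "
          f"({pred / reported - 1:+.1%})")
```

Every prediction lands within 1% of the reported IOPS, so the VM's roughly 2.12ms per I/O caps both reads and writes at about 151k IOPS.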

<br>

Below are the results of sequential 1MiB block tests:<br>

<br>

Bare metal (sequential 1MiB reads):<br>

13.3GiB/sec<br>

23446.43 microsecond avg latency<br>

13.7k IOPS<br>

<br>

VM qcow2 (sequential 1MiB reads):<br>

8378MiB/sec<br>

38164.52 microsecond avg latency<br>

8377 IOPS<br>

<br>

Bare metal (sequential 1MiB writes):<br>

8098MiB/sec<br>

39488.00 microsecond avg latency<br>

8097 IOPS<br>

<br>

VM qcow2 (sequential 1MiB writes):<br>

8087MiB/sec<br>

39534.96 microsecond avg latency<br>

8087 IOPS<br>
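Putting the four bandwidth pairs side by side shows how the VM overhead depends on the access pattern. A small sketch, with the figures copied from the results above and 13.3GiB/sec converted to MiB/sec; `vm_fraction` is a hypothetical helper name:

```python
# Bandwidth in MiB/s copied from the email; 13.3GiB/s converted to MiB/s.
BANDWIDTH_MIB_S = {
    # test: (bare metal, VM)
    "random 4KiB read": (8066, 589),
    "random 4KiB write": (4940, 589),
    "sequential 1MiB read": (13.3 * 1024, 8378),
    "sequential 1MiB write": (8098, 8087),
}


def vm_fraction(bare: float, vm: float) -> float:
    """Share of bare-metal bandwidth the VM achieved (1.0 = no overhead)."""
    return vm / bare


fractions = {test: vm_fraction(bare, vm) for test, (bare, vm) in BANDWIDTH_MIB_S.items()}
for test, frac in fractions.items():
    print(f"{test:22s} VM reaches {frac:6.1%} of bare metal")
```

The VM reaches only about 7% to 12% of bare-metal bandwidth for random 4KiB I/O but about 62% for sequential 1MiB reads and over 99% for sequential 1MiB writes. That pattern is consistent with a per-I/O cost in the emulation layer, as suspected above, though these numbers alone do not prove it.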

</blockquote></div>