I am configuring high-performance storage VMs and decided to take the easy path (PCI passthrough). I can spin up VMs and see the PCI devices inside them; however, performance
is below native/bare metal.
Native/Bare metal performance:
[root@zeus-54 data]# fio --ioengine=libaio --name=test --filename=test --bs=4k --size=40G --readwrite=randrw --runtime=120 --time_based
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [f(1)][100.0%][r=39.5MiB/s,w=39.6MiB/s][r=10.1k,w=10.1k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=50892: Wed May 1 22:22:45 2019
read: IOPS=9805, BW=38.3MiB/s (40.2MB/s)(4596MiB/120001msec)
slat (usec): min=39, max=6678, avg=94.72, stdev=55.78
clat (nsec): min=450, max=18224, avg=525.83, stdev=120.10
lat (usec): min=39, max=6679, avg=95.36, stdev=55.79
clat percentiles (nsec):
| 1.00th=[ 462], 5.00th=[ 478], 10.00th=[ 482], 20.00th=[ 486],
| 30.00th=[ 490], 40.00th=[ 494], 50.00th=[ 502], 60.00th=[ 510],
| 70.00th=[ 516], 80.00th=[ 532], 90.00th=[ 596], 95.00th=[ 676],
| 99.00th=[ 860], 99.50th=[ 1048], 99.90th=[ 1384], 99.95th=[ 2480],
| 99.99th=[ 3728]
bw ( KiB/s): min= 720, max=40736, per=100.00%, avg=39389.00, stdev=5317.58, samples=239
iops : min= 180, max=10184, avg=9847.23, stdev=1329.39, samples=239
write: IOPS=9799, BW=38.3MiB/s (40.1MB/s)(4594MiB/120001msec)
slat (nsec): min=2982, max=106207, avg=4220.09, stdev=980.04
clat (nsec): min=407, max=18130, avg=451.48, stdev=103.71
lat (usec): min=3, max=111, avg= 4.74, stdev= 1.03
clat percentiles (nsec):
| 1.00th=[ 414], 5.00th=[ 418], 10.00th=[ 422], 20.00th=[ 430],
| 30.00th=[ 434], 40.00th=[ 434], 50.00th=[ 438], 60.00th=[ 438],
| 70.00th=[ 442], 80.00th=[ 446], 90.00th=[ 462], 95.00th=[ 588],
| 99.00th=[ 700], 99.50th=[ 916], 99.90th=[ 1208], 99.95th=[ 1288],
| 99.99th=[ 3536]
bw ( KiB/s): min= 752, max=42608, per=100.00%, avg=39366.63, stdev=5355.73, samples=239
iops : min= 188, max=10652, avg=9841.64, stdev=1338.93, samples=239
lat (nsec) : 500=69.98%, 750=28.64%, 1000=0.90%
lat (usec) : 2=0.42%, 4=0.04%, 10=0.01%, 20=0.01%
cpu : usr=2.20%, sys=10.85%, ctx=1176675, majf=0, minf=1372
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=1176625,1175958,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=38.3MiB/s (40.2MB/s), 38.3MiB/s-38.3MiB/s (40.2MB/s-40.2MB/s), io=4596MiB (4819MB), run=120001-120001msec
WRITE: bw=38.3MiB/s (40.1MB/s), 38.3MiB/s-38.3MiB/s (40.1MB/s-40.1MB/s), io=4594MiB (4817MB), run=120001-120001msec
Disk stats (read/write):
nvme9n1: ios=1174695/883620, merge=0/0, ticks=105502/72225, in_queue=192101, util=99.28%
VM performance:
[centos@kudu-1 nvme0]$ sudo fio --ioengine=libaio --name=test --filename=test --bs=4k --size=40G --readwrite=randrw --runtime=120 --time_based
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [m(1)][100.0%][r=29.2MiB/s,w=29.7MiB/s][r=7487,w=7595 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=44383: Wed May 1 12:22:24 2019
read: IOPS=6994, BW=27.3MiB/s (28.6MB/s)(3278MiB/120000msec)
slat (usec): min=54, max=20476, avg=115.27, stdev=71.45
clat (nsec): min=1757, max=31476, avg=2163.02, stdev=688.66
lat (usec): min=56, max=20481, avg=118.51, stdev=71.66
clat percentiles (nsec):
| 1.00th=[ 1800], 5.00th=[ 1832], 10.00th=[ 1864], 20.00th=[ 1992],
| 30.00th=[ 2040], 40.00th=[ 2064], 50.00th=[ 2064], 60.00th=[ 2096],
| 70.00th=[ 2096], 80.00th=[ 2128], 90.00th=[ 2480], 95.00th=[ 2544],
| 99.00th=[ 4448], 99.50th=[ 5536], 99.90th=[11072], 99.95th=[12736],
| 99.99th=[18560]
bw ( KiB/s): min= 952, max=31224, per=100.00%, avg=28153.51, stdev=4126.89, samples=237
iops : min= 238, max= 7806, avg=7038.23, stdev=1031.70, samples=237
write: IOPS=6985, BW=27.3MiB/s (28.6MB/s)(3274MiB/120000msec)
slat (usec): min=7, max=963, avg=12.60, stdev= 6.24
clat (nsec): min=1662, max=199250, avg=2030.26, stdev=712.33
lat (usec): min=10, max=970, avg=15.68, stdev= 6.48
clat percentiles (nsec):
| 1.00th=[ 1688], 5.00th=[ 1720], 10.00th=[ 1736], 20.00th=[ 1864],
| 30.00th=[ 1928], 40.00th=[ 1944], 50.00th=[ 1944], 60.00th=[ 1960],
| 70.00th=[ 1960], 80.00th=[ 1992], 90.00th=[ 2352], 95.00th=[ 2384],
| 99.00th=[ 4048], 99.50th=[ 4768], 99.90th=[11456], 99.95th=[13120],
| 99.99th=[19072]
bw ( KiB/s): min= 912, max=31880, per=100.00%, avg=28119.64, stdev=4176.38, samples=237
iops : min= 228, max= 7970, avg=7029.75, stdev=1044.07, samples=237
lat (usec) : 2=51.56%, 4=47.17%, 10=1.03%, 20=0.22%, 50=0.01%
lat (usec) : 250=0.01%
cpu : usr=4.96%, sys=28.37%, ctx=839307, majf=0, minf=26
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=839283,838268,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=27.3MiB/s (28.6MB/s), 27.3MiB/s-27.3MiB/s (28.6MB/s-28.6MB/s), io=3278MiB (3438MB), run=120000-120000msec
WRITE: bw=27.3MiB/s (28.6MB/s), 27.3MiB/s-27.3MiB/s (28.6MB/s-28.6MB/s), io=3274MiB (3434MB), run=120000-120000msec
Disk stats (read/write):
nvme0n1: ios=838322/651596, merge=0/0, ticks=83804/22119, in_queue=104773, util=70.18%
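To put the two runs side by side, here is a minimal Python sketch that expresses the VM figures as a percentage of the bare-metal ones. The numbers are copied from the fio output above; the helper name `relative` is only illustrative.

```python
# Headline figures copied from the two fio runs above (4k randrw, iodepth=1).
bare_metal = {"read_iops": 9805, "write_iops": 9799,
              "read_lat_us": 95.36, "write_lat_us": 4.74}
vm = {"read_iops": 6994, "write_iops": 6985,
      "read_lat_us": 118.51, "write_lat_us": 15.68}


def relative(vm_value, native_value):
    """Return the VM figure as a percentage of the bare-metal figure."""
    return 100.0 * vm_value / native_value


for key in bare_metal:
    print(f"{key}: VM is {relative(vm[key], bare_metal[key]):.1f}% of bare metal")
```

By this calculation the VM delivers roughly 71% of the bare-metal IOPS, and the average write latency is a little over three times higher.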
This is my OpenStack Rocky configuration:
nova.conf on controller node:
[pci]
alias = { "vendor_id":"10de", "product_id":"1db1", "device_type":"type-PCI", "name":"nv_v100" }
alias = { "vendor_id":"8086", "product_id":"0953", "device_type":"type-PCI", "name":"nvme"}
nova.conf on compute node:
[pci]
passthrough_whitelist = [ {"address":"0000:84:00.0"}, {"address":"0000:85:00.0"}, {"address":"0000:86:00.0"}, {"address":"0000:87:00.0"} ]
alias = { "vendor_id":"8086", "product_id":"0953", "device_type":"type-PCI", "name":"nvme"}
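As far as I understand, Nova expects a PCI alias to be defined identically on the API (controller) side and on the compute side. The sketch below is only a sanity check of the values shown above, not anything Nova itself runs; the helper `by_name` is a made-up name for illustration.

```python
import json

# Alias strings copied verbatim from the two nova.conf files above.
controller_aliases = [
    '{ "vendor_id":"10de", "product_id":"1db1", "device_type":"type-PCI", "name":"nv_v100" }',
    '{ "vendor_id":"8086", "product_id":"0953", "device_type":"type-PCI", "name":"nvme"}',
]
compute_aliases = [
    '{ "vendor_id":"8086", "product_id":"0953", "device_type":"type-PCI", "name":"nvme"}',
]


def by_name(raw_aliases):
    """Parse each alias JSON string and index the result by its "name" key."""
    return {alias["name"]: alias for alias in map(json.loads, raw_aliases)}


controller = by_name(controller_aliases)
compute = by_name(compute_aliases)

# An alias on the compute node is flagged if the controller lacks it or defines it differently.
mismatched = [name for name, alias in compute.items() if controller.get(name) != alias]
print("mismatched aliases:", mismatched)
```

With the values above the list comes out empty, so the `nvme` alias at least looks consistent between the two nodes.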
This is how the NVMe devices are exposed to the VM:
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0x86' slot='0x00' function='0x0'/>
  </source>
  <alias name='hostdev1'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</hostdev>
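To double-check that the device in this hostdev entry is one of the whitelisted ones, the host address can be pulled out of the XML and compared with `passthrough_whitelist`. This is a rough sketch using only the Python standard library; the XML and whitelist are copied from this message, and `host_pci_address` is a hypothetical helper name.

```python
import json
import xml.etree.ElementTree as ET

# hostdev element copied from the libvirt domain XML above.
hostdev_xml = """
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0x86' slot='0x00' function='0x0'/>
  </source>
  <alias name='hostdev1'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</hostdev>
"""

# passthrough_whitelist value copied from nova.conf on the compute node.
whitelist = json.loads(
    '[ {"address":"0000:84:00.0"}, {"address":"0000:85:00.0"}, '
    '{"address":"0000:86:00.0"}, {"address":"0000:87:00.0"} ]'
)


def host_pci_address(xml_text):
    """Return the host-side PCI address (inside <source>) as domain:bus:slot.function."""
    src = ET.fromstring(xml_text).find("./source/address")
    domain, bus, slot, function = (
        int(src.get(key), 16) for key in ("domain", "bus", "slot", "function")
    )
    return f"{domain:04x}:{bus:02x}:{slot:02x}.{function:x}"


addr = host_pci_address(hostdev_xml)
allowed = {entry["address"] for entry in whitelist}
print(addr, addr in allowed)
```

The `<address>` inside `<source>` is the host device; the second `<address>` element is where the device appears on the guest's PCI bus.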
The guest OS is CentOS 7.6, so I assume the NVMe driver is included.
Can anyone suggest what needs to change in my configuration to get close to native I/O performance?
Thank you very much
Manuel
From: Manuel Sopena Ballesteros [mailto:manuel.sb@garvan.org.au]
Sent: Wednesday, May 1, 2019 10:31 PM
To: openstack-discuss@lists.openstack.org
Subject: how to get best io performance from my block devices
Dear OpenStack community,
I would like to run a high-performance distributed database in OpenStack VMs. I tried attaching dedicated NVMe PCI devices to the VM, but the performance
is not as good as what I get from bare metal.
Bare metal:
[root@zeus-54 data]# fio --ioengine=libaio --name=test --filename=test --bs=4k --size=40G --readwrite=randrw --runtime=120 --time_based
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [f(1)][100.0%][r=39.5MiB/s,w=39.6MiB/s][r=10.1k,w=10.1k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=50892: Wed May 1 22:22:45 2019
read: IOPS=9805, BW=38.3MiB/s (40.2MB/s)(4596MiB/120001msec)
slat (usec): min=39, max=6678, avg=94.72, stdev=55.78
clat (nsec): min=450, max=18224, avg=525.83, stdev=120.10
lat (usec): min=39, max=6679, avg=95.36, stdev=55.79
clat percentiles (nsec):
| 1.00th=[ 462], 5.00th=[ 478], 10.00th=[ 482], 20.00th=[ 486],
| 30.00th=[ 490], 40.00th=[ 494], 50.00th=[ 502], 60.00th=[ 510],
| 70.00th=[ 516], 80.00th=[ 532], 90.00th=[ 596], 95.00th=[ 676],
| 99.00th=[ 860], 99.50th=[ 1048], 99.90th=[ 1384], 99.95th=[ 2480],
| 99.99th=[ 3728]
bw ( KiB/s): min= 720, max=40736, per=100.00%, avg=39389.00, stdev=5317.58, samples=239
iops : min= 180, max=10184, avg=9847.23, stdev=1329.39, samples=239
write: IOPS=9799, BW=38.3MiB/s (40.1MB/s)(4594MiB/120001msec)
slat (nsec): min=2982, max=106207, avg=4220.09, stdev=980.04
clat (nsec): min=407, max=18130, avg=451.48, stdev=103.71
lat (usec): min=3, max=111, avg= 4.74, stdev= 1.03
clat percentiles (nsec):
| 1.00th=[ 414], 5.00th=[ 418], 10.00th=[ 422], 20.00th=[ 430],
| 30.00th=[ 434], 40.00th=[ 434], 50.00th=[ 438], 60.00th=[ 438],
| 70.00th=[ 442], 80.00th=[ 446], 90.00th=[ 462], 95.00th=[ 588],
| 99.00th=[ 700], 99.50th=[ 916], 99.90th=[ 1208], 99.95th=[ 1288],
| 99.99th=[ 3536]
bw ( KiB/s): min= 752, max=42608, per=100.00%, avg=39366.63, stdev=5355.73, samples=239
iops : min= 188, max=10652, avg=9841.64, stdev=1338.93, samples=239
lat (nsec) : 500=69.98%, 750=28.64%, 1000=0.90%
lat (usec) : 2=0.42%, 4=0.04%, 10=0.01%, 20=0.01%
cpu : usr=2.20%, sys=10.85%, ctx=1176675, majf=0, minf=1372
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=1176625,1175958,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=38.3MiB/s (40.2MB/s), 38.3MiB/s-38.3MiB/s (40.2MB/s-40.2MB/s), io=4596MiB (4819MB), run=120001-120001msec
WRITE: bw=38.3MiB/s (40.1MB/s), 38.3MiB/s-38.3MiB/s (40.1MB/s-40.1MB/s), io=4594MiB (4817MB), run=120001-120001msec
Disk stats (read/write):
nvme9n1: ios=1174695/883620, merge=0/0, ticks=105502/72225, in_queue=192101, util=99.28%
From the VM:
[centos@kudu-1 nvme0]$ sudo fio --ioengine=libaio --name=test --filename=test --bs=4k --size=40G --readwrite=randrw --runtime=120 --time_based
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [m(1)][100.0%][r=29.2MiB/s,w=29.7MiB/s][r=7487,w=7595 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=44383: Wed May 1 12:22:24 2019
read: IOPS=6994, BW=27.3MiB/s (28.6MB/s)(3278MiB/120000msec)
slat (usec): min=54, max=20476, avg=115.27, stdev=71.45
clat (nsec): min=1757, max=31476, avg=2163.02, stdev=688.66
lat (usec): min=56, max=20481, avg=118.51, stdev=71.66
clat percentiles (nsec):
| 1.00th=[ 1800], 5.00th=[ 1832], 10.00th=[ 1864], 20.00th=[ 1992],
| 30.00th=[ 2040], 40.00th=[ 2064], 50.00th=[ 2064], 60.00th=[ 2096],
| 70.00th=[ 2096], 80.00th=[ 2128], 90.00th=[ 2480], 95.00th=[ 2544],
| 99.00th=[ 4448], 99.50th=[ 5536], 99.90th=[11072], 99.95th=[12736],
| 99.99th=[18560]
bw ( KiB/s): min= 952, max=31224, per=100.00%, avg=28153.51, stdev=4126.89, samples=237
iops : min= 238, max= 7806, avg=7038.23, stdev=1031.70, samples=237
write: IOPS=6985, BW=27.3MiB/s (28.6MB/s)(3274MiB/120000msec)
slat (usec): min=7, max=963, avg=12.60, stdev= 6.24
clat (nsec): min=1662, max=199250, avg=2030.26, stdev=712.33
lat (usec): min=10, max=970, avg=15.68, stdev= 6.48
clat percentiles (nsec):
| 1.00th=[ 1688], 5.00th=[ 1720], 10.00th=[ 1736], 20.00th=[ 1864],
| 30.00th=[ 1928], 40.00th=[ 1944], 50.00th=[ 1944], 60.00th=[ 1960],
| 70.00th=[ 1960], 80.00th=[ 1992], 90.00th=[ 2352], 95.00th=[ 2384],
| 99.00th=[ 4048], 99.50th=[ 4768], 99.90th=[11456], 99.95th=[13120],
| 99.99th=[19072]
bw ( KiB/s): min= 912, max=31880, per=100.00%, avg=28119.64, stdev=4176.38, samples=237
iops : min= 228, max= 7970, avg=7029.75, stdev=1044.07, samples=237
lat (usec) : 2=51.56%, 4=47.17%, 10=1.03%, 20=0.22%, 50=0.01%
lat (usec) : 250=0.01%
cpu : usr=4.96%, sys=28.37%, ctx=839307, majf=0, minf=26
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=839283,838268,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=27.3MiB/s (28.6MB/s), 27.3MiB/s-27.3MiB/s (28.6MB/s-28.6MB/s), io=3278MiB (3438MB), run=120000-120000msec
WRITE: bw=27.3MiB/s (28.6MB/s), 27.3MiB/s-27.3MiB/s (28.6MB/s-28.6MB/s), io=3274MiB (3434MB), run=120000-120000msec
Disk stats (read/write):
nvme0n1: ios=838322/651596, merge=0/0, ticks=83804/22119, in_queue=104773, util=70.18%
Is there a way I can get near-bare-metal performance from my NVMe block devices?