<div dir="auto">Thank you for your email.</div><div dir="auto">We are not using encrypted volumes.</div><div dir="auto"><br></div><div dir="auto">If this is a librados bug, I do not see it affecting throughput when the VM uses an SSD volume.</div><div dir="auto">And the performance of a Ceph HDD volume mounted directly on the compute host is still good.</div><div dir="auto"><br></div><div dir="auto">We have already disabled debug in ceph.conf.</div><div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 7 Jul 2022 at 19:26 Sean Mooney <<a href="mailto:smooney@redhat.com">smooney@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">On Thu, 2022-07-07 at 12:06 +0200, Gorka Eguileor wrote:<br>

> On 07/07, Vinh Nguyen Duc wrote:<br>

> > I have a problem with I/O performance on Openstack block device HDD.<br>

> > <br>

> > *Environment:*<br>

> > *Openstack version: Ussuri*<br>

> > - OS: CentOS8<br>

> > - Kernel: 4.18.0-240.15.1.el8_3.x86_64<br>

> > - KVM: qemu-kvm-5.1.0-20.el8<br>

> > *CEPH version: Octopus 15.2.8-0.el8.x86_64*<br>

> > - OS: CentOS8<br>

> > - Kernel: 4.18.0-240.15.1.el8_3.x86_64<br>

> > In the CEPH cluster we have 2 classes:<br>

> > - Bluestore<br>

> > - HDD (only for cinder volume)<br>

> > - SSD (images, cinder volume)<br>

> > *Hardware:*<br>

> > - Ceph-client: 2x10Gbps (bond) MTU 9000<br>

> > - Ceph-replicate: 2x10Gbps (bond) MTU 9000<br>

> > *VM:*<br>

> > - Swapoff<br>

> > - non LVM<br>

> > <br>

> > *Issue*<br>

> > When creating a VM on Openstack using an HDD cinder volume, write performance is really<br>

> > poor: 60-85 MB/s. Tests with ioping also show high<br>

> > latency.<br>

> > *Diagnostic*<br>

> > 1.  I have checked the performance between Compute Host (Openstack) and<br>

> > CEPH, and created an RBD (HDD class) mounted on Compute Host. And the<br>

> > performance is 300-400 MB/s.<br>

> <br>

> Hi,<br>

> <br>

> I probably won't be able to help you on the hypervisor side, but I have<br>

> a couple of questions that may help narrow down the issue:<br>

> <br>

> - Are Cinder volumes using encryption?<br>

if you are not using encryption you might be encountering a librados issue<br>

tracked downstream by <a href="https://bugzilla.redhat.com/show_bug.cgi?id=1897572" rel="noreferrer" target="_blank">https://bugzilla.redhat.com/show_bug.cgi?id=1897572</a><br>

this is unfixable without updating to a new version of the ceph client<br>

libs. <br>

> <br>

> - How did you connect the volume to the Compute Host, using krbd or<br>

>   rbd-nbd?<br>

in ussuri we still technically have the workaround options to use krbd but they are<br>

deprecated and removed in xena.<br>

<a href="https://github.com/openstack/nova/blob/stable/ussuri/nova/conf/workarounds.py#L277-L329=" rel="noreferrer" target="_blank">https://github.com/openstack/nova/blob/stable/ussuri/nova/conf/workarounds.py#L277-L329=</a><br>

in general using these options might invalidate any support agreement you may have with a vendor.<br>
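<br>

for reference, a minimal sketch of what enabling the krbd workaround might look like in nova.conf on a compute host. the option name below is my reading of the workarounds.py file linked above, so treat it as an assumption and check it against the nova version you actually have installed:<br>

<pre>
[workarounds]
# assumption: option name as defined in nova/conf/workarounds.py on stable/ussuri.
# attaches RBD cinder volumes to the compute host as local block devices
# instead of letting qemu talk to ceph through librbd.
rbd_volume_local_attach = True
</pre>

nova-compute has to be restarted for a change like this to be picked up.<br>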

<br>

we are aware of at least one edge case currently where enabling this with encrypted volumes breaks live<br>

migration, potentially causing data loss. <a href="https://bugs.launchpad.net/nova/+bug/1939545" rel="noreferrer" target="_blank">https://bugs.launchpad.net/nova/+bug/1939545</a><br>

there is a backport in flight for the fix to train <a href="https://review.opendev.org/q/topic:bug%252F1939545" rel="noreferrer" target="_blank">https://review.opendev.org/q/topic:bug%252F1939545</a><br>

but it has only been backported to wallaby so far so it is not safe to enable those options and use live migration<br>

today.<br>

<br>

you should also be aware that to enable this option on a host you need to drain the host first, then enable the option and cold<br>

migrate instances to the host. live migration between hosts with local attach enabled and disabled is not supported.<br>

<br>

if you want to disable it again in the future, which you will have to do to upgrade to xena, you need to cold migrate all instances<br>

again.<br>

<br>

so if you are deploying your own version of ceph and can move to a newer version which has the librados performance enhancement feature<br>

that is operationally less painful than using these workarounds.<br>

<br>

the only reason we developed this workaround to use krbd in nova was because our hands were tied downstream since we could not ship a new version of<br>

ceph but needed to support releases with this performance limitation for multiple years. so unless you're in a similar situation<br>

upgrading ceph and ensuring you use the new versions of the ceph libs with qemu and a new enough qemu to leverage the performance enhancements is the<br>

best option.<br>

<br>

so with those disclaimers you may want to consider evaluating those workaround options but keep in mind the limitations and the fact that you cannot<br>

live migrate until that bug is fixed before considering using it in production.<br>

<br>

> <br>

> - Do both RBD images (Cinder and yours) have the same Ceph flags?<br>

> <br>

> - Did you try connecting to the Compute Host the same RBD image created<br>

>   by Cinder instead of creating a new one?<br>

> <br>

> Cheers,<br>

> Gorka.<br>

> <br>

> > =>  So I think the problem is in the hypervisor.<br>

> > But when I check performance on a VM using an SSD cinder volume, the result<br>

> > equals the performance of an RBD (SSD) mounted on a Compute host.<br>

> > 2.  I have already configured disk_cachemodes="network=writeback" (and<br>

> > enabled the rbd client cache) and tested with disk_cachemodes="none", but it made no<br>

> > difference.<br>

> > 3.  Running iperf3 from the compute host to a random ceph host still shows 20Gb<br>

> > traffic.<br>

> > 4.  Compute Host and CEPH host connected to the same switch (layer2).<br>

> > Where else can I look for issues?<br>

> > Please help me in this case.<br>

> > Thank you.<br>

> <br>

> <br>

<br>

</blockquote></div></div>