Poor I/O performance on OpenStack block device (OpenStack Centos8:Ussuri)

Gorka Eguileor geguileo at redhat.com
Thu Jul 7 12:47:13 UTC 2022


On 07/07, Vinh Nguyen Duc wrote:
> Thanks for your email.
> We are not using encrypted volumes.
>
> If this were a librados bug, I would expect it to also affect throughput
> when the VM uses an SSD volume, but it does not.
> And the performance of a Ceph HDD image mounted directly on the compute
> host is still good.
>

Hi,

Did the Cinder volume that was performing poorly in the VM perform well
when manually connected directly to the Compute Host?
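
For example, something along these lines should give a comparable number
(this assumes krbd and the default "volumes" pool; adjust the pool/image
names, and note that the write test is destructive, so use a scratch
volume):

  # Map the exact image Cinder created (check "rbd ls volumes" for the
  # real volume-<uuid> name)
  rbd map volumes/volume-<uuid>

  # Same kind of write test as on the manually created image.
  # WARNING: this overwrites the volume's data.
  fio --name=rbdtest --filename=/dev/rbd0 --rw=write --bs=4M --direct=1 \
      --ioengine=libaio --iodepth=32 --runtime=60 --time_based

  ioping -c 20 /dev/rbd0
  rbd unmap /dev/rbd0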

Cheers,
Gorka.


> We have already disabled debug logging in ceph.conf.
>
> On Thu, 7 Jul 2022 at 19:26 Sean Mooney <smooney at redhat.com> wrote:
>
> > On Thu, 2022-07-07 at 12:06 +0200, Gorka Eguileor wrote:
> > > On 07/07, Vinh Nguyen Duc wrote:
> > > > I have a problem with I/O performance on OpenStack HDD-backed block devices.
> > > >
> > > > *Environment:*
> > > > *OpenStack version: Ussuri*
> > > > - OS: CentOS8
> > > > - Kernel: 4.18.0-240.15.1.el8_3.x86_64
> > > > - KVM: qemu-kvm-5.1.0-20.el8
> > > > *Ceph version: Octopus 15.2.8-0.el8.x86_64*
> > > > - OS: CentOS8
> > > > - Kernel: 4.18.0-240.15.1.el8_3.x86_64
> > > > In the Ceph cluster we have two device classes (BlueStore OSDs):
> > > > - HDD (only for Cinder volumes)
> > > > - SSD (images and Cinder volumes)
> > > > *Hardware:*
> > > > - Ceph-client: 2x10Gbps (bond) MTU 9000
> > > > - Ceph-replicate: 2x10Gbps (bond) MTU 9000
> > > > *VM:*
> > > > - swap disabled
> > > > - no LVM
> > > >
> > > > *Issue*
> > > > When creating a VM on OpenStack using an HDD-backed Cinder volume, write
> > > > performance is really poor: 60-85 MB/s. Tests with ioping also show high
> > > > latency.
> > > > *Diagnostics*
> > > > 1.  I checked the performance between the compute host (OpenStack) and
> > > > Ceph by creating an RBD image (HDD class) and mounting it on the compute
> > > > host. The performance is 300-400 MB/s.
> > >
> > > Hi,
> > >
> > > I probably won't be able to help you on the hypervisor side, but I have
> > > a couple of questions that may help narrow down the issue:
> > >
> > > - Are Cinder volumes using encryption?
> > If you are not using encryption you might be encountering the librados
> > issue tracked downstream by
> > https://bugzilla.redhat.com/show_bug.cgi?id=1897572
> > This is unfixable without updating to a newer version of the Ceph client
> > libs.
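> >
> > As a quick sanity check on the hypervisor you can confirm which client
> > libs and QEMU build are actually installed with something like:
> >
> >   rpm -q librbd1 librados2 qemu-kvm qemu-kvm-block-rbd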
> > >
> > > - How did you connect the volume to the Compute Host, using krbd or
> > >   rbd-nbd?
> > In Ussuri we still technically have the workaround options to use krbd,
> > but they are deprecated and were removed in Xena.
> >
> > https://github.com/openstack/nova/blob/stable/ussuri/nova/conf/workarounds.py#L277-L329
> > In general, using these options might invalidate any support agreement
> > you may have with a vendor.
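> >
> > For reference, enabling it boils down to roughly this in nova.conf on
> > the compute host (option names as of Ussuri, double-check them against
> > the file linked above):
> >
> >   [workarounds]
> >   # attach RBD volumes on the host via krbd instead of QEMU's librbd
> >   rbd_volume_local_attach = True
> >   # a related disable_native_luksv1 option exists for encrypted volumes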
> >
> > We are aware of at least one edge case currently where enabling this
> > with encrypted volumes breaks live migration, potentially causing data
> > loss:
> > https://bugs.launchpad.net/nova/+bug/1939545
> > There is a backport in flight for the fix to Train
> > https://review.opendev.org/q/topic:bug%252F1939545
> > but it has only been backported to Wallaby so far, so it is not safe to
> > enable those options and use live migration today.
> >
> > You should also be aware that to enable this option on a host you need
> > to drain the host first, then enable the option and cold migrate
> > instances to the host. Live migration between hosts with local attach
> > enabled and disabled is not supported.
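> >
> > Roughly, the per-host flow is something like:
> >
> >   # stop scheduling to the host and move existing instances off it
> >   openstack compute service set --disable <host> nova-compute
> >   # ...set the workaround in nova.conf, restart nova-compute, then
> >   # cold migrate instances (back) to it:
> >   openstack server migrate <server>
> >   openstack server resize --confirm <server>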
> >
> > If you want to disable it again in the future, which you will have to
> > do to upgrade to Xena, you need to cold migrate all instances again.
> >
> > So if you are deploying your own version of Ceph and can move to a
> > newer version which has the librados performance enhancement, that is
> > operationally less painful than using these workarounds.
> >
> > The only reason we developed this workaround to use krbd in Nova was
> > that our hands were tied downstream: we could not ship a new version of
> > Ceph, but needed to support a release with this performance limitation
> > for multiple years. So unless you are in a similar situation, upgrading
> > Ceph, and making sure you use the new Ceph client libs with QEMU and a
> > new enough QEMU to leverage the performance enhancements, is the best
> > option.
> >
> > So with those disclaimers, you may want to consider evaluating those
> > workaround options, but keep in mind the limitations, and the fact that
> > you cannot live migrate until that bug is fixed, before using them in
> > production.
> >
> > >
> > > - Do both RBD images (Cinder and yours) have the same Ceph flags?
> > >
> > > - Did you try connecting to the Compute Host the same RBD image created
> > >   by Cinder instead of creating a new one?
> > >
> > > Cheers,
> > > Gorka.
> > >
> > > > => So I think the problem is in the hypervisor.
> > > > But when I check performance on a VM using an SSD-backed Cinder volume,
> > > > the result equals the performance of testing an RBD (SSD) image mounted
> > > > on a compute host.
> > > > 2.  I have already configured disk_cachemodes="network=writeback" (and
> > > > enabled the RBD client cache), and also tested with
> > > > disk_cachemodes="none", but nothing is different (see the sketch after
> > > > this list).
> > > > 3.  iperf3 from the compute host to a random Ceph host still shows
> > > > 20 Gb/s of traffic.
> > > > 4.  The compute host and the Ceph hosts are connected to the same
> > > > switch (layer 2).
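> > > > For reference, the cache settings involved look roughly like this:
> > > >
> > > >   # nova.conf on the compute host
> > > >   [libvirt]
> > > >   disk_cachemodes = "network=writeback"
> > > >
> > > >   # ceph.conf on the compute host
> > > >   [client]
> > > >   rbd cache = true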
> > > > Where else can I look for issues?
> > > > Please help me in this case.
> > > > Thank you.
> > >
> > >
> >
> >



