Poor I/O performance on OpenStack block device (OpenStack Centos8:Ussuri)
Vinh Nguyen Duc
vinhducnguyen1708 at gmail.com
Thu Jul 7 12:40:19 UTC 2022
Thank you for your email.
We are not using encrypted volumes.
If this were a librados bug, I would expect it to affect SSD volumes as well,
but we see no impact on throughput when a VM uses an SSD volume.
And the performance of a Ceph HDD image mounted directly on the compute host
is still good.
We have already disabled debug logging in ceph.conf.
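For reference, disabling client-side debug logging in ceph.conf typically
looks something like this (these are standard Ceph debug subsystem options;
the exact set to disable can vary):

    [client]
    debug ms = 0/0
    debug rados = 0/0
    debug rbd = 0/0
    debug objecter = 0/0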
On Thu, 7 Jul 2022 at 19:26 Sean Mooney <smooney at redhat.com> wrote:
> On Thu, 2022-07-07 at 12:06 +0200, Gorka Eguileor wrote:
> > On 07/07, Vinh Nguyen Duc wrote:
> > > I have a problem with I/O performance on OpenStack HDD block devices.
> > >
> > > *Environment:*
> > > OpenStack version: Ussuri
> > > - OS: CentOS8
> > > - Kernel: 4.18.0-240.15.1.el8_3.x86_64
> > > - KVM: qemu-kvm-5.1.0-20.el8
> > > *Ceph version: Octopus 15.2.8-0.el8.x86_64*
> > > - OS: CentOS8
> > > - Kernel: 4.18.0-240.15.1.el8_3.x86_64
> > > In the Ceph cluster we have two device classes (both BlueStore):
> > > - HDD (only for Cinder volumes)
> > > - SSD (images, Cinder volumes)
> > > *Hardware:*
> > > - Ceph client network: 2x10Gbps (bond), MTU 9000
> > > - Ceph replication network: 2x10Gbps (bond), MTU 9000
> > > *VM:*
> > > - Swap disabled
> > > - No LVM
> > >
> > > *Issue:*
> > > When creating a VM on OpenStack using a Cinder HDD volume, write
> > > performance is really poor: 60-85 MB/s. Tests with ioping also show high
> > > latency.
> > > *Diagnostics:*
> > > 1. I checked the performance between the compute host (OpenStack) and
> > > Ceph by creating an RBD image (HDD class) and mounting it on the compute
> > > host. Its performance is 300-400 MB/s (example test commands are sketched
> > > after point 4, further down in this message).
> >
> > Hi,
> >
> > I probably won't be able to help you on the hypervisor side, but I have
> > a couple of questions that may help narrow down the issue:
> >
> > - Are Cinder volumes using encryption?
> if you are not using encryption you might be encountering the librados issue
> tracked downstream by https://bugzilla.redhat.com/show_bug.cgi?id=1897572
> this is unfixable without updating to a newer version of the ceph client
> libs.
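> a quick way to double check what the compute host is actually running
> (package names below assume a CentOS 8 / RPM based install):
>
>   # on the compute host
>   rpm -q librados2 librbd1 qemu-kvm
>   # and on the ceph side, to see what the cluster itself runs
>   ceph versions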
> >
> > - How did you connect the volume to the Compute Host, using krbd or
> > rbd-nbd?
> in ussuri we still technically have the workaround options to use krbd but
> they are
> deprecated and removed in xena.
>
> https://github.com/openstack/nova/blob/stable/ussuri/nova/conf/workarounds.py#L277-L329
> in general using these options might invalidate any support agreement you
> may have with a vendor.
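> for reference the relevant bit of nova.conf would look roughly like this
> (option names as defined in the workarounds.py linked above; treat it as a
> sketch and check the docs for your release):
>
>   [workarounds]
>   # deprecated in ussuri and removed in xena
>   rbd_volume_local_attach = True
>   # only relevant if you also use encrypted volumes
>   disable_native_luksv1 = True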
>
> we are aware of at least one edge case currently where enabling this with
> encrypted volumes breaks live
> migration, potentially causing data loss.
> https://bugs.launchpad.net/nova/+bug/1939545
> there is a backport in flight for the fix to train
> https://review.opendev.org/q/topic:bug%252F1939545
> but it has only been backported to wallaby so far, so it is not safe to
> enable those options and use live migration
> today.
>
> you should also be aware that to enable this option on a host you need to
> drain the host first, then enable the option and cold
> migrate instances to the host. live migration between hosts with local
> attach enabled and disabled is not supported.
>
> if you want to disable it again in the future, which you will have to do to
> upgrade to xena, you need to cold migrate all instances
> again.
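> a rough sketch of that flow with the standard openstack cli (host and server
> names are placeholders):
>
>   # stop new instances landing on the host
>   openstack compute service set --disable compute-1 nova-compute
>   # cold migrate existing instances off, then confirm each one
>   openstack server migrate <server-uuid>
>   openstack server resize confirm <server-uuid>   # "resize --confirm" on older clients
>   # then set the workaround in nova.conf on compute-1, restart nova-compute,
>   # re-enable the service and cold migrate instances back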
>
> so if you are deploying your own version of ceph and can move to a newer
> version which has the librados performance enhancement feature,
> that is operationally less painful than using these workarounds.
>
> the only reason we developed this workaround to use krbd in nova was
> because our hands were tied downstream since we could not ship a new
> version of
> ceph but needed to support a release with this performance limitation for
> multiple years. so unless you are in a similar situation,
> upgrading ceph and ensuring you use the new versions of the ceph libs
> with qemu, and a new enough qemu to leverage the performance enhancements, is
> the
> best option.
>
> so with those disclaimers you may want to consider evaluating those
> workaround options, but keep in mind the limitations and the fact that you
> cannot
> live migrate until that bug is fixed before considering using it in
> production.
>
> >
> > - Do both RBD images (Cinder and yours) have the same Ceph flags?
> >
> > - Did you try connecting the same RBD image created by Cinder to the
> > Compute Host, instead of creating a new one?
> >
> > Cheers,
> > Gorka.
> >
> > > => So I think the problem is in the hypervisor.
> > > But when I check the performance of a VM using a Cinder SSD volume, the
> > > result matches the performance of an RBD (SSD) image mounted on the
> > > compute host.
> > > 2. I have already configured disk_cachemodes="network=writeback" (with
> > > the client-side rbd cache enabled) and also tested with
> > > disk_cachemodes="none", but nothing is different.
> > > 3. iperf3 from the compute host to a random Ceph host still shows 20Gbps
> > > of traffic.
> > > 4. The compute host and Ceph hosts are connected to the same switch
> > > (layer 2).
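> > > Example test commands for the measurements above (device, pool and image
> > > names are placeholders):
> > >
> > >   # inside the VM, against the Cinder HDD volume
> > >   fio --name=seqwrite --filename=/dev/vdb --rw=write --bs=1M --iodepth=32 \
> > >       --ioengine=libaio --direct=1 --runtime=60 --time_based --group_reporting
> > >   ioping -c 20 /dev/vdb
> > >   # on the compute host, against an RBD image from the same HDD pool
> > >   rbd bench --io-type write --io-size 1M --io-total 10G volumes-hdd/test-image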
> > > Where else can I look for issues?
> > > Please help me in this case.
> > > Thank you.
> >
> >
>
>