[openstack-dev] [nova][cinder] Limits on volume read throughput?

Preston L. Bannister preston at bannister.us
Thu Mar 3 19:13:01 UTC 2016

Note that my end goal is to benchmark an application that runs in an
instance that does primarily large sequential full-volume-reads.

On this path I ran into unexpectedly poor performance within the instance.
If this is a common characteristic of OpenStack, then this becomes a
question of concern to OpenStack developers.

Until recently, ~450MB/s would have been (and for many cases still is)
*outstanding* throughput. Most similar questions on the web are satisfied by
saturating a couple of gigabit links, or a few spinning disks. So it is no
surprise that few(?) folk have, to now, asked questions at this level of
performance.

But with flash displacing spinning disks, much higher throughput is
possible. If there is an unnecessary bottleneck, this might be a good time
to call attention to it.

From general notions to current specifics... :)

On Wed, Mar 2, 2016 at 10:10 PM, Philipp Marek <philipp.marek at linbit.com> wrote:

> > The benchmark scripts are in:
> >   https://github.com/pbannister/openstack-bootstrap

> in case that might help, here are a few notes and hints about doing
> benchmarks for the DRDB block device driver:
>     http://blogs.linbit.com/p/897/benchmarking-drbd/
> Perhaps there's something interesting for you.

Found this earlier. :)

> > Found that if I repeatedly scanned the same 8GB volume from the physical
> > host (with 1/4TB of memory), the entire volume was cached in (host)
> > memory (very fast scan times).
> If the iSCSI target (or QEMU, for direct access) is set up to use buffer
> cache, yes.
> Whether you really want that is up for discussion - it might be much more
> beneficial to move that RAM from the Hypervisor to the VM, which should
> then be able to do more efficient caching of the filesystem contents that
> it should operate on.

You are right, but my aim was a bit different.
In essence, this test was to see if reducing the host-side volume-read time
to (practically) zero would have *any* impact on performance. Given the
*huge* introduced latency (somewhere), I did not expect a notable
difference - and that is what the measurement shows. This further supports the
theory that host-side Linux is *not* the issue.
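The host-side cache effect is easy to reproduce in miniature with a scratch
file standing in for the 8GB volume (the path and size below are
illustrative, not from the original test):

```shell
#!/bin/sh
# Scratch file in place of the Cinder volume (name/size are illustrative).
VOL=/tmp/volcache.img
dd if=/dev/zero of="$VOL" bs=1M count=64 2>/dev/null

sync
# First scan: reads hit the backing store and populate the host page cache.
time dd if="$VOL" of=/dev/null bs=1M 2>/dev/null
# Second scan: served from page cache, so elapsed time drops sharply --
# the "very fast scan times" effect seen on the 1/4TB host.
time dd if="$VOL" of=/dev/null bs=1M 2>/dev/null
rm -f "$VOL"
```

On the real host, the equivalent of the first scan is what warmed the cache
across repeated runs; since the instance still only saw ~450MB/s even when
the host could serve the data from RAM, the bottleneck is downstream of the
host's block layer.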

> > Scanning the same volume from within the instance still gets the same
> > ~450MB/s that I saw before.

> Hmmm, with iSCSI in between, that could be the TCP memcpy limitation.

Measuring iSCSI in isolation is next on my list. Both on the physical host,
and in the instance. (Now to find that link to the iSCSI test, again...)
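One way to take the rest of the stack out of the picture is to run the same
sequential read directly against the attached device with fio, which (unlike
dd) can keep multiple I/Os in flight. A sketch of a job file, with a
placeholder device name:

```ini
; seqread.fio -- read-only sequential scan of the raw device.
; /dev/sdX is a placeholder for the iSCSI-attached LUN.
[seqread]
filename=/dev/sdX
rw=read
bs=1M
ioengine=libaio
iodepth=32
direct=1
runtime=30
time_based=1
```

Comparing iodepth=1 against iodepth=32 here would also separate the
synchronous-access cost Philipp mentions below from any per-byte copy cost
in the iSCSI path itself.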

> > The "iostat" numbers from the instance show ~44 %iowait, and ~50 %idle.
> > (Which to my reading might explain the ~50% loss of performance.) Why so
> > much idle/latency?
> >
> > The in-instance "dd" CPU use is ~12%. (Not very interesting.)

> Because your "dd" testcase will be single-threaded, io-depth 1.
> And that means synchronous access, each IO has to wait for the preceding
> one to finish...

Given that the Linux kernel read-ahead parameter has a noticeable impact on
performance, I believe "dd" does not need to wait (much?) for I/O. Note
also the large difference between host and instance with "dd".
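The read-ahead point can be sketched as follows. The scratch file path is
hypothetical; the `blockdev` commands (shown commented, since they need root
and a real block device) are what would adjust the window on an attached
volume:

```shell
#!/bin/sh
# Scratch file in place of the attached volume (path is illustrative).
FILE=/tmp/ra-test.img
dd if=/dev/zero of="$FILE" bs=1M count=32 2>/dev/null

# dd is single-threaded with io-depth 1, but with kernel read-ahead the
# next stretch of the file is usually already in flight, so sequential
# reads rarely stall; a larger bs reduces per-call overhead further.
dd if="$FILE" of=/dev/null bs=4k 2>/dev/null   # many small reads
dd if="$FILE" of=/dev/null bs=1M 2>/dev/null   # fewer, larger reads

# On a real block device, the read-ahead window is inspectable/tunable
# (units are 512-byte sectors; needs root):
#   blockdev --getra /dev/vdb
#   blockdev --setra 8192 /dev/vdb
rm -f "$FILE"
```

If bumping read-ahead inside the instance moves the number, that points at
the virtio/iSCSI request pipeline rather than dd's synchronous access
pattern.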

> > Not sure from where the (apparent) latency comes. The host iSCSI target?
> > The QEMU iSCSI initiator? Onwards...

> Thread scheduling, inter-CPU cache thrashing (if the iSCSI target is on
> a different physical CPU package/socket than the VM), ...
> Benchmarking is a dark art.

This physical host has an absurd number of CPUs (40), so what you mention is
possible. At these rates, if I were losing only 10-20% of throughput, I
might consider such causes. But losing 60% ... my guess is that the cause is
much less esoteric.
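Still, a first check for the cross-socket theory costs little. The process
names below are examples (tgtd for the iSCSI target, qemu for the instance)
and would need adjusting to the actual deployment:

```shell
#!/bin/sh
# PSR column = the CPU each process last ran on; compare against the
# host's NUMA topology to see if target and VM sit on different sockets.
ps -eo pid,psr,comm | grep -E 'qemu|tgtd' || true

# NUMA layout of the host (node count, CPU ranges per node):
lscpu | grep -i numa || true
```

If they do land on different sockets, pinning with taskset or numactl would
confirm or rule out the cross-package penalty before digging further.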