[openstack-dev] [nova][cinder] Limits on volume read throughput?

Preston L. Bannister preston at bannister.us
Thu Mar 3 01:40:01 UTC 2016

First, my degree from school is in Physics. So I know something about
designing experiments. :)

The benchmark script runs "dd" 218 times, against different volumes (of
differing sizes), with differing "bs". Measurements are collected both from
the physical host and from within the instance. Linux is told to drop
caches before the start.

The benchmark scripts are in:


(Very much a work in progress! Not complete or properly documented.)
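In rough outline, each pass amounts to the loop below. (A minimal sketch only; the device path and block sizes are placeholders, and the real scripts sweep 218 volume/blocksize combinations.)

```shell
#!/bin/sh
# Sketch of one benchmark pass. DEV and the bs list are placeholders.
set -eu
DEV=${DEV:-/tmp/testvol.img}
# Stand-in file for a Cinder volume, so the sketch runs on a workstation.
[ -e "$DEV" ] || dd if=/dev/zero of="$DEV" bs=1M count=16 2>/dev/null

for bs in 64k 512k 1M 2M; do
    sync
    # Drop the page cache so each pass reads from the device
    # (needs root; silently skipped when not permitted).
    sh -c 'echo 3 > /proc/sys/vm/drop_caches' 2>/dev/null || true
    # Time a full sequential read; dd's last line reports throughput.
    dd if="$DEV" of=/dev/null bs="$bs" 2>&1 | tail -1
done
```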

Second, went through the exercise of collecting hints from the web as to
parameters for tuning iSCSI performance. (I did not expect changing Linux
TCP parameters to change the result for iSCSI over loopback, but measured
to be certain.) Followed all the hints, with no change in performance.

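For reference, the web hints boil down to settings like these. (Values are illustrative, taken from common tuning guides; none of them moved the needle here.)

```
# /etc/iscsi/iscsid.conf (open-iscsi initiator side)
node.session.cmds_max = 1024
node.session.queue_depth = 128

# TCP buffer sysctls commonly suggested alongside
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
```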
Found that if I repeatedly scanned the same 8GB volume from the physical
host (with 1/4TB of memory), the entire volume was cached in (host) memory
(very fast scan times).
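That caching effect is easy to reproduce on any box: read the same file twice and compare the reported throughput. (Sketch uses a scratch file rather than the 8GB volume.)

```shell
set -eu
F=/tmp/cache-demo.img
dd if=/dev/zero of="$F" bs=1M count=32 2>/dev/null
sync
# Unless caches are dropped in between, the second read is served
# entirely from the page cache and reports much higher throughput.
dd if="$F" of=/dev/null bs=1M 2>&1 | tail -1
dd if="$F" of=/dev/null bs=1M 2>&1 | tail -1
rm -f "$F"
```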

Scanning the same volume from within the instance still gets the same
~450MB/s that I saw before. The difference is that "iostat" in the host is
~93% idle. In the host, *iscsi_ttx* is using ~58% of a CPU (sound high?),
and *qemu-kvm* is using ~30% of a CPU. (The physical host is a fairly new
box - with 40(!) CPUs.)

The "iostat" numbers from the instance show ~44 %iowait, and ~50 %idle.
(Which to my reading might explain the ~50% loss of performance.) Why so
much idle/latency?

The in-instance "dd" CPU use is ~12%. (Not very interesting.)

Not sure where the (apparent) latency comes from. The host iSCSI target?
The QEMU iSCSI initiator? Onwards...
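For anyone reproducing this, the per-thread CPU numbers above came from commands along these lines. (Thread names such as *iscsi_ttx* belong to the in-kernel LIO target and vary by kernel version.)

```shell
# Device-level view: %iowait / %idle and per-device utilization.
command -v iostat >/dev/null 2>&1 && iostat -x 1 1
# Per-thread CPU: look for qemu-kvm and the iscsi_* target threads.
ps -eLo pcpu,comm --sort=-pcpu | head -10
```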

On Tue, Mar 1, 2016 at 5:13 PM, Rick Jones <rick.jones2 at hpe.com> wrote:

> On 03/01/2016 04:29 PM, Preston L. Bannister wrote:
>> Running "dd" in the physical host against the Cinder-allocated volumes
>> nets ~1.2GB/s (roughly in line with expectations for the striped flash
>> volume).
>> Running "dd" in an instance against the same volume (now attached to the
>> instance) got ~300MB/s, which was pathetic. (I was expecting 80-90% of
>> the raw host volume numbers, or better.) Upping read-ahead in the
>> instance via "hdparm" boosted throughput to ~450MB/s. Much better, but
>> still sad.
>> In the second measure the volume data passes through iSCSI and then the
>> QEMU hypervisor. I expected to lose some performance, but not more than
>> half!
>> Note that as this is an all-in-one OpenStack node, iSCSI is strictly
>> local and not crossing a network. (I did not want network latency or
>> throughput to be a concern with this first measure.)
> Well, not crossing a physical network :)  You will be however likely
> crossing the loopback network on the node.

Well ... yes. I suspect the latency and bandwidth numbers for loopback are
rather better. :)

For the purposes of this experiment, I wanted to eliminate the physical
network limits as a consideration.
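To put a number on the loopback path itself, raw loopback TCP bandwidth can be bounded directly with iperf3 (assumed installed; not something I ran as part of these measures):

```shell
# Start a one-shot server in the background, then drive it over loopback.
if command -v iperf3 >/dev/null 2>&1; then
    iperf3 -s -1 -D                  # -1: exit after one client; -D: daemonize
    sleep 1
    iperf3 -c 127.0.0.1 -t 2 || echo "iperf3 client failed"
else
    echo "iperf3 not installed"
fi
```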

> What sort of per-CPU utilizations do you see when running the test to the
> instance?  Also, out of curiosity, what block size are you using in dd?  I
> wonder how well that "maps" to what iSCSI will be doing.

First, this measure was collected via a script that tried a moderately
exhaustive number of variations. Yes, I had the same question. Kernel host
read-ahead is 6MB (automatically set). Did not see notable gain past
"bs=2M". (Was expecting a bigger gain from larger reads, but that is not
what the measurements showed.)
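The read-ahead bump mentioned earlier can also be done with blockdev rather than hdparm. (Sketch only; /dev/vdb is a placeholder for however the volume shows up in the instance.)

```shell
DEV=${DEV:-/dev/vdb}   # placeholder: the attached Cinder volume in the guest
if [ -b "$DEV" ]; then
    blockdev --getra "$DEV"        # current read-ahead, in 512-byte sectors
    blockdev --setra 8192 "$DEV"   # ~4 MiB, comparable to the hdparm change
    dd if="$DEV" of=/dev/null bs=2M 2>&1 | tail -1
else
    echo "no block device at $DEV; set DEV to the attached volume"
fi
```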
