[Openstack] [SWIFT] PUTs and GETs getting slower

Zhou, Yuan yuan.zhou at intel.com
Thu Aug 29 06:40:53 UTC 2013

Hi Klaus,

We've done some tests on replication in Swift before. In most cases the bottleneck was the internal network bandwidth between the storage nodes.
Also, please check your rsyncd.conf to make sure the value of 'max connections' is not too small.
Hope this helps.
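Here is a minimal Python sketch for that check. It assumes the usual rsyncd.conf layout, with global parameters followed by [module] sections. The function name max_connections and the sample module below are hypothetical. Per the rsyncd.conf man page, a value of 0 means no limit, and a value set in the global part acts as the default for modules.

```python
import re


def max_connections(conf_text):
    """Map each section of rsyncd.conf-style text to its 'max connections'.

    Parameters that appear before the first [module] header are global and
    are reported under the key "global". Sections without the parameter are
    left out of the result.
    """
    result = {}
    section = "global"
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        header = re.match(r"^\[(.+)\]$", line)
        if header:
            section = header.group(1).strip()
            continue
        key, sep, value = line.partition("=")
        # Normalize case and internal whitespace before comparing the key.
        if sep and " ".join(key.lower().split()) == "max connections":
            result[section] = int(value.strip())
    return result


if __name__ == "__main__":
    # Hypothetical example config, not taken from any real cluster.
    sample = """
    uid = swift
    [object]
    max connections = 2
    path = /srv/node
    """
    print(max_connections(sample))  # {'object': 2}
```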

Regards, -yuanz

From: Klaus Schürmann [mailto:klaus.schuermann at mediabeam.com]
Sent: Tuesday, August 20, 2013 3:04 PM
To: Maximiliano Venesio; Robert van Leeuwen
Cc: openstack at lists.openstack.org
Subject: Re: [Openstack] [SWIFT] PUTs and GETs getting slower


After adding more disks and moving the account and container servers to SSDs, performance is much better:

                Before     After
GETs average    620 ms     280 ms
PUTs average    1900 ms    1100 ms

The only problem was the rebalance: it took days to sync all the data to the five additional disks (previously each storage node had 3 disks). I used a concurrency of 4. One replication pass over all partitions took more than 24 hours. After five days, a replication pass takes only 300 seconds.
Each additional disk now stores 300 GB of data. Is such a duration normal for syncing the data?
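As a rough sanity check on those numbers, the average rate can be worked out in Python. This assumes the five new disks are per storage node, that all of them were filled during the five days, and uses decimal units; avg_sync_rate is a hypothetical helper name.

```python
def avg_sync_rate(disks, gb_per_disk, days):
    """Average rate in MB/s (decimal units) needed to copy the data in `days`."""
    total_bytes = disks * gb_per_disk * 1e9
    seconds = days * 24 * 3600
    return total_bytes / seconds / 1e6


rate = avg_sync_rate(disks=5, gb_per_disk=300, days=5)
print(f"{rate:.2f} MB/s = {rate * 8:.1f} Mbit/s")  # 3.47 MB/s = 27.8 Mbit/s
```

Under those assumptions this is only about 3.5 MB/s (roughly 28 Mbit/s) per node on average, a figure worth comparing against the node-to-node network bandwidth when looking for the bottleneck.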


From: Maximiliano Venesio [mailto:maximiliano.venesio at mercadolibre.com]
Sent: Thursday, August 8, 2013 5:26 PM
To: Robert van Leeuwen
Cc: openstack at lists.openstack.org
Subject: Re: [Openstack] [SWIFT] PUTs and GETs getting slower

Hi Robert,

I read your post with interest because we have a similar Swift deployment and use cases.
We store millions of small images in our Swift cluster: 32 storage nodes, each with 12 x 2 TB HDDs plus 2 SSDs, and the whole cluster handles a total average of 200k rpm (requests per minute).
Our disks average 50% utilization, yet we are using only 15% of their total capacity.
When I look at used inodes on our object nodes with "df -i", we see about 17 million inodes per disk.
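To put 200k rpm into per-disk terms, a small Python calculation helps. It assumes requests are spread evenly over the 32 nodes and their 12 HDDs each, and it ignores replica writes and account/container traffic; per_disk_rate is a hypothetical name.

```python
def per_disk_rate(requests_per_minute, nodes, disks_per_node):
    """Split a cluster-wide request rate into per-second, per-node and per-disk rates."""
    per_second = requests_per_minute / 60
    per_node = per_second / nodes
    per_disk = per_node / disks_per_node
    return per_second, per_node, per_disk


total, node, disk = per_disk_rate(200_000, nodes=32, disks_per_node=12)
print(f"{total:.0f} req/s cluster, {node:.1f} req/s per node, {disk:.1f} req/s per HDD")
# 3333 req/s cluster, 104.2 req/s per node, 8.7 req/s per HDD
```

The per-disk figure can then be compared with what a single HDD can sustain for small, random I/O.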

That seems like a lot of inodes, considering we are using only 15% of the total capacity. One difference is that we use a 512-byte inode size and have a large amount of memory.
Also, one of our disks is always close to 100% utilization; this is caused by the object-auditor, which scans all our disks continuously.
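The inode observation can be quantified with a quick Python calculation: dividing used capacity by the inode count gives the average amount of data per inode. This assumes 2 TB means 2 x 10^12 bytes and counts directory inodes too, so the average file is somewhat larger than the result; avg_bytes_per_inode is a hypothetical name.

```python
def avg_bytes_per_inode(disk_tb, used_fraction, inodes):
    """Average bytes of data per used inode (files and directories together)."""
    used_bytes = disk_tb * 1e12 * used_fraction
    return used_bytes / inodes


avg = avg_bytes_per_inode(disk_tb=2, used_fraction=0.15, inodes=17e6)
print(f"{avg / 1e3:.1f} KB per inode")  # 17.6 KB per inode
```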

So we were also considering switching to smaller, faster disks.
It would be really useful to know what kind of disks you use in your old and new storage nodes, so we can compare them with our case.



Maximiliano Venesio
#melicloud CloudBuilders

On Tue, Aug 6, 2013 at 11:54 AM, Robert van Leeuwen <Robert.vanLeeuwen at spilgames.com> wrote:
Could you check your disk I/O on the container/object nodes?

We have quite a lot of files in Swift, and for comparison I experimented a bit with COSBench to see where we hit the limits.
We currently max out at about 200-300 PUT requests/second, and the bottleneck is the disk I/O on the object nodes.
Our account/container nodes are on SSDs and are not a limiting factor.

You can look for I/O bottlenecks with, e.g., "iostat -x 10" (this refreshes the view every 10 seconds).
During the benchmark I see some of the disks hitting 100% utilization.
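To flag saturated disks automatically, a minimal Python sketch can scan the text of one "iostat -x" report. The column layout differs between sysstat versions, so it locates the %util column by name from the "Device" header line; busy_devices and the 90% threshold are hypothetical choices.

```python
def busy_devices(iostat_text, threshold=90.0):
    """Return [(device, util)] for devices whose %util is >= threshold.

    Expects text from "iostat -x": a header line starting with "Device"
    (older sysstat versions print "Device:") that includes a "%util" column.
    """
    util_col = None
    busy = []
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            util_col = None  # a blank line ends the device table
            continue
        if fields[0].rstrip(":") == "Device" and "%util" in fields:
            util_col = fields.index("%util")
            continue
        if util_col is None or len(fields) <= util_col:
            continue  # not inside a device table (e.g. avg-cpu lines)
        try:
            util = float(fields[util_col])
        except ValueError:
            continue
        if util >= threshold:
            busy.append((fields[0], util))
    return busy
```

When feeding it live data, use the second report of something like "iostat -x 10 2", since the first report shows averages since boot.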
That it hits the I/O limits at just 200 PUTs a second is due to the number of files on the disks.
When I look at used inodes on our object nodes with "df -i", we hit about 60 million inodes per disk.
(A significant part of those are actually directories; based on the number of files in Swift, I calculated about 30 million files.)
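The same used-inode figure that "df -i" shows can be read from Python via os.statvfs (Unix only): used inodes are the total (f_files) minus the free ones (f_ffree). This is a minimal sketch; inode_usage is a hypothetical name, and "/" stands in for whichever object-disk mount point you want to check.

```python
import os


def inode_usage(path):
    """Return (used, total) inodes for the filesystem that contains `path`."""
    st = os.statvfs(path)
    return st.f_files - st.f_ffree, st.f_files


if __name__ == "__main__":
    used, total = inode_usage("/")
    print(f"/: {used:,} of {total:,} inodes used")
```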
We use flashcache in front of those disks and it is still REALLY slow: just running "ls" can take up to 30 seconds.
Adding lots of memory should probably help by caching the inodes in memory, but that is quite challenging.
I am not sure how much space directories take in the XFS inode tree, but for the files alone:
30 million x 1 KB inodes = 30 GB
And that is just one disk :)

We still use the old recommended inode size of 1 KB; with recent kernels the default of 256 bytes can now be used.
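Extending the 30 GB estimate in Python shows what a smaller inode size would mean for 30 million files. It treats the on-disk inode size as a stand-in for the memory needed to cache each inode and leaves out directories, so it is only a rough approximation; inode_cache_gb is a hypothetical name.

```python
def inode_cache_gb(files, inode_size_bytes):
    """Rough memory (decimal GB) to hold one on-disk inode per file."""
    return files * inode_size_bytes / 1e9


for size in (1024, 512, 256):
    print(f"{size:>4}-byte inodes: {inode_cache_gb(30e6, size):.1f} GB per disk")
# 1024-byte inodes: 30.7 GB per disk
#  512-byte inodes: 15.4 GB per disk
#  256-byte inodes: 7.7 GB per disk
```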

So some time ago we decided to go for nodes with more, smaller and faster disks and more memory.
Those machines are not even close to their limits; however, we still have more "old" nodes,
so performance is limited by the old machines.
For now this is sufficient for our use case, but I am pretty confident we could
significantly improve performance by adding more of the new machines and rebalancing the load.

Robert van Leeuwen

