[Openstack] [SWIFT] PUTs and GETs getting slower

Klaus Schürmann klaus.schuermann at mediabeam.com
Thu Aug 29 07:09:17 UTC 2013

Hello Robert,

Very interesting experiences, and thank you very much for sharing them with us.

Based on your experiences I ran some tests with a 256-byte inode size. I use Ubuntu with kernel 3.5.0 and it is working properly.
That should reduce memory consumption and should perform better as I store many more objects.

Deleting objects shows very strange behavior. We delete about 700,000 objects during the night and the load rises 50 percent:


This is the network graph during deletion:




From: Robert van Leeuwen [mailto:Robert.vanLeeuwen at spilgames.com]
Sent: Wednesday, August 28, 2013 09:34
To: openstack at lists.openstack.org
Subject: Re: [Openstack] [SWIFT] PUTs and GETs getting slower

Just a follow-up on this thread, because I took some time to write up our experiences:


Answering your question on initial sync times:
Yes, we also see long initial syncs.
For us it takes a few days for a new node to be synced.
Usually it goes pretty quickly at first (30 MB/second), and the performance gradually degrades as the disks start filling up and the machines run low on memory.
We have about 6TB on a node to sync.
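
As a rough sanity check on the figures quoted above, the best-case sync time can be estimated from the data volume and the initial transfer rate. This is only a sketch: it assumes decimal units (1 TB = 1,000,000 MB) and a constant rate, which the message says does not hold once the disks fill up. The function name and the 10 MB/s figure are hypothetical.

```python
def sync_duration_days(data_tb: float, rate_mb_per_s: float) -> float:
    """Days needed to copy data_tb terabytes at a constant rate (decimal units)."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    seconds = data_tb * 1_000_000 / rate_mb_per_s  # 1 TB = 1,000,000 MB
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    # Figures from the message: 6 TB per node, 30 MB/s at the start.
    print(f"at 30 MB/s: {sync_duration_days(6, 30):.1f} days")
    # Hypothetical degraded rate, chosen only for illustration.
    print(f"at 10 MB/s: {sync_duration_days(6, 10):.1f} days")
```

Even at the initial rate the copy takes more than two days, so "a few days" is consistent with a rate that degrades over time.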

Robert van Leeuwen

From: Klaus Schürmann [klaus.schuermann at mediabeam.com]
Sent: Tuesday, August 20, 2013 9:04 AM
To: Maximiliano Venesio; Robert van Leeuwen
Cc: openstack at lists.openstack.org
Subject: AW: [Openstack] [SWIFT] PUTs and GETs getting slower

After adding additional disks and moving the account- and container-server to SSDs, the performance is much better:

Before:
GETs     average     620 ms
PUTs     average     1900 ms

After:
GETs     average     280 ms
PUTs     average     1100 ms

Only the rebalance process took days to sync all the data to the five additional disks (previously each storage node had 3 disks). I used a concurrency of 4. One round to replicate all partitions took over 24 hours. After five days the replication pass takes only 300 seconds.
Each additional disk now holds 300 GB of data. Is such a duration normal for syncing the data?
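
To put the question in numbers, the average fill rate implied by these figures can be worked out as below. This is a sketch under stated assumptions: that the five new disks in one node reached 300 GB each over the full five days, and that units are decimal (1 GB = 1,000 MB). The function name is hypothetical.

```python
def average_rate_mb_per_s(disks: int, gb_per_disk: float, days: float) -> float:
    """Average write rate needed to fill the given disks in the given time."""
    if days <= 0:
        raise ValueError("days must be positive")
    total_mb = disks * gb_per_disk * 1000  # 1 GB = 1,000 MB
    return total_mb / (days * 86_400)      # 86,400 seconds per day


if __name__ == "__main__":
    # Figures from the message: 5 new disks, 300 GB each, about 5 days.
    rate = average_rate_mb_per_s(5, 300, 5)
    print(f"average rate per node: {rate:.2f} MB/s")
```

Under those assumptions the node received data at only a few MB/s on average, far below what the disks can sustain for sequential writes.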

From: Maximiliano Venesio [mailto:maximiliano.venesio at mercadolibre.com]
Sent: Thursday, August 8, 2013 17:26
To: Robert van Leeuwen
Cc: openstack at lists.openstack.org
Subject: Re: [Openstack] [SWIFT] PUTs and GETs getting slower

Hi Robert,
I was reading your post and it is interesting because we have similar swift deployments and use cases.
We are storing millions of small images in our swift cluster (32 storage nodes, each with 12 x 2TB HDDs + 2 SSDs), and we see a total average of 200k rpm across the whole cluster.
In terms of disk utilization, we average 50% util across all our disks, but we are using just 15% of their total capacity.
When I look at used inodes on our object nodes with "df -i" we hit about 17 million inodes per disk.
That seems a large number of inodes considering that we are using just 15% of the total capacity. One difference here is that we are using 512K of inode size and we have a large amount of memory.
Also, we always have one of our disks close to 100% util, and this is caused by the object-auditor, which scans all our disks continuously.
So we were also thinking about the possibility of changing the kind of disks we use, moving to smaller and faster disks.
It would be really useful to know what kind of disks you are using in your old and new storage nodes, and to compare that with our case.


Maximiliano Venesio 
#melicloud CloudBuilders
Arias 3751, Piso 7 (C1430CRG) 
Ciudad de Buenos Aires - Argentina
Cel: +549(11) 15-3770-1853
Tel : +54(11) 4640-8411

On Tue, Aug 6, 2013 at 11:54 AM, Robert van Leeuwen <Robert.vanLeeuwen at spilgames.com> wrote:
Could you check your disk IO on the container/object nodes?

We have quite a lot of files in swift, and for comparison purposes I played a bit with COSbench to see where we hit the limits.
We currently max out at about 200 - 300 PUT requests/second, and the bottleneck is the disk IO on the object nodes.
Our account/container nodes are on SSDs and are not a limiting factor.

You can look for IO bottlenecks with e.g. "iostat -x 10" (this refreshes the view every 10 seconds).
During the benchmark I see some of the disks hitting 100% utilization.
That it hits the IO limits with just 200 PUTs a second has to do with the number of files on the disks.
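
For anyone who wants to log the same utilization figure from a script, here is a minimal Linux-only sketch. It samples the "milliseconds spent doing I/O" counter in /proc/diskstats twice and turns the difference into a busy percentage, which should be roughly what iostat reports as %util. The function names are hypothetical and this is not how iostat itself is implemented.

```python
import time


def read_io_ticks(path="/proc/diskstats"):
    """Return {device: milliseconds spent doing I/O} from the kernel counters."""
    ticks = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 13:
                # parts[0:3] are major, minor, name; the 10th stat field
                # (index 12) is the time spent doing I/O, in milliseconds.
                ticks[parts[2]] = int(parts[12])
    return ticks


def utilization(before, after, interval_s):
    """Percent of the interval each device was busy, capped at 100."""
    result = {}
    for dev, end in after.items():
        if dev in before:
            busy_ms = end - before[dev]
            result[dev] = min(100.0, 100.0 * busy_ms / (interval_s * 1000.0))
    return result


if __name__ == "__main__":
    interval = 10  # seconds, matching "iostat -x 10"
    first = read_io_ticks()
    time.sleep(interval)
    second = read_io_ticks()
    for dev, pct in sorted(utilization(first, second, interval).items()):
        print(f"{dev:10s} {pct:5.1f}%")
```

A disk that stays near 100% here for several intervals in a row is the kind of bottleneck described above.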
When I look at used inodes on our object nodes with "df -i" we hit about 60 million inodes per disk.
(A significant part of those are actually directories; I calculated about 30 million files based on the number of files in swift.)
We use flashcache in front of those disks and it is still REALLY slow: just doing an "ls" can take up to 30 seconds.
Adding lots of memory should probably help by caching the inodes, but that is quite challenging:
I am not sure how big a directory is in the xfs inode tree, but for just the files:
30 million x 1k inodes = 30GB
And that is just one disk :)
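
The same back-of-the-envelope calculation can be repeated for other inode sizes. The sketch below is an illustration only: it multiplies the inode count by the on-disk inode size and ignores directories and any kernel overhead, so the real memory needed to cache everything may differ. The function names are hypothetical; `used_inodes` reads the same counters that "df -i" shows, via `os.statvfs` (Unix only).

```python
import os


def used_inodes(mount_point: str) -> int:
    """Used inode count for a filesystem, like the IUsed column of df -i."""
    st = os.statvfs(mount_point)
    return st.f_files - st.f_ffree


def inode_bytes(inode_count: int, inode_size: int) -> int:
    """Bytes occupied by inode_count inodes of inode_size bytes each."""
    return inode_count * inode_size


if __name__ == "__main__":
    files = 30_000_000  # file count per disk from the message above
    for size in (1024, 512, 256):
        gb = inode_bytes(files, size) / 1e9
        print(f"{files:,} inodes x {size} bytes = {gb:.1f} GB")
    # On a real object node, pass the mount point of a data disk instead of "/".
    print("used inodes on /:", used_inodes("/"))
```

With 256-byte inodes the same 30 million files need roughly a quarter of the space that 1k inodes do.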

We still use the old recommended inode size of 1k; the default of 256 can be used now with recent kernels:

So some time ago we decided to go for nodes with more, smaller and faster disks and with more memory.
Those machines are not even close to their limits; however, we still have more "old" nodes,
so performance is limited by those machines.
At this moment it is sufficient for our use case but I am pretty confident we would be able to
significantly improve performance by adding more of those machines and doing some re-balancing of the load.

Robert van Leeuwen