[Openstack] [SWIFT] Bad replication performance after adding new drives

Robert van Leeuwen Robert.vanLeeuwen at spilgames.com
Thu Feb 12 07:47:29 UTC 2015

> My xfs inode size is 256 bytes.
> When I increase the memory and change vm.vfs_cache_pressure to 1, is it possible to keep the inode tree in memory?
> Maybe the random disk seeks are the problem.
> Here is an iostat snapshot:
> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>sde               0.00     0.00  143.80    2.60  1028.80   123.90    15.75     1.15    7.84    7.97    0.62   6.49  95.04
>sdh               0.00     0.00   94.80    3.80   681.60   156.70    17.00     1.27   12.51   13.00    0.21   9.89  97.52

Well, at least two disks are at pretty much 100% utilization; my guess would be that the replicator/rsync, and maybe the auditor, are currently busy with those disks.
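To make the utilization reading concrete, here is a small Python sketch that parses the extended iostat output quoted above and flags devices above a utilization threshold, together with the average size of each read (rkB/s divided by r/s). The 90% threshold and the function names are illustrative assumptions, not part of Swift or sysstat. On the two rows above it reports roughly 7 KB per read at 95 to 97% utilization, which is consistent with (though not proof of) small random reads limited by seek time.

```python
from __future__ import annotations

# The quoted snapshot, without the mail quote markers.
SNAPSHOT = """\
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sde               0.00     0.00  143.80    2.60  1028.80   123.90    15.75     1.15    7.84    7.97    0.62   6.49  95.04
sdh               0.00     0.00   94.80    3.80   681.60   156.70    17.00     1.27   12.51   13.00    0.21   9.89  97.52
"""


def parse_iostat(text: str) -> dict[str, dict[str, float]]:
    """Map device name -> {column name: value} for an `iostat -x` report."""
    rows: dict[str, dict[str, float]] = {}
    header: list[str] | None = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].rstrip(":") == "Device":
            # Older sysstat prints "Device:", newer versions print "Device".
            header = fields[1:]
        elif header is not None and len(fields) == len(header) + 1:
            rows[fields[0]] = dict(zip(header, map(float, fields[1:])))
    return rows


def saturated(rows: dict[str, dict[str, float]], threshold: float = 90.0):
    """Yield (device, %util, average KB per read) for devices at or above threshold."""
    for dev, cols in rows.items():
        if cols["%util"] >= threshold:
            reads = cols["r/s"]
            avg_read_kb = cols["rkB/s"] / reads if reads else 0.0
            yield dev, cols["%util"], avg_read_kb


for dev, util, kb in saturated(parse_iostat(SNAPSHOT)):
    print(f"{dev}: {util:.1f}% util, {kb:.1f} KB per read")
```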
Setting vfs_cache_pressure to 1 does not guarantee anything; it only makes the kernel less likely to evict the cached inode and dentry data.
The best check to see whether it all fits would be to set vfs_cache_pressure to 0 and run a find on all the disks.
If the OOM killer does not appear, you're fine; if it does, though, you will have a flaky server ;)
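The find-based check can also be scripted. The Python sketch below walks each disk the way find does and lstat()s every entry, so the inodes themselves (not just the directory entries) are read and end up in the cache. Running it twice in a row gives a rough signal: if the second pass is much faster, the metadata stayed in memory. The script name and the /srv/node/* mount layout in the usage line are assumptions (a common Swift layout), and the script does nothing to protect you from the OOM risk described above when vfs_cache_pressure is 0.

```python
import os
import sys
import time


def walk_and_stat(root: str) -> int:
    """Visit every entry below root, like `find root`, and lstat each one.

    lstat makes the kernel read the inode itself, so a full pass pulls as much
    inode/dentry metadata into memory as the cache can hold.
    Returns the number of entries visited (root itself not counted).
    """
    count = 0
    stack = [root]
    while stack:
        path = stack.pop()
        try:
            entries = list(os.scandir(path))
        except OSError:
            continue  # directory vanished or unreadable (e.g. mid-replication)
        for entry in entries:
            try:
                entry.stat(follow_symlinks=False)  # lstat: read the inode
            except OSError:
                continue  # entry disappeared between readdir and lstat
            count += 1
            if entry.is_dir(follow_symlinks=False):
                stack.append(entry.path)
    return count


def main(roots: list) -> None:
    for root in roots:
        start = time.monotonic()
        n = walk_and_stat(root)
        print(f"{root}: {n:,} entries in {time.monotonic() - start:.1f} s")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage, under those assumptions: python3 walk_disks.py /srv/node/* twice, comparing the timings of the two runs.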

However, we can also make an estimate without potentially crashing a server:
The memory you need to keep it all in memory would be the number of used inodes on the disks * the inode size.
df -i should give you the number of inodes in use (the IUsed column).
I'm not sure whether the xattr info now also fits inside those 256-byte inodes, so it might be a bit more.
(With older kernels you needed 1K inodes, so I wonder where that data goes now.)
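The df -i part of the estimate can be computed directly: df derives IUsed as total inodes minus free inodes, which Python exposes through os.statvfs as f_files - f_ffree. The sketch below sums that over the given mount points and multiplies by the on-disk inode size. INODE_SIZE = 256 is an assumption (check isize= in the xfs_info output for your disks), and because the kernel's in-memory inode and dentry structures are larger than the on-disk inode, the result is best treated as a floor, in line with the caveat above.

```python
import os
import sys

# Assumption: on-disk inode size; check "isize=" in `xfs_info <mount>`.
INODE_SIZE = 256


def used_inodes(mount: str) -> int:
    """Inodes in use on the filesystem holding `mount` (df -i's IUsed column)."""
    st = os.statvfs(mount)
    return st.f_files - st.f_ffree


def estimate_bytes(mounts, inode_size: int = INODE_SIZE) -> int:
    """Floor estimate of cache memory: used inodes on all mounts * inode size."""
    return sum(used_inodes(m) for m in mounts) * inode_size


if __name__ == "__main__":
    mounts = sys.argv[1:]
    for m in mounts:
        print(f"{m}: {used_inodes(m):,} inodes in use")
    print(f"total: ~{estimate_bytes(mounts) / 1e9:.1f} GB at {INODE_SIZE} bytes/inode")
```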

My guess based on the info you gave:
360 million files * 3 replicas / 7 nodes = 154 million inodes per server
256 bytes * 154 million = ~40 GB
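The same per-node arithmetic, written out so the inputs can be swapped for your own numbers. It assumes objects are spread evenly over the 7 nodes (equal ring weights) and, like the estimate above, counts only object files, not directories.

```python
FILES = 360_000_000   # objects in the cluster (figure from this thread)
REPLICAS = 3
NODES = 7             # assumes equal weights, so each node holds an equal share
INODE_BYTES = 256

inodes_per_node = FILES * REPLICAS / NODES
bytes_per_node = inodes_per_node * INODE_BYTES

print(f"{inodes_per_node / 1e6:.0f} million inodes per node")
print(f"{bytes_per_node / 1e9:.1f} GB ({bytes_per_node / 2**30:.1f} GiB)")
```

This prints 154 million inodes and 39.5 GB (36.8 GiB), so the "40GB" figure is in decimal units.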

Note that this is just for the files and does not include the inodes for the directories (which will also be a ridiculous number).
So it probably does not fit in 32 GB, maybe just in 64 GB.

So you should probably be seeing quite a lot of cache misses in the XFS stats right now (xs_dir_lookup, xs_ig_missed in /proc/fs/xfs/stat).
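To check for those misses, here is a Python sketch that reads /proc/fs/xfs/stat twice and reports how much the relevant counters grew in between, since they are cumulative. It assumes the field order documented for XFS runtime statistics: the dir line starts with xs_dir_lookup, and the ig line is xs_ig_attempts, xs_ig_found, xs_ig_frecycle, xs_ig_missed, and so on; please verify that against your kernel. The function names are hypothetical, and the missed/attempts ratio is only an approximate measure of how often an inode lookup missed the in-memory cache.

```python
import time

XFS_STAT = "/proc/fs/xfs/stat"


def read_counters(text: str) -> dict:
    """Extract cache-related counters from the text of /proc/fs/xfs/stat.

    Assumed field order: the "dir" line starts with xs_dir_lookup; the "ig"
    line is xs_ig_attempts, xs_ig_found, xs_ig_frecycle, xs_ig_missed, ...
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] in ("dir", "ig"):
            stats[fields[0]] = [int(v) for v in fields[1:]]
    return {
        "xs_dir_lookup": stats["dir"][0],
        "xs_ig_attempts": stats["ig"][0],
        "xs_ig_found": stats["ig"][1],
        "xs_ig_missed": stats["ig"][3],
    }


def sample(interval: float = 10.0) -> dict:
    """Counters are cumulative, so return their increase over `interval` seconds."""
    with open(XFS_STAT) as f:
        before = read_counters(f.read())
    time.sleep(interval)
    with open(XFS_STAT) as f:
        after = read_counters(f.read())
    return {name: after[name] - before[name] for name in before}


def miss_ratio(delta: dict) -> float:
    """Approximate fraction of inode-cache lookups that missed."""
    attempts = delta["xs_ig_attempts"]
    return delta["xs_ig_missed"] / attempts if attempts else 0.0
```

On a storage node, something like d = sample(); print(d, f"{miss_ratio(d):.1%}") during replication would show whether xs_ig_missed keeps climbing.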

Robert van Leeuwen

-----Original Message-----
From: Robert van Leeuwen [mailto:Robert.vanLeeuwen at spilgames.com]
Sent: Tuesday, 10 February 2015 12:23
To: Klaus Schürmann; 'openstack at lists.openstack.org'
Subject: RE: [Openstack] [SWIFT] Bad replication performance after adding new drives

> I set vfs_cache_pressure to 10 and moved the container and account servers to SSD drives.
> Normal performance for object writes and reads is quite OK.

> But why does moving some partitions to only two new hard disks take so much time?
> Will it be faster if I add more memory?

My guess: the source disks/servers are probably slow.
When the inode tree is not in memory, there will be a lot of random reads on the disks (for both the inode tree and the actual file).
An rsync of any directory will become slow on the source side (IIRC you can see this in the replicator log). You should be able to see in e.g. atop whether the source or the destination disks are the limiting factor.

If the source is the issue, it might help to increase the maximum number of simultaneous rsync processes so you have more slow processes running in parallel ;) Note that this can have an impact on the general speed of the Swift cluster.
More memory will probably help a bit.

