[Openstack] [SWIFT] Bad replication performance after adding new drives
Klaus Schürmann
klaus.schuermann at mediabeam.com
Fri Feb 13 09:26:11 UTC 2015
Hello Robert,
thank you for your explanations. So I will add some memory to the servers.
Here is an extract from xfs_stats (oddly, xs_ig_attempts is 0):
XFS Statistics [Fri Feb 13 10:17:43 2015]

Extent Allocation
  xs_allocx............ 721554441
  xs_allocb............ 2978817175
  xs_freex............. 1037137774
  xs_freeb............. 3591851025
Allocation Btree
  xs_abt_lookup........ 0
  xs_abt_compare....... 0
  xs_abt_insrec........ 0
  xs_abt_delrec........ 0
Block Mapping
  xs_blk_mapr.......... 1806678603
  xs_blk_mapw.......... 741412717
  xs_blk_unmap......... 1108764112
  xs_add_exlist........ 780977993
  xs_del_exlist........ 1109262944
  xs_look_exlist....... 3854480137
  xs_cmp_exlist........ 0
Block Map Btree
  xs_bmbt_lookup....... 0
  xs_bmbt_compare...... 0
  xs_bmbt_insrec....... 0
  xs_bmbt_delrec....... 0
Directory Operations
  xs_dir_lookup........ 1383727958
  xs_dir_create........ 626339630
  xs_dir_remove........ 665888836
  xs_dir_getdents...... 2051336612
Transactions
  xs_trans_sync........ 2268922
  xs_trans_async....... 2384968008
  xs_trans_empty....... 126943891
Inode Operations
  xs_ig_attempts....... 0
  xs_ig_found.......... 2428149123
  xs_ig_frecycle....... 10570
  xs_ig_missed......... 2082679778
  xs_ig_dup............ 0
  xs_ig_reclaims....... 2078368139
  xs_ig_attrchg........ 83099031
Log Operations
  xs_log_writes........ 749988390
  xs_log_blocks........ 696639155
  xs_log_noiclogs...... 231837
  xs_log_force......... 722227877
  xs_log_force_sleep... 673328565
Tail-Pushing Stats
  xs_sleep_logspace..... 0
  xs_try_logspace....... 2648259876
  xs_push_ail........... 1274008366
  xs_push_ail_success... 2814237952
  xs_push_ail_pushbuf... 0
  xs_push_ail_pinned.... 84764808
  xs_push_ail_locked.... 96990387
  xs_push_ail_flushing.. 268869791
  xs_push_ail_restarts.. 0
  xs_push_ail_flush..... 5697550
IoMap Write Convert
  xs_xstrat_bytes....... b2
  xs_xstrat_quick....... 346470374
  xs_xstrat_split....... 0
Read/Write Stats
  xs_write_calls........ 1812896411
  xs_write_bytes........ 1759137862
  xs_read_calls......... 2831914870
  xs_read_bytes......... 1905941951
Attribute Operations
  xs_attr_get........... 2097582332
  xs_attr_set........... 137363045
  xs_attr_remove........ 0
  xs_attr_list.......... 425762674
Inode Clustering
  xs_iflush_count....... 1079913439
  xs_icluster_flushcnt.. 131626781
  xs_icluster_flushinode 261024082
Vnode Statistics
  vn_active............. 1959222046
  vn_alloc.............. 0
  vn_get................ 0
  vn_hold............... 0
  vn_rele............... 2335745250
  vn_reclaim............ 2335745248
  vn_remove............. 2335745250
Buf Statistics
  pb_get................ 4270519930
  pb_create............. 3001249573
  pb_get_locked......... 1359003256
  pb_get_locked_waited.. 311370280
  pb_busy_locked........ 179372557
  pb_miss_locked........ 2911539264
  pb_page_retries....... 0
  pb_page_found......... 389495658
  pb_get_read........... 2658931555
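In case it is useful, the counters above can also be pulled straight from /proc/fs/xfs/stat. A rough sketch, assuming the usual layout of that file (one counter group per line: group name first, integer counters after it; the "ig" line holds the xs_ig_* values and the "dir" line the directory counters):

  # Rough sketch: read the inode-cache and directory counters from
  # /proc/fs/xfs/stat. Field order is an assumption based on the usual
  # layout of that file.
  IG_FIELDS  = ["attempts", "found", "frecycle", "missed", "dup", "reclaims", "attrchg"]
  DIR_FIELDS = ["lookup", "create", "remove", "getdents"]

  def read_xfs_stats(path="/proc/fs/xfs/stat"):
      stats = {}
      with open(path) as f:
          for line in f:
              parts = line.split()
              stats[parts[0]] = [int(x) for x in parts[1:]]
      return stats

  if __name__ == "__main__":
      stats = read_xfs_stats()
      ig = dict(zip(IG_FIELDS, stats.get("ig", [])))
      d  = dict(zip(DIR_FIELDS, stats.get("dir", [])))
      print("xs_ig_missed: %s  xs_ig_found: %s" % (ig.get("missed"), ig.get("found")))
      print("xs_dir_lookup: %s" % d.get("lookup"))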
Best regards
Klaus
-----Original Message-----
From: Robert van Leeuwen [mailto:Robert.vanLeeuwen at spilgames.com]
Sent: Thursday, 12 February 2015 08:47
To: Klaus Schürmann; 'openstack at lists.openstack.org'
Subject: RE: [Openstack] [SWIFT] Bad replication performance after adding new drives
> My XFS inode size is 256 bytes.
> When I increase the memory and change vm.vfs_cache_pressure to 1, is it possible to store the inode tree in memory?
> Maybe the random disk seeks are the problem.
>
> Here is an iostat snapshot:
>
> Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
> sde 0.00 0.00 143.80 2.60 1028.80 123.90 15.75 1.15 7.84 7.97 0.62 6.49 95.04
> sdh 0.00 0.00 94.80 3.80 681.60 156.70 17.00 1.27 12.51 13.00 0.21 9.89 97.52
Well, at least 2 disks are at pretty much 100% util; my guess would be that the replicator / rsync, and maybe the auditor, are currently busy with those disks.
Setting the cache_pressure to 1 does not guarantee anything yet; it just makes the kernel less likely to evict that data.
The best check to see if it all fits would be to set the cache_pressure to 0 and do a find on all the disks.
If the OOM killer does not appear you're fine; if it does, though, you will have a flaky server ;)
However, we can also do an estimate without potentially crashing a server:
The amount of memory you need to keep it all in memory would be the number of used inodes on the disks * inode size.
df -i should give you the number of inodes in use.
I'm not sure if the xattr info now also fits inside those 256-byte inodes, so it might be a bit more.
(With older kernels you needed 1K inodes, so I wonder where that data goes now.)
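Something like this would give you the number (the /srv/node mount prefix and the 256-byte inode size are assumptions, adjust them to your setup):

  # Rough estimate: used inodes on the Swift disks (df -i) * inode size.
  # The mount prefix and inode size below are assumptions, adjust as needed.
  import subprocess

  INODE_SIZE   = 256          # bytes, whatever mkfs.xfs -i size=... was
  MOUNT_PREFIX = "/srv/node"  # where the object disks are mounted

  used = 0
  for line in subprocess.check_output(["df", "-i"]).decode().splitlines()[1:]:
      fields = line.split()
      if len(fields) >= 6 and fields[5].startswith(MOUNT_PREFIX):
          used += int(fields[2])  # IUsed column
  print("used inodes: %d" % used)
  print("approx. inode cache needed: %.1f GB" % (used * INODE_SIZE / 1e9))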
My guess based on the info you gave:
360 million files * 3 replicas / 7 nodes = ~154 million inodes per server
256 bytes * 154 million = ~40 GB
Note that this is just for the files and does not include the inodes for the directories (which will also be a ridiculous amount), so it probably does not fit in 32GB, maybe just in 64GB.
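As a quick sanity check of those numbers:

  # Back-of-envelope check, using only the numbers from this thread.
  files, replicas, nodes, inode_size = 360e6, 3, 7, 256
  inodes_per_node = files * replicas / nodes          # ~154 million
  print("%.0f million inodes, ~%.1f GB of inode cache"
        % (inodes_per_node / 1e6, inodes_per_node * inode_size / 1e9))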
So you should probably see quite a lot of cache misses currently in the xfs stats (xs_dir_lookup, xs_ig_missed).
Cheers,
Robert van Leeuwen
-----Original Message-----
From: Robert van Leeuwen [mailto:Robert.vanLeeuwen at spilgames.com]
Sent: Tuesday, 10 February 2015 12:23
To: Klaus Schürmann; 'openstack at lists.openstack.org'
Subject: RE: [Openstack] [SWIFT] Bad replication performance after adding new drives
> I set vfs_cache_pressure to 10 and moved the container- and account-servers to SSD hard drives.
> The normal performance for object writes and reads is quite ok.
> But why does moving some partitions to only two new hard disks take so much time?
> Will it be faster if I add more memory?
My guess: Probably the source disks/servers are slow.
When the inode tree is not in memory it will do a lot of random reads to the disks (for both the inode tree and the actual file).
An rsync of any directory will become slow on the source side (iirc you can see this in the replicator log). You should be able to see in e.g. atop whether the source or destination disks are the limiting factor.
If the source is the issue it might help to increase the maximum number of simultaneous rsync processes, so you have more parallel slow processes ;) Note that this can have an impact on the general speed of the Swift cluster.
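The knobs for that are typically the per-module "max connections" in rsyncd.conf on the receiving nodes and the replicator concurrency in the object-server config; a sketch with placeholder values (not recommendations), assuming the standard Swift rsync module layout:

  # rsyncd.conf on the receiving object nodes;
  # "max connections" caps how many incoming rsyncs a node accepts at once.
  [object]
  path = /srv/node
  read only = false
  lock file = /var/lock/object.lock
  max connections = 8

  # [object-replicator] section of object-server.conf on the sending nodes;
  # "concurrency" is how many replication workers push in parallel.
  concurrency = 4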
More memory will probably help a bit.
Cheers,
Robert