[Openstack] [SWIFT] Bad replication performance after adding new drives

Klaus Schürmann klaus.schuermann at mediabeam.com
Fri Feb 13 09:26:11 UTC 2015


Hello Robert,

thank you for your explanations. So I will add some memory to the servers. 

Here is an extract from the xfs stats (very strangely, xs_ig_attempts is 0):

XFS Statistics  [Fri Feb 13 10:17:43 2015]
  Extent Allocation                      Tail-Pushing Stats
    xs_allocx............     721554441    xs_sleep_logspace.....             0
    xs_allocb............    2978817175    xs_try_logspace.......    2648259876
    xs_freex.............    1037137774    xs_push_ail...........    1274008366
    xs_freeb.............    3591851025    xs_push_ail_success...    2814237952
  Allocation Btree                         xs_push_ail_pushbuf...             0
    xs_abt_lookup........             0    xs_push_ail_pinned....      84764808
    xs_abt_compare.......             0    xs_push_ail_locked....      96990387
    xs_abt_insrec........             0    xs_push_ail_flushing..     268869791
    xs_abt_delrec........             0    xs_push_ail_restarts..             0
  Block Mapping                            xs_push_ail_flush.....       5697550
    xs_blk_mapr..........    1806678603  IoMap Write Convert
    xs_blk_mapw..........     741412717    xs_xstrat_bytes.......            b2
    xs_blk_unmap.........    1108764112    xs_xstrat_quick.......     346470374
    xs_add_exlist........     780977993    xs_xstrat_split.......             0
    xs_del_exlist........    1109262944  Read/Write Stats
    xs_look_exlist.......    3854480137    xs_write_calls........    1812896411
    xs_cmp_exlist........             0    xs_write_bytes........    1759137862
  Block Map Btree                          xs_read_calls.........    2831914870
    xs_bmbt_lookup.......             0    xs_read_bytes.........    1905941951
    xs_bmbt_compare......             0  Attribute Operations
    xs_bmbt_insrec.......             0    xs_attr_get...........    2097582332
    xs_bmbt_delrec.......             0    xs_attr_set...........     137363045
  Directory Operations                     xs_attr_remove........             0
    xs_dir_lookup........    1383727958    xs_attr_list..........     425762674
    xs_dir_create........     626339630  Inode Clustering
    xs_dir_remove........     665888836    xs_iflush_count.......    1079913439
    xs_dir_getdents......    2051336612    xs_icluster_flushcnt..     131626781
  Transactions                             xs_icluster_flushinode     261024082
    xs_trans_sync........       2268922  Vnode Statistics
    xs_trans_async.......    2384968008    vn_active.............    1959222046
    xs_trans_empty.......     126943891    vn_alloc..............             0
  Inode Operations                         vn_get................             0
    xs_ig_attempts.......             0    vn_hold...............             0
    xs_ig_found..........    2428149123    vn_rele...............    2335745250
    xs_ig_frecycle.......         10570    vn_reclaim............    2335745248
    xs_ig_missed.........    2082679778    vn_remove.............    2335745250
    xs_ig_dup............             0  Buf Statistics
    xs_ig_reclaims.......    2078368139    pb_get................    4270519930
    xs_ig_attrchg........      83099031    pb_create.............    3001249573
  Log Operations                           pb_get_locked.........    1359003256
    xs_log_writes........     749988390    pb_get_locked_waited..     311370280
    xs_log_blocks........     696639155    pb_busy_locked........     179372557
    xs_log_noiclogs......        231837    pb_miss_locked........    2911539264
    xs_log_force.........     722227877    pb_page_retries.......             0
    xs_log_force_sleep...     673328565    pb_page_found.........     389495658
                                           pb_get_read...........    2658931555

Best regards
Klaus

-----Original Message-----
From: Robert van Leeuwen [mailto:Robert.vanLeeuwen at spilgames.com] 
Sent: Thursday, 12 February 2015 08:47
To: Klaus Schürmann; 'openstack at lists.openstack.org'
Subject: RE: [Openstack] [SWIFT] Bad replication performance after adding new drives

> My xfs inode size is 256 bytes.
> When I increase the memory and change vm.vfs_cache_pressure to 1, is it possible to keep the inode tree in memory?
> Maybe the random disk seeks are the problem.
>
> Here is an iostat snapshot:
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>sde               0.00     0.00  143.80    2.60  1028.80   123.90    15.75     1.15    7.84    7.97    0.62   6.49  95.04
>sdh               0.00     0.00   94.80    3.80   681.60   156.70    17.00     1.27   12.51   13.00    0.21   9.89  97.52


Well, at least 2 disks are at pretty much 100% util; my guess would be that the replicator / rsync, and maybe the auditor, are currently busy with those disks.
Setting the cache_pressure to 1 does not guarantee anything; it just makes the kernel less likely to evict that data.
The best check to see if it all fits would be to set the cache pressure to 0 and run a find on all the disks.
If the OOM killer does not appear you're fine; if it does, though, you will have a flaky server ;)
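
Roughly something like this; /srv/node is just the usual Swift layout, adjust it to wherever your disks are mounted:

  # never reclaim inode/dentry cache in favour of page cache, then touch every inode once
  sysctl -w vm.vfs_cache_pressure=0
  for d in /srv/node/*; do find "$d" -xdev >/dev/null; done
  # keep an eye on free / slabtop in a second terminal while this runs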

However, we can also do an estimate without potentially crashing a server:
the amount of memory you need to keep it all in memory would be the number of used inodes on the disks * inode size.
df -i should give you the number of inodes in use.
I'm not sure if the xattr info now also fits inside those 256-byte inodes, so it might be a bit more.
(With older kernels you needed 1K inodes, so I wonder where that data goes now.)
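
A quick way to get that number (again assuming the disks sit under /srv/node and 256-byte inodes):

  df -i /srv/node/* | awk 'NR > 1 {used += $3} END {printf "%.0f GiB\n", used * 256 / 1024 / 1024 / 1024}'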

My guess based on the info you gave:
360 million files * replica 3 / 7 nodes = 154 million inodes per server
256 bytes * 154 million = ~40 GB

Note that this is just for the files and does not include the inodes for the directories (which will also be a ridiculous amount), so it probably does not fit in 32 GB; maybe just in 64 GB.

So you should probably see quite a lot of cache misses in the xfs stats at the moment (xs_dir_lookup, xs_ig_missed).
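
You can also watch the raw counters in /proc/fs/xfs/stat (the fields on the "ig" and "dir" lines map onto the xs_ig_* / xs_dir_* counters, though I'd double-check the exact order against your kernel's docs). Sampling them a minute apart shows how fast the misses grow:

  grep -E '^(ig|dir) ' /proc/fs/xfs/stat; sleep 60; grep -E '^(ig|dir) ' /proc/fs/xfs/stat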

Cheers,
Robert van Leeuwen



-----Original Message-----
From: Robert van Leeuwen [mailto:Robert.vanLeeuwen at spilgames.com]
Sent: Tuesday, 10 February 2015 12:23
To: Klaus Schürmann; 'openstack at lists.openstack.org'
Subject: RE: [Openstack] [SWIFT] Bad replication performance after adding new drives

> I set vm.vfs_cache_pressure to 10 and moved the container and account servers to SSD drives.
> The normal performance for object writes and reads is quite OK.

> But why does moving some partitions to only two new hard disks take so much time?
> Will it be faster if I add more memory?

My guess: Probably the source disks/servers are slow.
When the inode tree is not in memory it will do a lot of random reads to the disks (for both the inode tree and the actual file).
An rsync of any directory will become slow on the source side (iirc you can see this in the replicator log). You should be able to see in e.g. atop whether the source or the destination disks are the limiting factor.
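
Something as simple as extended iostat on both sides while the replication runs should make it obvious, e.g.:

  # run on both the source and the destination node; high await / %util shows which side is the bottleneck
  iostat -dxk 5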

If the source is the issue it might help to increase the maximum number of simultaneous rsync processes, so you have more parallel slow processes ;) Note that this can have an impact on the general speed of the Swift cluster.
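
For reference, the knobs I mean are the per-module limit in rsyncd.conf and the replicator concurrency; very roughly (module name, path and values depend on your deployment):

  # /etc/rsyncd.conf on the storage nodes
  [object]
  path = /srv/node
  max connections = 8

  # object-server.conf on the storage nodes
  [object-replicator]
  concurrency = 8
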
More memory will probably help a bit.

Cheers,
Robert



