[Openstack] [Swift] tmp directory causing Swift slowdown

Christian Schwede christian.schwede at enovance.com
Fri May 1 19:37:23 UTC 2015


On 01.05.15 20:33, Samuel Merritt wrote:
> On 5/1/15 7:55 AM, Uwe Sauter wrote:
>>
>>
>> On 01.05.2015 at 02:21, Samuel Merritt wrote:
>>>
>>> It seems like 1430268763.41931.data would be in the same allocation
>>> group as
>>> objects/757/a94/bd77129a1cae9e32381776e322efca94, and
>>> bd77129a1cae9e32381776e322efca94 would be in the same allocation
>>> group as objects/757/a94, and so on. Thus, everything would be in the
>>> same allocation group as the root directory.
>>>
>>> This can't be the case, or else there'd be no point to allocation
>>> groups. What am I missing here?
>>
>>
>> Hi,
>>
>> I think what you're missing is that inodes stay in the allocation
>> group where they were first created. Moving a file around in the
>> filesystem changes its path but not its allocation group. So creating
>> a temporary file first and then moving it into the hash folder leaves
>> the file associated with the temp folder's allocation group. That
>> allocation group grows bigger and bigger, and searching it takes more
>> and more time.
> 
> That doesn't really answer the question, though. We have this message
> <http://www.spinics.net/lists/xfs/msg32868.html> which says that "...the
> locality of a new inode is determined by the
> parent inode, and so if all new inodes are created in the same
> directory, then they are all created in the same AG."
> 
> Let's say we start out with a freshly-formatted disk, so there's only
> one inode, and it's for the root directory.
> 
> Then, Swift goes and starts making its directory structure on disk, and
> calls mkdir('objects'). Since a new inode is created in the same AG as
> its parent, the inode for '/objects' is in the same AG as the inode for
> '/'.
> 
> Swift makes another dir: mkdir('objects/757')
> 
> The inode for '/objects/757' is in the same AG as its parent '/objects',
> which is the same as the AG for '/'.
> 
> Keep going a while, and you get
> 
> /
> /objects
> /objects/757
> /objects/757/a94
> /objects/757/a94/bd77129a1cae9e32381776e322efca94
> /objects/757/a94/bd77129a1cae9e32381776e322efca94/1430268763.41931.data
> /tmp
> 
> and they're all in the same AG.
> 
> Now, the XFS developers are not stupid, so what I typed up there can't
> possibly be true, or else every inode on a filesystem would be in the
> same AG.
> 
> So, my question is this: what, of the things I typed above, is false?
> Equivalently, how is an inode created in a *different* AG than its parent?

Hmm, reading the docs at

http://xfs.org/docs/xfsdocs-xml-dev/XFS_Filesystem_Structure/tmp/en-US/html/AG_Free_Space_Management.html

I would assume a different AG is selected if another AG has more free
space.

But, and that might be one of the problems here: it seems there is a
default of only 4 allocation groups, at least that's what I see when
running xfs_info on various disks.
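
To check the AG count on a number of disks without reading the output by
hand, something like the following might help. It is only a sketch: it
assumes xfs_info prints a field of the form "agcount=N", and the helper
names parse_agcount and agcount_for are made up for illustration.

```python
import re
import subprocess


def parse_agcount(xfs_info_output):
    """Return the agcount value found in xfs_info output, or None.

    Assumption: the output contains a field such as "agcount=4".
    """
    match = re.search(r"\bagcount=(\d+)", xfs_info_output)
    return int(match.group(1)) if match else None


def agcount_for(mount_point):
    """Run xfs_info on a mounted XFS filesystem and return its agcount."""
    result = subprocess.run(
        ["xfs_info", mount_point],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_agcount(result.stdout)
```

Calling agcount_for() with the mount point of a Swift object disk would
return the number of allocation groups on that filesystem, provided
xfs_info is installed and the caller is allowed to run it.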

In fact after looking into the sources of mkfs.xfs I found this default
for disks with sizes up to 4TB:

http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/cmds/xfsprogs.git;a=blob;f=mkfs/xfs_mkfs.c;h=5084d755;hb=HEAD#l688

Might be a good idea to do some benchmarking with different AG counts?
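
A rough idea of what such a benchmark could look like is below. This is a
sketch under assumptions, not a tested benchmark: mkfs_command only builds
the mkfs.xfs command line (using the -d agcount= option) and does not run
it, and write_via_tmp imitates the "write into tmp, then rename into the
hash directory" pattern with a made-up directory layout, not Swift's real
one. Both function names are hypothetical.

```python
import os
import tempfile
import time


def mkfs_command(device, agcount):
    """Build (but do not run) a mkfs.xfs command with an explicit AG count.

    Running the resulting command destroys all data on the device.
    """
    return ["mkfs.xfs", "-f", "-d", "agcount=%d" % agcount, device]


def write_via_tmp(root, count, size=4096):
    """Create `count` files in root/tmp and rename each into its own
    directory below root/objects. Returns the elapsed time in seconds.

    The directory layout is an illustrative stand-in, not Swift's layout.
    """
    tmp_dir = os.path.join(root, "tmp")
    os.makedirs(tmp_dir, exist_ok=True)
    payload = b"\0" * size
    start = time.perf_counter()
    for i in range(count):
        tmp_path = os.path.join(tmp_dir, "%d.tmp" % i)
        with open(tmp_path, "wb") as handle:
            handle.write(payload)
        dest_dir = os.path.join(root, "objects", str(i % 1024), "%032x" % i)
        os.makedirs(dest_dir, exist_ok=True)
        # rename changes the path; the file is not rewritten
        os.rename(tmp_path, os.path.join(dest_dir, "object.data"))
    return time.perf_counter() - start


if __name__ == "__main__":
    # Dry run in a scratch directory; point `root` at a mounted XFS
    # filesystem to get meaningful numbers.
    with tempfile.TemporaryDirectory() as root:
        print(write_via_tmp(root, 100))
```

The idea would be to format the same disk with several agcount values,
mount it, run write_via_tmp with a large count on each, and compare the
timings.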

-- Christian



