[Openstack] [Swift] tmp directory causing Swift slowdown

Samuel Merritt sam at swiftstack.com
Fri May 1 21:06:56 UTC 2015


On 5/1/15 12:37 PM, Christian Schwede wrote:
> On 01.05.15 20:33, Samuel Merritt wrote:
>> On 5/1/15 7:55 AM, Uwe Sauter wrote:
>>>
>>>
>>> On 01.05.2015 at 02:21, Samuel Merritt wrote:
>>>>
>>>> It seems like 1430268763.41931.data would be in the same allocation
>>>> group as
>>>> objects/757/a94/bd77129a1cae9e32381776e322efca94, and
>>>> bd77129a1cae9e32381776e322efca94 would be in the same allocation
>>>> group as objects/757/a94, and so on. Thus, everything would be in the
>>>> same allocation group as the root directory.
>>>>
>>>> This can't be the case, or else there'd be no point to allocation
>>>> groups. What am I missing here?
>>>
>>>
>>> Hi,
>>>
>>> I think what you're missing is that inodes stay in the allocation
>>> group where they were first created. Moving a file around in the
>>> filesystem changes its path but not its allocation group. So first
>>> creating a temporary file and then moving it into the hash folder
>>> leaves the file associated with the temp folder's allocation group;
>>> that allocation group grows bigger and bigger, and searching it
>>> takes more and more time.
>>
>> That doesn't really answer the question, though. We have this message
>> <http://www.spinics.net/lists/xfs/msg32868.html> which says that "...the
>> locality of a new inode is determined by the
>> parent inode, and so if all new inodes are created in the same
>> directory, then they are all created in the same AG."
>>
>> Let's say we start out with a freshly-formatted disk, so there's only
>> one inode, and it's for the root directory.
>>
>> Then, Swift goes and starts making its directory structure on disk, and
>> calls mkdir('objects'). Since a new inode is created in the same AG as
>> its parent, the inode for '/objects' is in the same AG as the inode for
>> '/'.
>>
>> Swift makes another dir: mkdir('objects/757')
>>
>> The inode for '/objects/757' is in the same AG as its parent '/objects',
>> which is the same as the AG for '/'.
>>
>> Keep going a while, and you get
>>
>> /
>> /objects
>> /objects/757
>> /objects/757/a94
>> /objects/757/a94/bd77129a1cae9e32381776e322efca94
>> /objects/757/a94/bd77129a1cae9e32381776e322efca94/1430268763.41931.data
>> /tmp
>>
>> and they're all in the same AG.
>>
>> Now, the XFS developers are not stupid, so what I typed up there can't
>> possibly be true, or else every inode on a filesystem would be in the
>> same AG.
>>
>> So, my question is this: what, of the things I typed above, is false?
>> Equivalently, how is an inode created in a *different* AG than its parent?
>
> Hmm, reading the docs at
>
> http://xfs.org/docs/xfsdocs-xml-dev/XFS_Filesystem_Structure/tmp/en-US/html/AG_Free_Space_Management.html
>
> I would assume a different AG is selected if there is one AG with more
> free space.

After far too much time spent searching, I found some docs explaining 
things. On XFS, a *file* is created in the same AG as its parent 
directory [1], but a *directory* is not [2]. Rather, directories are 
splayed across all AGs.

Thus, in the example above, the .data file and its parent directory 
share an AG, but the other directories could each be in any AG.

[1] http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide//tmp/en-US/html/ch06s03.html

[2] http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide//tmp/en-US/html/ch06s02.html
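
If anyone wants to check which AG a given inode actually ended up in: XFS 
encodes the AG number in the high bits of the inode number, so something 
like the rough sketch below should do it. The agblklog and inopblog values 
are per-filesystem and come from the superblock (e.g. via something like 
"xfs_db -r -c 'sb 0' -c 'p agblklog' -c 'p inopblog' /dev/sdb1"); the 
numbers and paths in the sketch are just placeholders for illustration.

    import os

    # Per-filesystem constants from the XFS superblock; the values here are
    # placeholders -- read the real ones with xfs_db as described above.
    AGBLKLOG = 23   # log2(blocks per allocation group)
    INOPBLOG = 4    # log2(inodes per block)

    def allocation_group(path):
        # XFS packs an inode number as (agno | block-in-AG | inode-in-block),
        # so shifting off the low agblklog + inopblog bits leaves the AG.
        return os.stat(path).st_ino >> (AGBLKLOG + INOPBLOG)

    # Compare the tmp dir, a hash dir, and the .data file inside it
    # (paths made up to match the example above):
    for p in ('/srv/node/sdb1/tmp',
              '/srv/node/sdb1/objects/757/a94/bd77129a1cae9e32381776e322efca94',
              '/srv/node/sdb1/objects/757/a94/bd77129a1cae9e32381776e322efca94/'
              '1430268763.41931.data'):
        print('%s -> AG %d' % (p, allocation_group(p)))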

> But, and that might be one of the problems here: it seems there is a
> default of only 4 allocation groups; at least that's what I see on
> various disks when running xfs_info.
>
> In fact, after looking into the sources of mkfs.xfs, I found this default
> for disks with sizes up to 4 TB:
>
> http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/cmds/xfsprogs.git;a=blob;f=mkfs/xfs_mkfs.c;h=5084d755;hb=HEAD#l688
>
> Might be a good idea to do some benchmarking with different AG numbers?

Could be useful, but we should first get Swift to not dump everything in 
the same AG. Otherwise, the benchmarks will be pretty predictable. ;)
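
For reference, a minimal sketch of the write pattern in question and one 
possible alternative (hand-wavy pseudo-Swift with made-up names and paths, 
not the actual DiskFile code):

    import os
    import tempfile

    def put_via_shared_tmp(datadir, fname, body):
        # Current pattern (simplified): the temp file is born in the device's
        # shared tmp dir, so its inode is allocated in that directory's AG,
        # and the later rename changes only the path, never the AG.
        fd, tmppath = tempfile.mkstemp(dir='/srv/node/sdb1/tmp')
        with os.fdopen(fd, 'wb') as f:
            f.write(body)
            f.flush()
            os.fsync(f.fileno())
        if not os.path.isdir(datadir):
            os.makedirs(datadir)
        os.rename(tmppath, os.path.join(datadir, fname))

    def put_via_local_tmp(datadir, fname, body):
        # One possible alternative: create the temp file inside the
        # destination hash dir itself, so the new inode is allocated in that
        # directory's AG and the objects get spread across AGs along with
        # their directories.
        if not os.path.isdir(datadir):
            os.makedirs(datadir)
        fd, tmppath = tempfile.mkstemp(dir=datadir, prefix='.tmp-')
        with os.fdopen(fd, 'wb') as f:
            f.write(body)
            f.flush()
            os.fsync(f.fileno())
        os.rename(tmppath, os.path.join(datadir, fname))

The second variant is only there to illustrate the AG behaviour; there may 
well be good reasons (cleanup of orphaned temp files, replication, disk-full 
handling) why the shared tmp dir is used today.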
