[Openstack] [Swift] tmp directory causing Swift slowdown
Samuel Merritt
sam at swiftstack.com
Fri May 1 00:21:47 UTC 2015
On 4/29/15 4:08 PM, Shrinand Javadekar wrote:
> Hi,
>
> I have been investigating a pretty serious Swift performance problem
> for a while now. I have a single node Swift instance with 16 cores,
> 64GB memory and 8 MDs of 3TB each. I only write 256KB objects into
> this Swift instance with high concurrency; 256 parallel object PUTs.
> Also, I was sharding the objects equally across 32 containers.
>
> On a completely clean system, we were getting ~375 object puts per
> second. But this kept on reducing pretty quickly and by the time we
> had 600GB of data in Swift, the throughput was ~100 objects per
> second.
>
> We used sysdig to get a trace of what's happening in the system and
> found that the open system calls were taking far longer than expected:
> several hundred milliseconds, sometimes even 1 second.
>
> Investigating this further revealed a problem in the way Swift writes
> the objects on XFS. Swift's object server creates a temp directory
> under the mount point /srv/node/r0. It creates a file under this temp
> directory first (say /srv/node/r0/tmp/tmpASDF) and eventually renames
> this file to its final destination.
>
> rename /srv/node/r0/tmp/tmpASDF ->
> /srv/node/r0/objects/312/eef/deadbeef/33453453454323424.data.
>
> XFS creates an inode in the same allocation group as its parent. So,
> when the temp file tmpASDF is created, it goes in the same allocation
> group as "tmp". When the rename happens, only the filesystem metadata
> gets modified. The allocation groups of the inodes don't change.
This part confuses me. If an inode is in the same allocation group as
its parent, then let's say we have a path on the FS of:
objects/757/a94/bd77129a1cae9e32381776e322efca94/1430268763.41931.data
It seems like 1430268763.41931.data would be in the same allocation
group as objects/757/a94/bd77129a1cae9e32381776e322efca94, and
bd77129a1cae9e32381776e322efca94 would be in the same allocation group
as objects/757/a94, and so on. Thus, everything would be in the same
allocation group as the root directory.
This can't be the case, or else there'd be no point to allocation
groups. What am I missing here?
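
For reference, the write-to-tmp-then-rename pattern described in the quoted
message can be sketched roughly as below. This is only an illustration, not
Swift's actual object-server code: the function name put_object, its
parameters, and the returned tuple are all hypothetical. It records the inode
number before the rename, so you can check that a rename within one
filesystem keeps the same inode and only changes the directory entry.

```python
import os
import tempfile


def put_object(device_root, rel_dest, data):
    """Write data to a temp file under <device_root>/tmp, then rename it.

    Hypothetical helper for illustration only; not Swift's real API.
    Returns (final_path, inode_number_seen_before_the_rename).
    """
    tmp_dir = os.path.join(device_root, "tmp")
    os.makedirs(tmp_dir, exist_ok=True)

    dest = os.path.join(device_root, rel_dest)
    os.makedirs(os.path.dirname(dest), exist_ok=True)

    # The temp file is created inside tmp/, so that is its parent
    # directory at inode-allocation time.
    fd, tmp_path = tempfile.mkstemp(dir=tmp_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        inode_before = os.stat(tmp_path).st_ino
        # Within one filesystem, rename only rewrites directory entries;
        # the inode itself is not reallocated.
        os.rename(tmp_path, dest)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return dest, inode_before


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        path, ino = put_object(
            root, "objects/312/eef/deadbeef/example.data", b"x" * 1024
        )
        print(path, ino == os.stat(path).st_ino)
```

Running it against a scratch directory should show the inode number is
unchanged after the rename, which is the behavior the quoted paragraph
relies on.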