[Openstack] [Swift] Directory structure during data placement

John Dickinson me at not.mn
Fri Apr 24 15:58:10 UTC 2015


> On Apr 24, 2015, at 12:42 AM, Shrinand Javadekar <shrinand at maginatics.com> wrote:
> 
> Hi,
> 
> I observe that while placing data, the object server creates a
> directory structure:
> 
> /srv/node/r0/objects/<partition>/<3 byte hash suffix>/<hash>/<timestamp>.data.
> 
> Is there a reason for the <hash> directory to be created? Couldn't
> this just have been
> /srv/node/r0/objects/<partition>/<3 byte hash suffix>/<hash>.data?

Let's explore that idea. The general concept is sound, but it has some implications worth walking through.

Suppose we did away with the hash dir and just had <hash>.data. Then each hash suffix directory would end up with an enormous number of entries in it, which can itself cause issues in file systems. In fact, this is exactly why we have the hash suffix directory: to prevent the cardinality of the partition directory from becoming too large. So just doing away with the hash directory could cause problems for the system as more and more objects get added (doing a listdir on a directory with a lot of files in it is _extremely_ slow).
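To make the layout concrete, here is a rough sketch of how a path of that shape could be derived from an object name. It is not Swift's actual code: the function name, the part_power default, and the device root are assumptions for illustration, and the per-cluster salt that Swift mixes into the hash is omitted.

```python
import hashlib


def object_path(account, container, obj, timestamp,
                part_power=10, device_root="/srv/node/r0"):
    """Build a nested on-disk path for one object (illustrative only)."""
    name = "/%s/%s/%s" % (account, container, obj)
    digest = hashlib.md5(name.encode("utf-8")).digest()
    name_hash = digest.hex()
    # Assumption: the partition is the top `part_power` bits of the hash.
    partition = int.from_bytes(digest[:4], "big") >> (32 - part_power)
    # The "hash suffix" directory is the last three hex characters.
    suffix = name_hash[-3:]
    return "%s/objects/%d/%s/%s/%s.data" % (
        device_root, partition, suffix, name_hash, timestamp)


print(object_path("AUTH_test", "photos", "cat.jpg", "1429891090.00000"))
```

The suffix level fans the objects of one partition out over at most 4096 directories, which is what keeps each individual directory listing small.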

But there's more than just splaying. .data files are not the only thing that can be stored about an object. There are also .meta files (used for "fast-post", which is not enabled by default). These files store metadata, as you would assume. There are also .ts files (for "tombstone") that identify when an object has been deleted. And in the new erasure code storage policy, there is a new .durable file that marks when the system has an object that is durable in the cluster.

Also, each of these files is named according to a timestamp. So if we did away with the hash directory and instead put everything in the hash suffix directory, we'd have to name files something like <hash>.<timestamp>.data so that we can get ordering. Aside from the listdir issues mentioned above, it would also be expensive to filter, sort, and group all the files so that concurrent operations can be resolved. Therefore we've put all the things (i.e. files) associated with an object into its own directory.
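As a simplified illustration of that difference (this is not the real DiskFile logic; the function names are made up, and .meta and .durable handling is left out), resolving one object's state from its own directory only needs that directory's listing, while a flat layout first has to group every file in the suffix directory by hash:

```python
from collections import defaultdict


def resolve(filenames):
    """Return (state, filename) for the files of a single object."""
    newest = {}
    for name in filenames:
        timestamp, _, ext = name.rpartition(".")
        if ext in ("data", "ts"):
            key = float(timestamp)
            if ext not in newest or key > newest[ext][0]:
                newest[ext] = (key, name)
    data = newest.get("data")
    tomb = newest.get("ts")
    # A tombstone at least as new as the data means the object is deleted.
    if tomb and (data is None or tomb[0] >= data[0]):
        return ("deleted", tomb[1])
    if data:
        return ("present", data[1])
    return ("missing", None)


def group_flat(filenames):
    """Group hypothetical <hash>.<timestamp>.<ext> names by hash."""
    groups = defaultdict(list)
    for name in filenames:
        name_hash, _, rest = name.partition(".")
        groups[name_hash].append(rest)
    return dict(groups)


# Nested layout: the listing already belongs to exactly one object.
print(resolve(["1429891090.00000.data", "1429891100.00000.ts"]))

# Flat layout: every entry in the suffix directory must be grouped first.
flat = ["abc.1429891090.00000.data", "def.1429891095.00000.data"]
print({h: resolve(files) for h, files in group_flat(flat).items()})
```

With the per-object directory, the work is proportional to the handful of files for that object instead of to everything sharing the hash suffix.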

Of course, the cost for the deeper directory structure is that there are more inodes and dentries in the filesystem.
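A back-of-the-envelope sketch of that cost follows. It counts only per-object inodes and ignores the partition and suffix directories; the function name and the one-file-per-object default are assumptions for illustration.

```python
def inode_counts(num_objects, files_per_object=1):
    """Return (nested, flat) per-object inode counts for the two layouts."""
    # Nested: one directory inode per object plus its files.
    nested = num_objects * (1 + files_per_object)
    # Flat: only the files themselves.
    flat = num_objects * files_per_object
    return nested, flat


nested, flat = inode_counts(1000000)
print(nested, flat)
```

With a single .data file per object, the per-object directory roughly doubles the number of inodes the filesystem has to track.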

All that being said, I think that the combined features of DiskFiles and Storage Policies should allow for some interesting experimentation with the on-disk layout. I'm sure there are optimizations, especially if you have any foreknowledge of the kind of data being stored. For example, small files could take advantage of some sort of Haystack-style slab storage. These ideas are simply that right now: ideas, not implementations. But I'd love to see some R&D in these areas.


Hope this helps explain why the on-disk layout is the way it is.


--John





> 
> I am seeing a situation where after writing a few hundred Gigs worth
> of data, where each object is 256K, XFS metadata performance is
> deteriorating. Having less number of directories might help in that
> case.
> 
> -Shri
