[Openstack] [SWIFT] Why is sharding across containers such a big deal?

Shrinand Javadekar shrinand at maginatics.com
Thu Dec 5 00:55:26 UTC 2013


I had similar questions earlier. You might find [1] useful.

As John mentioned, under highly concurrent workloads the container
itself can become a bottleneck and sharding across containers can help
speed things up. Also, as a container grows, the SQLite database
keeping information about the objects in that container grows too,
and each write may start to become slower. Therefore it's a good idea
to also "shard vertically" and restrict the number of objects in a
container. The recommended limit was 1M objects if the Swift cluster
is running on rotational disks. It's higher for SSDs, but I don't know
of any experiments that suggest a good number.
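A minimal client-side sketch of both ideas, offered as an illustration rather than anything Swift provides: object names are hashed to spread writes across several containers ("horizontal" sharding), and each shard rolls over to a new container once a cap is reached ("vertical" sharding). The helper names `shard_index` and `ContainerPicker`, the `prefix_SSS_GGG` naming scheme, and the default cap (borrowed from the 1M figure above) are all hypothetical. `hashlib.md5` is used only because it is stable across processes (Python's built-in `hash()` is salted per process), not for security.

```python
import hashlib


def shard_index(object_name: str, num_shards: int) -> int:
    """Stable hash of the object name -> shard number in [0, num_shards)."""
    digest = hashlib.md5(object_name.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ContainerPicker:
    """Hypothetical helper: spreads writes over num_shards containers and
    caps how many objects this client writes into each container."""

    def __init__(self, prefix: str = "data", num_shards: int = 16,
                 max_objects: int = 1_000_000):
        self.prefix = prefix
        self.num_shards = num_shards
        self.max_objects = max_objects
        self.generation = [0] * num_shards  # current container generation per shard
        self.count = [0] * num_shards       # objects written to that generation

    def pick(self, object_name: str) -> str:
        shard = shard_index(object_name, self.num_shards)
        if self.count[shard] >= self.max_objects:
            # Current container for this shard hit the cap: start a new one.
            self.generation[shard] += 1
            self.count[shard] = 0
        self.count[shard] += 1
        return f"{self.prefix}_{shard:03d}_{self.generation[shard]:03d}"


if __name__ == "__main__":
    picker = ContainerPicker(num_shards=4, max_objects=2)
    for name in ["a", "b", "c", "d", "e", "f"]:
        print(name, "->", picker.pick(name))
```

Because the generation depends on write order, the returned container name is not recomputable from the object name alone, so an application using this scheme would need to record where each object went. The counts here are also only this client's view; Swift itself reports a container's object count in the `X-Container-Object-Count` header of a container HEAD request, which a real client could check instead.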

Unfortunately, the link above doesn't have the images of my
experiments. So I'm attaching them again. Note that the experiment
that tried to find out how slow each object write can get as the number
of blobs increases could have been done at a bigger scale. I only pumped
in ~3M blobs. The Swift cluster in Rackspace was using SSDs.

Hope that helps.
-Shri

[1] https://www.mail-archive.com/openstack@lists.openstack.org/msg01760.html

On Wed, Dec 4, 2013 at 3:15 PM, John Dickinson <me at not.mn> wrote:
> Correct. A single container is replicated like other data in the system (typically 3x). This means that a single container is on only 3 spindles, and any number of concurrent writes to objects in that container will attempt to update the container listing (with graceful failure handling). This means that under significant concurrency, the concurrent object write speed is limited by the time it takes to update one of those container replicas.
>
> There are two easy "fixes" for this: (1) shard your data across containers and (2) use faster, dedicated drives for the containers (e.g. SSDs).
>
> The hard fix for this is to implement container sharding within swift, but this is a hard problem to solve (although nobody would be opposed to a successful solution).
>
> --John
>
> On Dec 4, 2013, at 3:01 PM, Stephen Wood <smwood4 at gmail.com> wrote:
>
>> Can someone explain to me (or point me to some good literature) about why sharding across containers is such a big deal in terms of performance? Is it that a single container is typically localized across a small number of shards?
>>
>> --
>> Stephen Wood
>> www.heystephenwood.com
>> _______________________________________________
>> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>> Post to     : openstack at lists.openstack.org
>> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>
>
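John's point about concurrency can be turned into a crude back-of-the-envelope bound. This sketch assumes, as a simplification rather than a description of Swift internals, that updates to one container's listing are effectively serialized, that each takes a fixed `update_seconds`, and that different containers land on different drives. The function `max_write_rate` and its parameters are hypothetical.

```python
def max_write_rate(num_containers: int, update_seconds: float,
                   other_limit_per_sec: float = float("inf")) -> float:
    """Rough upper bound on object writes per second.

    Assumes each container can absorb at most 1 / update_seconds listing
    updates per second, containers scale independently, and some other
    cluster limit (other_limit_per_sec) eventually caps the total.
    """
    if num_containers < 1 or update_seconds <= 0:
        raise ValueError("need at least one container and a positive update time")
    per_container = 1.0 / update_seconds
    return min(num_containers * per_container, other_limit_per_sec)


if __name__ == "__main__":
    # Illustrative 10 ms per container update (an assumption, not a measurement).
    for n in (1, 4, 16):
        print(n, "containers:", round(max_write_rate(n, 0.010)), "writes/s")
```

With that assumed 10 ms per update, one container tops out around 100 object writes per second, while 16 containers on separate drives raise the bound to around 1,600, until some other bottleneck takes over. Both of John's "easy fixes" show up in the model: sharding raises `num_containers`, and faster container drives lower `update_seconds`.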
-------------- next part --------------
A non-text attachment was scrubbed...
Name: average_blob_write_time.png
Type: image/png
Size: 14052 bytes
Desc: not available
URL: <http://lists.openstack.org/pipermail/openstack/attachments/20131204/f72c32a6/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sharding_across_containers_new.png
Type: image/png
Size: 9326 bytes
Desc: not available
URL: <http://lists.openstack.org/pipermail/openstack/attachments/20131204/f72c32a6/attachment-0001.png>

