[Openstack] Swift sharding across containers

Shrinand Javadekar shrinand at maginatics.com
Thu Oct 10 04:11:44 UTC 2013

Thanks Chuck.

In order to really measure this, I ran some tests on Rackspace; i.e. I got
a VM on Rackspace and that VM was talking to a Rackspace Cloudfiles-US
Swift cluster. The VM and object store were both in the Chicago region. The
downside of using a public object store is that I have little idea about
the configuration of Swift being used. But installing and configuring one's
own enterprise-class Swift cluster is no child's play either (to put it
mildly :D).

In the first experiment, 128 threads continuously wrote 1-byte blobs into
N containers, where N was one of (1, 32, 64, 128, 256, 512). Each run
lasted 15 minutes. The experiment was run three times for each N, and the
results below are the average of the three runs.
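
For anyone who wants to repeat this, a load generator of that shape could
look roughly like the sketch below. This is not the harness behind the
numbers in this mail, just a minimal illustration. The names
`run_write_benchmark`, `put_blob` and the `bench-N` container names are all
hypothetical; `put_blob` stands in for whatever performs one object PUT
(for example a thin wrapper around python-swiftclient's
`Connection.put_object`).

```python
import itertools
import threading
import time


def run_write_benchmark(put_blob, n_containers, n_threads=128, duration_s=900.0):
    """Count how many 1-byte writes complete within duration_s seconds.

    put_blob(container, name, data) is a hypothetical callable that performs
    one object PUT; plug a real Swift client call in there.
    """
    containers = [f"bench-{i}" for i in range(n_containers)]  # assumed names
    deadline = time.monotonic() + duration_s
    counts = [0] * n_threads  # one slot per thread, so no lock is needed

    def worker(tid):
        # Each thread cycles through the containers from its own offset,
        # which spreads the writes evenly across all of them.
        for seq in itertools.count():
            if time.monotonic() >= deadline:
                break
            container = containers[(tid + seq) % n_containers]
            put_blob(container, f"t{tid}-o{seq}", b"x")
            counts[tid] += 1

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counts)
```

Calling it once per value of N with the same `put_blob` and comparing the
returned totals gives the kind of comparison described here.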

[image: Inline image 1]
The number of writes completed in 15 minutes is ~87K for a single
container, whereas when these writes are sharded across 32 containers, this
number is ~135K.
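
To put those totals in per-second terms, here is a small back-of-the-envelope
calculation. It only uses the two approximate counts quoted above and the
15-minute run length; the variable names are just for illustration. It works
out to roughly 97 writes/s for one container versus 150 writes/s for 32
containers, i.e. about 1.55x.

```python
DURATION_S = 15 * 60  # each run lasted 15 minutes

# Approximate write counts quoted above, keyed by number of containers.
writes_completed = {1: 87_000, 32: 135_000}

for n_containers, writes in writes_completed.items():
    print(f"{n_containers:>3} container(s): {writes / DURATION_S:.1f} writes/s")

# Ratio of sharded throughput to single-container throughput.
speedup = writes_completed[32] / writes_completed[1]
print(f"32 containers vs 1: {speedup:.2f}x")
```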

The second experiment was to find out whether Swift becomes slower as the
number of objects in a container increases. To do this, I measured the time
it was taking to write blobs in a single container. Here again, I ran the
experiment three times and the graph below is the average of the three runs.

[image: Inline image 2]

If a container has fewer than 1.6M blobs, the average time to write a blob
is ~12.58ms, whereas if the container has more than 1.6M blobs, the average
time to write a blob is ~13.29ms. That is a difference of about 0.7ms, or
roughly 6%. The trend seems to be that as the number of objects increases,
the time to write also increases.

I guess the absolute numbers may differ depending on factors like memory,
CPU, and disk (SSDs vs. rotational) of the servers running Swift. But the
relative numbers give a better picture of the benefits of:

i) Sharding across containers to increase throughput
ii) Restricting the number of objects per container
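
One simple way to get both properties is to pick the container from a hash
of the object name, so that writes spread evenly and a later read can
recompute where an object lives. The sketch below is only an illustration
of that idea, not something Swift provides; the function name
`container_for`, the `shard` prefix and the zero-padded naming scheme are
assumptions.

```python
import hashlib


def container_for(object_name, n_containers=32, prefix="shard"):
    """Map an object name to one of n_containers container names.

    The prefix and the zero-padded naming scheme are illustrative assumptions.
    """
    digest = hashlib.sha256(object_name.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer and reduce it
    # modulo the number of containers to get a stable shard index.
    index = int.from_bytes(digest[:8], "big") % n_containers
    return f"{prefix}-{index:04d}"


if __name__ == "__main__":
    for name in ("photos/cat.jpg", "photos/dog.jpg", "logs/2013-10-10.txt"):
        print(name, "->", container_for(name))
```

Because the index is taken modulo the container count, changing that count
later remaps most names, so with this scheme it makes sense to fix the
number of containers up front.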

Let me know if I have missed out on anything or if there are more
experiments to run that would make Swift #awesome!!


On Tue, Sep 3, 2013 at 7:47 AM, Chuck Thier <cthier at gmail.com> wrote:

> Hi Shri,
> The short answer is that sharding your data across containers in swift is
> generally a good idea.
> The limitations with containers have a lot more to do with overall
> concurrency than with total objects in a container.  The number of
> objects in a container can have an effect on that, but will be less of an
> issue if you are not putting objects in at a high concurrency.
> --
> Chuck
> On Sun, Sep 1, 2013 at 9:39 PM, Shrinand Javadekar <
> shrinand at maginatics.com> wrote:
>> Hi,
>> There have been several articles which talk about keeping the number of
>> objects in a container to about 1M. Beyond that, SQLite starts becoming
>> the bottleneck. I am going to make sure we abide by this number.
>> However, has anyone measured whether putting objects among multiple
>> containers right from the start gives any performance benefits? For
>> example, I could create 32 containers right at the start and split the
>> objects among these as I write more and more objects. In the average
>> case, I would have several partially filled containers instead of a few
>> fully filled ones (fully filled means having 1M objects). Would this be
>> better for the overall performance? Any downsides of this approach? Has
>> anyone tried this before and published numbers on this?
>> Thanks in advance.
>> -Shri