[Openstack] Swift sharding across containers
shrinand at maginatics.com
Thu Oct 10 04:11:44 UTC 2013
In order to really measure this, I ran some tests on Rackspace; i.e. I got
a VM on Rackspace and that VM was talking to a Rackspace Cloudfiles-US
swift cluster. The VM and object store were both in the Chicago region. The
downside of using a public object store is that I have little idea about
the configuration of Swift being used. But installing and configuring one's
own enterprise-class Swift cluster is no child's play either (to put it
In the first experiment, 128 threads were continuously trying to write
1-byte blobs into N containers, where N was in (1, 32, 64, 128, 256, 512). The
experiment ran for 15 minutes. The experiment was run thrice for each N and
the results below are the average of three runs.
[image: Inline image 1]
The number of writes completed in 15 minutes is ~87K for a single
container, whereas when these writes are sharded across 32 containers, this
number is ~135K.
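
To put those totals in per-second terms, here is a small back-of-the-envelope sketch. It uses only the approximate figures quoted above (~87K and ~135K writes in 15 minutes); the variable names are just for illustration.

```python
# Approximate totals quoted above: writes completed in a 15-minute run.
DURATION_S = 15 * 60
writes_completed = {1: 87_000, 32: 135_000}  # containers -> total writes


def writes_per_second(total_writes, duration_s=DURATION_S):
    """Average aggregate write rate over the whole run."""
    return total_writes / duration_s


single = writes_per_second(writes_completed[1])
sharded = writes_per_second(writes_completed[32])
speedup = sharded / single

print(f"1 container:   {single:.1f} writes/s")
print(f"32 containers: {sharded:.1f} writes/s")
print(f"speedup:       {speedup:.2f}x")
```

That works out to roughly 97 writes/s for a single container versus 150 writes/s across 32 containers, i.e. about a 1.55x improvement in aggregate throughput.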
The second experiment was to find out whether Swift becomes slower as the
number of objects in a container increases. To do this, I measured the time
it was taking to write blobs in a single container. Here again, I ran the
experiment three times and the graph below is the average of the three runs.
[image: Inline image 2]
If a container has fewer than 1.6M blobs, the average time to write a blob
is ~12.58ms, whereas if the container has more than 1.6M blobs, the average
time to write a blob is ~13.29ms. The trend definitely seems to be that as
the number of objects increases, the time to write also increases.
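
For scale, the relative slowdown between those two averages can be worked out as below. This is only arithmetic on the two approximate numbers quoted above, not a new measurement.

```python
# Average write latency quoted above, in milliseconds.
latency_below_1_6m = 12.58  # container holds fewer than 1.6M blobs
latency_above_1_6m = 13.29  # container holds more than 1.6M blobs

extra_ms = latency_above_1_6m - latency_below_1_6m
relative_increase_pct = 100 * extra_ms / latency_below_1_6m

print(f"extra latency:     {extra_ms:.2f} ms per write")
print(f"relative increase: {relative_increase_pct:.1f}%")
```

So the larger container costs about 0.71ms more per write, a slowdown of roughly 5.6%.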
I guess the absolute numbers may differ depending on factors like memory,
CPU, and disk (SSDs vs. rotational) of the servers running Swift. But the
relative numbers give a better picture of the benefits of:
i) Sharding across containers to increase throughput
ii) Restricting the number of objects per container
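
One way to get both effects is to pick the container from a hash of the object name. The following is a minimal sketch of that idea, not the code used in the experiments above; the function name, the "shard" prefix, and the default of 32 containers are all assumptions made for illustration.

```python
import hashlib
from collections import Counter


def container_for(object_name, num_containers=32, prefix="shard"):
    """Map an object name to one of num_containers container names.

    A stable hash is used so the same object name always lands in the
    same container. The naming scheme here is hypothetical.
    """
    digest = hashlib.sha256(object_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % num_containers
    return f"{prefix}-{index:03d}"


# Spread 10,000 illustrative object names across the containers.
counts = Counter(container_for(f"blob-{i}") for i in range(10_000))
print(len(counts), "containers used")
print("largest container holds", max(counts.values()), "objects")
```

Because the mapping depends only on the object name and the container count, readers can locate an object without a lookup table. The trade-off is that changing the number of containers later remaps most existing names.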
Let me know if I have missed out on anything or if there are more
experiments to run that would make Swift #awesome!!
On Tue, Sep 3, 2013 at 7:47 AM, Chuck Thier <cthier at gmail.com> wrote:
> Hi Shri,
> The short answer is that sharding your data across containers in swift is
> generally a good idea.
> The limitations with containers have a lot more to do with overall
> concurrency than with total objects in a container. The number of
> objects in a container can have an effect on that, but will be less of an
> issue if you are not putting objects in at a high concurrency.
> On Sun, Sep 1, 2013 at 9:39 PM, Shrinand Javadekar <
> shrinand at maginatics.com> wrote:
>> There have been several articles which talk about keeping the number of
>> objects in a container to about 1M. Beyond that sqlite starts becoming the
>> bottleneck. I am going to make sure we abide by this number.
>> However, has anyone measured whether putting objects among multiple
>> containers right from the start gives any performance benefits? For example, I
>> could create 32 containers right at the start and split the objects among
>> these as I write more and more objects. In the average case, I would have
>> several partially filled containers instead of a few fully filled ones
>> (fully filled means having 1M objects). Would this be better for the
>> overall performance? Any downsides of this approach? Has anyone tried this
>> before and published numbers on this?
>> Thanks in advance.