[Openstack] Swift sharding across containers
cthier at gmail.com
Thu Oct 10 14:56:49 UTC 2013
I think your observations are fairly spot on. Here are a couple of
thoughts:
1. I wonder if you are maxing out how much your client can push at 128
threads. If you were to increase the number of threads (or the number of
clients) for the higher container counts, you could get more transactions
through.
2. Cloudfiles rate limits PUTs at 100 per second to a single container.
This helps ensure fairly consistent performance to a single container. We
also put our container data on SSD drives to help drive better performance.
So your max theoretical performance is 100xNUM_CONTAINERS PUTs/sec.
3. It would be worthwhile to test even larger containers to see how much
container size affects performance. I don't think your sample size is
large enough to draw firm conclusions yet.
Good work though, and keep it up! :)
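Chuck's rate-limit arithmetic in point 2 is easy to sanity-check in a few lines (the 100 PUTs/sec per-container figure is the Cloudfiles limit he cites; the function name is just illustrative):

```python
# Theoretical aggregate PUT ceiling under a per-container rate limit,
# assuming writes are spread evenly across all containers.
RATE_LIMIT_PER_CONTAINER = 100  # PUTs/sec, per the Cloudfiles limit above

def max_puts_per_sec(num_containers, per_container_limit=RATE_LIMIT_PER_CONTAINER):
    """Upper bound on aggregate PUT throughput: 100 x NUM_CONTAINERS."""
    return per_container_limit * num_containers

for n in (1, 32, 64, 128):
    print("%3d containers -> %d PUTs/sec max" % (n, max_puts_per_sec(n)))
```

So at 32 containers the theoretical ceiling is already 3,200 PUTs/sec; a client that can't generate that much concurrency will bottleneck well before the cluster does.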
On Wed, Oct 9, 2013 at 11:11 PM, Shrinand Javadekar <shrinand at maginatics.com
> Thanks Chuck.
> In order to really measure this, I ran some tests on Rackspace; i.e. I got
> a VM on Rackspace and that VM was talking to a Rackspace Cloudfiles-US
> swift cluster. The VM and object store were both in the Chicago region. The
> downside of using a public object store is that I have little idea about
> the configuration of Swift being used. But installing and configuring one's
> own enterprise class Swift cluster is no child's play either (to put it
> mildly :D).
> In the first experiment, 128 threads were continuously trying to write 1
> byte blobs into N containers where N was in (1, 32, 64, 128, 256, 512). The
> experiment ran for 15 minutes. The experiment was run thrice for each N and
> the results below are the average of three runs.
> [image: Inline image 1]
> The number of writes completed in 15 minutes is ~87K for a single
> container, whereas when these writes are sharded across 32 containers, the
> number is ~135K.
> The second experiment was to find out whether Swift becomes slower as the
> number of objects in a container increases. To do this, I measured the time
> it was taking to write blobs in a single container. Here again, I ran the
> experiment three times and the graph below is the average of the three runs.
> [image: Inline image 2]
> If a container has fewer than 1.6M blobs, the average time to write a blob
> is ~12.58ms, whereas if the container has more than 1.6M blobs, the average
> time to write a blob is ~13.29ms. The trend definitely seems to be that as
> the number of objects increases, the time to write also increases.
> I guess the absolute numbers may differ depending on factors like the
> memory, CPU, and disks (SSDs vs. rotational) of the servers running Swift.
> But the relative numbers give a better picture of the benefits of:
> i) Sharding across containers to increase throughput
> ii) Restricting the number of objects per container
> Let me know if I have missed out on anything or if there are more
> experiments to run that would make Swift #awesome!!
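One simple way to implement point (i) is to pick the container deterministically from the object name, so any client can later locate a blob without coordination. The md5-based scheme below is an illustrative choice, not something from this thread:

```python
import hashlib

def shard_container(object_name, num_containers=32, prefix="shard"):
    """Map an object name to one of num_containers containers, deterministically.

    md5 gives a stable, well-distributed hash across processes (unlike
    Python's built-in hash(), which is randomized per interpreter run).
    """
    digest = hashlib.md5(object_name.encode("utf-8")).hexdigest()
    return "%s-%03d" % (prefix, int(digest, 16) % num_containers)

# The same name always lands in the same shard, on any client.
print(shard_container("photos/cat.jpg"))
```

Note the downside mentioned later in the thread still applies: a container listing now requires querying all N shards, so this trades listing convenience for write throughput.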
> On Tue, Sep 3, 2013 at 7:47 AM, Chuck Thier <cthier at gmail.com> wrote:
>> Hi Shri,
>> The short answer is that sharding your data across containers in swift is
>> generally a good idea.
>> The limitations with containers have a lot more to do with overall
>> concurrency than with the total number of objects in a container. The
>> number of objects in a container can have an effect on that, but it will
>> be less of an issue if you are not putting objects in at high concurrency.
>> On Sun, Sep 1, 2013 at 9:39 PM, Shrinand Javadekar <
>> shrinand at maginatics.com> wrote:
>>> There have been several articles which talk about keeping the number of
>>> objects in a container to about 1M. Beyond that sqlite starts becoming the
>>> bottleneck. I am going to make sure we abide by this number.
>>> However, has anyone measured whether putting objects into multiple
>>> containers right from the start gives any performance benefits? For
>>> example, I
>>> could create 32 containers right at the start and split the objects among
>>> these as I write more and more objects. In the average case, I would have
>>> several partially filled containers instead of a few fully filled ones
>>> (fully filled means having 1M objects). Would this be better for the
>>> overall performance? Any downsides of this approach? Has anyone tried this
>>> before and published numbers on this?
>>> Thanks in advance.