[Openstack] Swift sharding across containers

Shrinand Javadekar shrinand at maginatics.com
Thu Oct 10 18:31:54 UTC 2013


Thanks for the inputs, Chuck. Please see my responses inline.


On Thu, Oct 10, 2013 at 7:56 AM, Chuck Thier <cthier at gmail.com> wrote:

> Hi Shri,
>
> I think your observations are fairly spot on.  Here are a couple of
> thoughts/comments.
>
> 1.  I wonder if you are maxing out how much your client can push at 128
> threads.  If you were to increase the number of threads (or number of clients)
> for the higher container counts, you could get more transactions through.
>

[SJ] In this experiment I simply wanted to see how sharding across
containers helps given the same input data rate. I will run some numbers
with a higher number of threads to see what the maximum number of
operations per second I can get is.
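
To make the sharding concrete, here is a rough sketch of one way to do
the sharded writes from a thread pool, assuming python-swiftclient; the
auth endpoint, credentials and "shard-N" container names below are just
placeholders, not the actual setup:

    import hashlib
    from concurrent.futures import ThreadPoolExecutor
    from swiftclient.client import Connection

    NUM_CONTAINERS = 32
    NUM_THREADS = 128

    def make_conn():
        # Placeholder credentials -- substitute a real auth endpoint/account.
        return Connection(authurl='https://auth.example.com/v1.0',
                          user='account:user', key='secret')

    def container_for(name):
        # Hash the object name so writes spread evenly across the shards.
        h = int(hashlib.md5(name.encode()).hexdigest(), 16)
        return 'shard-%d' % (h % NUM_CONTAINERS)

    def put_blob(name, data=b'x'):
        # One connection per task keeps the sketch simple; a real
        # benchmark would reuse connections across requests.
        conn = make_conn()
        conn.put_object(container_for(name), name, contents=data)

    # Create the containers up front, then push writes from a thread pool.
    setup = make_conn()
    for i in range(NUM_CONTAINERS):
        setup.put_container('shard-%d' % i)

    with ThreadPoolExecutor(max_workers=NUM_THREADS) as pool:
        pool.map(put_blob, ('obj-%07d' % i for i in range(100000)))

Cranking NUM_THREADS up, or running several copies of this client,
should show where the client itself tops out.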


>
> 2.  Cloudfiles rate limits PUTs at 100 per second to a single container.
>  This helps ensure fairly consistent performance to a single container.  We
> also put our container data on SSD drives to help drive better performance.
>  So your max theoretical performance is 100xNUM_CONTAINERS PUTs/sec.
>

[SJ] Great to know about the SSDs and the rate limits. Is it also possible
to know what version of Swift has been deployed at Rackspace Cloudfiles?
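
Incidentally, if I read that rate limit right, it lines up with my first
experiment: ~87K writes in 15 minutes to a single container is about
87,000 / 900s = ~97 PUTs/sec, i.e. right at the 100 PUTs/sec
per-container cap, while ~135K across 32 containers is only ~150
PUTs/sec, far below the theoretical ceiling of 100 x 32 = 3,200
PUTs/sec. That suggests my 128 client threads, not the containers, were
the bottleneck in the sharded runs.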


> 3.  It would be worthwhile to test even larger containers to see how much
> container size affects performance.  I don't think your sample size is
> large enough.
>

[SJ] Yeah, especially with SSDs, this is definitely not a large enough
sample. I guess I should start at 1M and go up to 10M or so.
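
For that larger-container test, a simple way to watch the drift is to
time each PUT and report an average every 100K objects as the container
fills up toward 10M. A rough single-threaded sketch, again assuming
python-swiftclient with placeholder credentials (a real run would need
to be multi-threaded to fill 10M objects in a reasonable time):

    import time
    from swiftclient.client import Connection

    # Placeholder credentials -- substitute a real auth endpoint/account.
    conn = Connection(authurl='https://auth.example.com/v1.0',
                      user='account:user', key='secret')

    container = 'latency-test'
    conn.put_container(container)

    BUCKET = 100000            # report an average every 100K objects
    TOTAL = 10 * 1000 * 1000   # fill the container up to ~10M objects

    elapsed = 0.0
    for i in range(1, TOTAL + 1):
        start = time.time()
        conn.put_object(container, 'obj-%08d' % i, contents=b'x')
        elapsed += time.time() - start
        if i % BUCKET == 0:
            print('%8d objects: avg PUT latency %.2f ms'
                  % (i, 1000.0 * elapsed / BUCKET))
            elapsed = 0.0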

Will keep you all posted.

-Shri

> On Wed, Oct 9, 2013 at 11:11 PM, Shrinand Javadekar <shrinand at maginatics.com> wrote:
>
>> Thanks Chuck.
>>
>> In order to really measure this, I ran some tests on Rackspace; i.e. I
>> got a VM on Rackspace and that VM was talking to a Rackspace Cloudfiles-US
>> swift cluster. The VM and object store were both in the Chicago region. The
>> downside of using a public object store is that I have little idea about
>> the configuration of Swift being used. But installing and configuring one's
>> own enterprise class Swift cluster is no child's play either (to put it
>> mildly :D).
>>
>> In the first experiment, 128 threads were continuously trying to write
>> 1-byte blobs into N containers, where N was in (1, 32, 64, 128, 256, 512).
>> Each run lasted 15 minutes and was repeated three times for each N; the
>> results below are the average of the three runs.
>>
>> [image: chart of total writes completed in 15 minutes vs. number of containers]
>> The number of writes completed in 15 minutes is ~87K for a single
>> container, whereas when these writes are sharded across 32 containers, this
>> number is ~135K.
>>
>> The second experiment was to find out whether Swift becomes slower as the
>> number of objects in a container increases. To do this, I measured the time
>> it was taking to write blobs in a single container. Here again, I ran the
>> experiment three times and the graph below is the average of the three runs.
>>
>> [image: chart of average time to write a blob vs. number of objects in the container]
>>
>> If a container has fewer than 1.6M blobs, the average time to write a blob
>> is ~12.58ms, whereas if the container has more than 1.6M blobs, the average
>> time to write a blob is ~13.29ms. The trend definitely seems to be that as
>> the number of objects increases, the time to write also increases.
>>
>> I guess the absolute numbers may differ depending on factors like the memory,
>> CPU, and disks (SSDs vs. rotational) of the servers running Swift. But the
>> relative numbers give a better picture of the benefits of:
>>
>> i) Sharding across containers to increase throughput
>> ii) Restricting the number of objects per container
>>
>> Let me know if I have missed out on anything or if there are more
>> experiments to run that would make Swift #awesome!!
>>
>> -Shri
>>
>>
>>
>> On Tue, Sep 3, 2013 at 7:47 AM, Chuck Thier <cthier at gmail.com> wrote:
>>
>>> Hi Shri,
>>>
>>> The short answer is that sharding your data across containers in Swift
>>> is generally a good idea.
>>>
>>> The limitations with containers have a lot more to do with overall
>>> concurrency than with the total number of objects in a container.  The
>>> number of objects in a container can have an effect on that, but it will
>>> be less of an issue if you are not putting objects in at a high
>>> concurrency.
>>>
>>> --
>>> Chuck
>>>
>>>
>>> On Sun, Sep 1, 2013 at 9:39 PM, Shrinand Javadekar <
>>> shrinand at maginatics.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> There have been several articles that talk about keeping the number of
>>>> objects in a container to about 1M. Beyond that, SQLite starts becoming the
>>>> bottleneck. I am going to make sure we abide by this number.
>>>>
>>>> However, has anyone measured whether putting objects among multiple
>>>> containers right from the start gives any performance benefits? For
>>>> example, I could create 32 containers right at the start and split the
>>>> objects among these as I write more and more objects. In the average case,
>>>> I would have several partially filled containers instead of a few fully
>>>> filled ones (fully filled meaning having 1M objects). Would this be better
>>>> for the overall performance? Are there any downsides to this approach? Has
>>>> anyone tried this before and published numbers on it?
>>>>
>>>> Thanks in advance.
>>>> -Shri
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>