<p dir="ltr">Chuck / John.<br>

We are handling 50,000 requests per minute (of which 10,000+ are PUTs of small objects, from 10 KB to 150 KB).</p>

<p dir="ltr">We are using Swift 1.7.4 with Keystone token caching, so there is no added latency on that side.<br>

We have 12 proxies and 24 datanodes divided into 4 zones (each datanode has 48 GB of RAM, 2 hexa-core CPUs and 4 devices of 3 TB each).</p>

<p dir="ltr">The workers that are putting objects into Swift are seeing awful performance, and so are we,<br>

with peaks of 2 to 15 seconds per PUT operation coming from the datanodes.<br>

We tuned db_preallocation, disable_fallocate, workers and concurrency, but we can't reach the request rate we need (24,000 PUTs per minute of small objects), and we can't find where the problem is, other than that it comes from the datanodes.</p>
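<p dir="ltr">As a rough sanity check of the figures above, here is a minimal sketch of the per-device write load implied by the target. The replica count of 3 is an assumption (it is not stated in this thread), and the function name is made up for illustration.</p>

```python
# Back-of-the-envelope load estimate for the cluster described above.
# ASSUMPTION: replicas=3 is not stated in the thread; adjust to the real ring.

def put_load(puts_per_minute, datanodes, devices_per_node, replicas=3):
    """Return the PUT rate and the object writes each device must absorb."""
    puts_per_second = puts_per_minute / 60
    devices = datanodes * devices_per_node
    # Every client PUT becomes one object write per replica.
    writes_per_second = puts_per_second * replicas
    return {
        "puts_per_second": puts_per_second,
        "devices": devices,
        "object_writes_per_second": writes_per_second,
        "object_writes_per_device_per_second": writes_per_second / devices,
    }

# Target from the message: 24,000 PUTs per minute on 24 datanodes x 4 devices.
target = put_load(24_000, datanodes=24, devices_per_node=4)
for name, value in target.items():
    print(name, value)
```

<p dir="ltr">Under that assumption the target works out to 400 PUTs per second, or about 12.5 object writes per second per device, not counting container and account updates or background replication and auditing.</p>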


<p dir="ltr">Would it be worth pasting our config here?<br>

Thanks in advance.</p>

<p dir="ltr">alejandro</p>

<div class="gmail_quote">On 12 Jan 2013 02:01, "Chuck Thier" <<a href="mailto:cthier@gmail.com">cthier@gmail.com</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Looking at this from a different perspective.  Having 2500 partitions<br>

per drive shouldn't be an absolutely horrible thing either.  Do you<br>

know how many objects you have per partition?  What types of problems<br>

are you seeing?<br>

<br>

--<br>

Chuck<br>

<br>

On Fri, Jan 11, 2013 at 3:28 PM, John Dickinson <<a href="mailto:me@not.mn">me@not.mn</a>> wrote:<br>

> In effect, this would be a complete replacement of your rings, and that is essentially a whole new cluster. All of the existing data would need to be rehashed into the new ring before it is available.<br>

><br>

> There is no process that rehashes the data to ensure that it is still in the correct partition. Replication only ensures that the partitions are on the right drives.<br>

><br>

> To change the number of partitions, you will need to GET all of the data from the old ring and PUT it to the new ring. A more complicated (but perhaps more efficient) solution may include something like walking each drive and rehashing+moving the data to the right partition and then letting replication settle it down.<br>


><br>

> Either way, 100% of your existing data will need to at least be rehashed (and probably moved). Your CPU (hashing), disks (read+write), RAM (directory walking), and network (replication) may all be limiting factors in how long it will take to do this. Your per-disk free space may also determine what method you choose.<br>


><br>

> I would not expect any data loss while doing this, but you will probably have availability issues, depending on the data access patterns.<br>

><br>

> I'd like to eventually see something in swift that allows for changing the partition power in existing rings, but that will be hard/tricky/non-trivial.<br>

><br>

> Good luck.<br>

><br>

> --John<br>

><br>

><br>

> On Jan 11, 2013, at 1:17 PM, Alejandro Comisario <<a href="mailto:alejandro.comisario@mercadolibre.com">alejandro.comisario@mercadolibre.com</a>> wrote:<br>

><br>

>> Hi guys.<br>

>> We created a Swift cluster several months ago. The thing is that right now we can't add hardware, and we configured lots of partitions with the final picture of the cluster in mind.<br>

>><br>

>> Today each datanode has 2500+ partitions per device, and even after tuning the background processes (replicator, auditor & updater) we really want to try to lower the partition power.<br>

>><br>

>> Since it's not possible to do that without recreating the ring, we can afford to recreate it with a much lower partition power, and rebalance / deploy the new ring.<br>

>><br>

>> The question is: with a working cluster holding *existing data*, is it possible to do this and wait for the data to move around *without data loss*?<br>

>> If so, is it reasonable to expect an improvement in overall cluster performance?<br>

>><br>

>> We have no problem with having a non-working cluster (while the data moves), even for an entire weekend.<br>

>><br>

>> Cheers.<br>

>><br>

>><br>

><br>

><br>


</blockquote></div>
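<p dir="ltr">To illustrate why the quoted reply says all existing data must be rehashed, here is a simplified sketch of how an object name maps to a partition and how the partition power drives the partition count per device. It follows the general scheme Swift uses (MD5 of the object path plus a cluster-wide suffix, keeping the top <code>part_power</code> bits), but it is not Swift's own code. The suffix value, the example names, and the replica count of 3 are assumptions, and the per-device figure assumes equally weighted devices.</p>

```python
import hashlib
import struct

# ASSUMPTION: placeholder value; a real cluster sets its own suffix in swift.conf.
HASH_PATH_SUFFIX = b"changeme"


def partition_for(account, container, obj, part_power):
    """Map an object path to a partition number for a given partition power."""
    path = "/" + "/".join((account, container, obj))
    digest = hashlib.md5(path.encode("utf-8") + HASH_PATH_SUFFIX).digest()
    # Take the first 4 bytes as a big-endian unsigned int, keep the top bits.
    top32 = struct.unpack_from(">I", digest)[0]
    return top32 >> (32 - part_power)


def partitions_per_device(part_power, replicas, devices):
    """Average partition replicas per device, assuming equal device weights."""
    return (2 ** part_power) * replicas / devices


# Hypothetical object name, evaluated under two different partition powers.
old = partition_for("AUTH_test", "images", "photo.jpg", part_power=18)
new = partition_for("AUTH_test", "images", "photo.jpg", part_power=14)
print(old, new)

# 96 devices as in this thread; replicas=3 is an assumption.
print(partitions_per_device(16, replicas=3, devices=96))
```

<p dir="ltr">The partition number is part of the on-disk directory path, so an object written under one partition power sits in a directory that a ring built with another power will not look in. That is why the data has to be rehashed and moved, not just replicated.</p>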