<div dir="ltr"><font color="#000000"><font><font face="courier new,monospace">Chuck et al.</font></font></font><div><font color="#000000"><font><font face="courier new,monospace"><br></font></font></font></div><div>

<font color="#000000"><font><font face="courier new,monospace">Let me go through the points one by one.</font></font></font></div><div><font color="#000000"><font><font face="courier new,monospace"><br></font></font></font></div>


<div><font color="#000000"><font><font face="courier new,monospace">#1 Even though the "object-auditor" always runs and never stops, we stopped the swift-*-auditor processes and didn't see any improvement. Across all the datanodes we have an average of 8% iowait (measured with iostat). The only thing we see is the "xfsbuf" process running once in a while and causing 99% iowait for a second; we made that process run less often and didn't see any change either.</font></font></font></div>
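<div><font color="#000000"><font><font face="courier new,monospace"><br></font></font></font></div>
<div><font color="#000000"><font><font face="courier new,monospace">For anyone who wants to cross-check the iowait figure without iostat, here is a minimal sketch, assuming a Linux node where /proc/stat is readable. It computes the iowait share between two samples of the aggregate "cpu" line. The function names are made up for illustration and are not part of swift or iostat.</font></font></font></div>
<pre>
```python
import time


def parse_cpu_line(line):
    """Return the numeric columns of the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two samples, in percent."""
    # Column order: user nice system idle iowait irq softirq steal.
    # Later columns (guest time) are already counted inside user/nice.
    deltas = [b - a for a, b in zip(before, after)][:8]
    total = sum(deltas)
    if total == 0:
        return 0.0
    return 100.0 * deltas[4] / total


def read_sample(path="/proc/stat"):
    """Read one sample; the first line of /proc/stat is the aggregate line."""
    with open(path) as handle:
        return parse_cpu_line(handle.readline())


def measure(interval_seconds=1.0):
    """Sample twice, interval_seconds apart, and return the iowait percentage."""
    first = read_sample()
    time.sleep(interval_seconds)
    return iowait_percent(first, read_sample())
```
</pre>
<div><font color="#000000"><font><font face="courier new,monospace">Calling measure(1.0) in a loop should make a short spike, like the one-second burst described above, visible as a single high reading, whereas a long averaging window would smooth it out.</font></font></font></div>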

<div><font color="#000000"><font><font face="courier new,monospace"><br></font></font></font></div><div style><font color="#000000"><font><font face="courier new,monospace">Our object-auditor config for all devices is as follows:</font></font></font></div>

<div style><font color="#000000"><font><font face="courier new,monospace"><br></font></font></font></div><div style><font color="#000000"><font><font face="courier new,monospace"><div>[object-auditor]</div><div>files_per_second = 5</div>

<div>zero_byte_files_per_second = 5</div><div>bytes_per_second = 3000000</div></font></font></font></div><div><font color="#000000"><font><font face="courier new,monospace"> </font></font></font></div>
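<div><font color="#000000"><font><font face="courier new,monospace">To put those limits in perspective, here is a rough back-of-the-envelope sketch of the shortest possible full audit pass under files_per_second and bytes_per_second. The object count and object size are hypothetical (the thread does not state them), and the helper name is made up for illustration.</font></font></font></div>
<pre>
```python
def min_audit_pass_seconds(object_count, total_bytes,
                           files_per_second, bytes_per_second):
    """Lower bound on one audit pass when both rate limits apply."""
    by_files = object_count / files_per_second
    by_bytes = total_bytes / bytes_per_second
    # Whichever limit is hit first dictates the pace.
    return max(by_files, by_bytes)


# Hypothetical device contents, NOT taken from the thread:
# 1,000,000 objects of 100 KB each on one device.
objects = 1_000_000
object_size = 100 * 1000

# Limits from the config above.
seconds = min_audit_pass_seconds(objects, objects * object_size,
                                 files_per_second=5,
                                 bytes_per_second=3_000_000)
days = seconds / 86400
print("at least %.0f s (%.1f days) per pass" % (seconds, days))
```
</pre>
<div><font color="#000000"><font><font face="courier new,monospace">With these assumed numbers the file limit dominates: 200,000 seconds, or about 2.3 days per pass, so an auditor that appears to run all the time is expected at these settings.</font></font></font></div>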

<div><font color="#000000"><font><font face="courier new,monospace">#2 Our 12 proxies are 6 physical machines and 6 KVM instances running on nova. Checking iftop, we are at an average of 15Mb/s of bandwidth usage, so I don't think we are saturating the network.</font></font></font></div>
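<div><font color="#000000"><font><font face="courier new,monospace"><br></font></font></font></div>
<div><font color="#000000"><font><font face="courier new,monospace">As a sanity check on that number, here is a small sketch of the PUT payload arithmetic using the figures given earlier in the thread (10,000+ PUTs per minute today, 24,000 wanted, objects of 10KB to 150KB, 12 proxies). It assumes 1 KB = 1000 bytes, an even spread across proxies, and counts only client-to-proxy payload: no HTTP overhead, no GET traffic and no proxy-to-datanode replica writes. The function name is made up for illustration.</font></font></font></div>
<pre>
```python
def put_ingress_mbps(puts_per_minute, object_kb, proxies=1):
    """Client-to-proxy PUT payload in megabits per second, per proxy."""
    bytes_per_second = puts_per_minute * object_kb * 1000 / 60
    return bytes_per_second * 8 / 1_000_000 / proxies


# Current load, worst case (every object at the 150 KB upper bound).
current = put_ingress_mbps(10_000, 150, proxies=12)

# Target load, same worst-case object size.
target = put_ingress_mbps(24_000, 150, proxies=12)

print("current: %.1f Mb/s per proxy" % current)
print("target:  %.1f Mb/s per proxy" % target)
```
</pre>
<div><font color="#000000"><font><font face="courier new,monospace">Under these assumptions the current worst case is about 16.7 Mb/s per proxy and the target about 40 Mb/s per proxy, both far below a GigE link, which is in line with the iftop reading.</font></font></font></div>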


<div style><font color="#000000"><font><font face="courier new,monospace">#3 The overall idle CPU on all datanodes is 80%. I'm not sure how to check the CPU usage per worker, so let me paste the config of one device for object, account and container.</font></font></font></div>
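<div style><font color="#000000"><font><font face="courier new,monospace"><br></font></font></font></div>
<div style><font color="#000000"><font><font face="courier new,monospace">One possible way to get per-worker CPU usage is to read /proc/PID/stat for each worker process twice and compare the CPU ticks. The sketch below assumes Linux and that the workers can be recognised by a fragment of their command line such as "swift-object-server"; the helper names are made up for illustration.</font></font></font></div>
<pre>
```python
import os


def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/PID/stat line."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so cut at the last ')' before splitting the remaining fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def cpu_percent(ticks_before, ticks_after, interval_seconds, ticks_per_second):
    """CPU usage of one process over the interval, in percent of one core."""
    used = ticks_after - ticks_before
    return 100.0 * used / (ticks_per_second * interval_seconds)


def worker_ticks(name_fragment, proc_root="/proc"):
    """Map PID to CPU ticks for processes whose cmdline has name_fragment."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            cmdline_path = os.path.join(proc_root, entry, "cmdline")
            with open(cmdline_path, "rb") as handle:
                raw = handle.read().replace(b"\0", b" ")
            if name_fragment in raw.decode(errors="replace"):
                with open(os.path.join(proc_root, entry, "stat")) as handle:
                    result[int(entry)] = cpu_ticks(handle.read())
        except OSError:
            continue  # the process exited while we were reading it
    return result
```
</pre>
<div style><font color="#000000"><font><font face="courier new,monospace">Usage would be along the lines of: call worker_ticks("swift-object-server"), sleep a few seconds, call it again, and feed each PID's two values to cpu_percent together with os.sysconf("SC_CLK_TCK"). Workers that sit near 100% are the ones that are maxed out.</font></font></font></div>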

<div style><font color="#000000"><font><font face="courier new,monospace"><br></font></font></font></div><div style><font color="#000000"><font><font face="courier new,monospace"><b>object-server.conf</b></font></font></font></div>

<div style><font color="#000000"><font><font face="courier new,monospace"><b>------------------</b></font></font></font></div><div style><font color="#000000" face="courier new, monospace">[DEFAULT]</font></div><div><font color="#000000" face="courier new, monospace">devices = /srv/node/sda3</font></div>

<div><font color="#000000" face="courier new, monospace">mount_check = false</font></div><div><font color="#000000" face="courier new, monospace">bind_port = 6010</font></div><div><font color="#000000" face="courier new, monospace">user = swift</font></div>

<div><font color="#000000" face="courier new, monospace">log_facility = LOG_LOCAL2</font></div><div><font color="#000000" face="courier new, monospace">log_level = DEBUG</font></div><div><font color="#000000" face="courier new, monospace">workers = 48</font></div>

<div><span style="color:rgb(0,0,0);font-family:'courier new',monospace">disable_fallocate = true</span><br></div><div><font color="#000000" face="courier new, monospace"> </font></div><div><font color="#000000" face="courier new, monospace">[pipeline:main]</font></div>

<div><font color="#000000" face="courier new, monospace">pipeline = object-server</font></div><div><font color="#000000" face="courier new, monospace"> </font></div><div><font color="#000000" face="courier new, monospace">[app:object-server]</font></div>

<div><font color="#000000" face="courier new, monospace">use = egg:swift#object</font></div><div><font color="#000000" face="courier new, monospace"> </font></div><div><font color="#000000" face="courier new, monospace">[object-replicator]</font></div>

<div><font color="#000000" face="courier new, monospace">vm_test_mode = yes</font></div><div><font color="#000000" face="courier new, monospace">concurrency = 8</font></div><div><font color="#000000" face="courier new, monospace">run_pause = 600</font></div>

<div><font color="#000000" face="courier new, monospace"> </font></div><div><font color="#000000" face="courier new, monospace">[object-updater]</font></div><div><font color="#000000" face="courier new, monospace">concurrency = 8</font></div>

<div><font color="#000000" face="courier new, monospace"> </font></div><div><font color="#000000" face="courier new, monospace">[object-auditor]</font></div><div><font color="#000000" face="courier new, monospace">files_per_second = 5</font></div>

<div><font color="#000000" face="courier new, monospace">zero_byte_files_per_second = 5</font></div><div><font color="#000000" face="courier new, monospace">bytes_per_second = 3000000</font></div><div><br></div><div><div>

<font color="#000000"><font><font face="courier new,monospace"><b>account-server.conf</b></font></font></font></div><div><font color="#000000"><font><font face="courier new,monospace"><b>-------------------</b></font></font></font></div>

</div><div><font color="#000000"><font><font face="courier new,monospace"><div>[DEFAULT]</div><div>devices = /srv/node/sda3</div><div>mount_check = false</div><div>bind_port = 6012</div><div>user = swift</div><div>log_facility = LOG_LOCAL2</div>

<div>log_level = DEBUG</div><div>workers = 48</div><div>db_preallocation = on</div><div>disable_fallocate = true</div><div> </div><div>[pipeline:main]</div><div>pipeline = account-server</div><div> </div><div>[app:account-server]</div>

<div>use = egg:swift#account</div><div> </div><div>[account-replicator]</div><div>vm_test_mode = yes</div><div>concurrency = 8</div><div>run_pause = 600</div><div> </div><div>[account-auditor]</div><div> </div><div>[account-reaper]</div>

<div><br></div><div><div style="color:rgb(34,34,34);font-family:arial"><font color="#000000"><font><font face="courier new,monospace"><b>container-server.conf</b></font></font></font></div><div style="color:rgb(34,34,34);font-family:arial">

<font color="#000000"><font><font face="courier new,monospace"><b>---------------------</b></font></font></font></div></div><div><font color="#000000"><font><font face="courier new,monospace"><div>[DEFAULT]</div><div>devices = /srv/node/sda3</div>

<div>mount_check = false</div><div>bind_port = 6011</div><div>user = swift</div><div>workers = 48</div><div>log_facility = LOG_LOCAL2</div><div>allow_versions = True</div><div>disable_fallocate = true</div><div> </div><div>

[pipeline:main]</div><div>pipeline = container-server</div><div> </div><div>[app:container-server]</div><div>use = egg:swift#container</div><div>allow_versions = True</div><div> </div><div>[container-replicator]</div><div>

vm_test_mode = yes</div><div>concurrency = 8</div><div>run_pause = 500</div><div> </div><div>[container-updater]</div><div>concurrency = 8</div><div><br></div><div>[container-auditor]</div><div> </div><div style>#4 We don't use SSL for swift, so there is no latency from that.<br>

</div><div style><br></div><div style>Hope you guys can shed some light.</div></font></font></font></div></font></font></font></div><div><font color="#000000"><font><font face="courier new,monospace"><br></font></font></font></div>

</div><div class="gmail_extra"><br clear="all"><div><div><font><b><br></b></font></div><div><font><b><img src="http://s14.postimage.org/sg1lztqep/cloudbuilders_Logo_last_small.png" width="96" height="58"><br></b></font></div>

<font><b>Alejandro Comisario <br>

            #melicloud CloudBuilders</b></font><br>

      <font color="#666666"><span style="font-size:6pt;color:gray" lang="ES">Arias 3751, Piso 7 (C1430CRG) <br>

          Ciudad de Buenos Aires -

          Argentina<br>

          Cel: +549(11) 15-3770-1857<br>

          Tel : +54(11) 4640-8443</span></font></div>

<br><br><div class="gmail_quote">On Mon, Jan 14, 2013 at 1:23 PM, Chuck Thier <span dir="ltr"><<a href="mailto:cthier@gmail.com" target="_blank">cthier@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hi Alejandro,<br>

<br>

I really doubt that partition size is causing these issues.  It can be<br>

difficult to debug these types of issues without access to the<br>

cluster, but I can think of a couple of things to look at.<br>

<br>

1.  Check your disk io usage and io wait on the storage nodes.  If<br>

that seems abnormally high, then that could be one of the sources of<br>

problems.  If this is the case, then the first things that I would<br>

look at are the auditors, as they can use up a lot of disk io if not<br>

properly configured.  I would try turning them off for a bit<br>

(swift-*-auditor) and see if that makes any difference.<br>

<br>

2.  Check your network io usage.  You haven't described what type of<br>

network you have going to the proxies, but if they share a single GigE<br>

interface, if my quick calculations are correct, you could be<br>

saturating the network.<br>

<br>

3.  Check your CPU usage.  I listed this one last as you have said<br>

that you have already worked at tuning the number of workers (though I<br>

would be interested to hear how many workers you have running for each<br>

service).  The main thing to look for, is to see if all of your<br>

workers are maxed out on CPU, if so, then you may need to bump<br>

workers.<br>

<br>

4.  SSL Termination?  Where are you terminating the SSL connection?<br>

If you are terminating SSL in Swift directly with the swift proxy,<br>

then that could also be a source of issue.  This was only meant for<br>

dev and testing, and you should use an SSL terminating load balancer<br>

in front of the swift proxies.<br>

<br>

That's what I could think of right off the top of my head.<br>

<br>

--<br>

Chuck<br>

<div class="HOEnZb"><div class="h5"><br>

On Mon, Jan 14, 2013 at 5:45 AM, Alejandro Comisario<br>

<<a href="mailto:alejandro.comisario@mercadolibre.com">alejandro.comisario@mercadolibre.com</a>> wrote:<br>

> Chuck / John.<br>

> We are handling 50,000 requests per minute (of which 10,000+ are PUTs of small<br>

> objects, from 10KB to 150KB).<br>

><br>

> We are using swift 1.7.4 with keystone token caching so no latency over<br>

> there.<br>

> We have 12 proxies and 24 datanodes divided into 4 zones (each datanode<br>

> has 48GB of RAM, 2 hexacore CPUs and 4 devices of 3TB each).<br>

><br>

> The workers that are putting objects into swift are seeing awful<br>

> performance, and so are we,<br>

> with peaks of 2 to 15 seconds per PUT operation coming from the datanodes.<br>

> We tuned db_preallocation, disable_fallocate, workers and concurrency, but we<br>

> can't reach the request rate that we need (we need 24,000 PUTs per minute of small<br>

> objects), and we can't seem to find where the problem is, other than that it comes from the<br>

> datanodes.<br>

><br>

> Maybe worth pasting our config over here?<br>

> Thanks in advance.<br>

><br>

> alejandro<br>

><br>

> On 12 Jan 2013 02:01, "Chuck Thier" <<a href="mailto:cthier@gmail.com">cthier@gmail.com</a>> wrote:<br>

>><br>

>> Looking at this from a different perspective.  Having 2500 partitions<br>

>> per drive shouldn't be an absolutely horrible thing either.  Do you<br>

>> know how many objects you have per partition?  What types of problems<br>

>> are you seeing?<br>

>><br>

>> --<br>

>> Chuck<br>

>><br>

>> On Fri, Jan 11, 2013 at 3:28 PM, John Dickinson <<a href="mailto:me@not.mn">me@not.mn</a>> wrote:<br>

>> > In effect, this would be a complete replacement of your rings, and that<br>

>> > is essentially a whole new cluster. All of the existing data would need to<br>

>> > be rehashed into the new ring before it is available.<br>

>> ><br>

>> > There is no process that rehashes the data to ensure that it is still in<br>

>> > the correct partition. Replication only ensures that the partitions are on<br>

>> > the right drives.<br>

>> ><br>

>> > To change the number of partitions, you will need to GET all of the data<br>

>> > from the old ring and PUT it to the new ring. A more complicated (but<br>

>> > perhaps more efficient) solution may include something like walking each<br>

>> > drive and rehashing+moving the data to the right partition and then letting<br>

>> > replication settle it down.<br>

>> ><br>

>> > Either way, 100% of your existing data will need to at least be rehashed<br>

>> > (and probably moved). Your CPU (hashing), disks (read+write), RAM (directory<br>

>> > walking), and network (replication) may all be limiting factors in how long<br>

>> > it will take to do this. Your per-disk free space may also determine what<br>

>> > method you choose.<br>

>> ><br>

>> > I would not expect any data loss while doing this, but you will probably<br>

>> > have availability issues, depending on the data access patterns.<br>

>> ><br>

>> > I'd like to eventually see something in swift that allows for changing<br>

>> > the partition power in existing rings, but that will be<br>

>> > hard/tricky/non-trivial.<br>

>> ><br>

>> > Good luck.<br>

>> ><br>

>> > --John<br>

>> ><br>

>> ><br>

>> > On Jan 11, 2013, at 1:17 PM, Alejandro Comisario<br>

>> > <<a href="mailto:alejandro.comisario@mercadolibre.com">alejandro.comisario@mercadolibre.com</a>> wrote:<br>

>> ><br>

>> >> Hi guys.<br>

>> >> We created a swift cluster several months ago. The thing is that<br>

>> >> right now we can't add hardware, and we configured lots of partitions thinking<br>

>> >> about the final picture of the cluster.<br>

>> >><br>

>> >> Today each datanode has 2500+ partitions per device, and even after<br>

>> >> tuning the background processes (replicator, auditor & updater) we really<br>

>> >> want to try to lower the partition power.<br>

>> >><br>

>> >> Since it's not possible to do that without recreating the ring, we can<br>

>> >> afford to recreate it with a much lower partition power, and<br>

>> >> rebalance / deploy the new ring.<br>

>> >><br>

>> >> The question is: with a working cluster holding *existing data*, is it<br>

>> >> possible to do this and wait for the data to move around *without data loss*?<br>

>> >> If so, is it reasonable to expect an improvement in the overall<br>

>> >> cluster performance?<br>

>> >><br>

>> >> We have no problem with having a non-working cluster (while moving the<br>

>> >> data), even for an entire weekend.<br>

>> >><br>

>> >> Cheers.<br>

>> >><br>

>> >><br>

>> ><br>

>> ><br>


>> ><br>

</div></div></blockquote></div><br></div>