<div dir="ltr"><font color="#000000"><font><font face="courier new,monospace">Chuck et All.</font></font></font><div><font color="#000000"><font><font face="courier new,monospace"><br></font></font></font></div><div>
<font color="#000000"><font><font face="courier new,monospace">Let me go through the point one by one.</font></font></font></div><div><font color="#000000"><font><font face="courier new,monospace"><br></font></font></font></div>
<div><font color="#000000"><font><font face="courier new,monospace">#1 Even seeing that "object-auditor" allways runs and never stops, we stoped the swift-*-auditor and didnt see any improvements, from all the datanodes we have an average of 8% IO-WAIT (using iostat), the only thing that we see is the pid "xfsbuf" runs once in a while causing 99% iowait for a sec, we delayed the runtime for that process, and didnt see changes either.</font></font></font></div>
<div><font color="#000000"><font><font face="courier new,monospace"><br></font></font></font></div><div style><font color="#000000"><font><font face="courier new,monospace">Our object-auditor config for all devices is as follow :</font></font></font></div>
<div style><font color="#000000"><font><font face="courier new,monospace"><br></font></font></font></div><div style><font color="#000000"><font><font face="courier new,monospace"><div>[object-auditor]</div><div>files_per_second = 5</div>
<div>zero_byte_files_per_second = 5</div><div>bytes_per_second = 3000000</div></font></font></font></div><div><font color="#000000"><font><font face="courier new,monospace"> </font></font></font></div>
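
For what it's worth, those limits already cap the auditor fairly low; a quick
back-of-envelope (plain Python, purely illustrative) of the ceiling this
config puts on each device:

# Ceiling implied by the auditor config above, per device
# (actual usage is normally below these caps).
files_per_second = 5
bytes_per_second = 3000000          # ~3 MB/s

gb_per_hour = bytes_per_second * 3600.0 / 1024 / 1024 / 1024
print("auditor cap: %d files/s, ~%.1f GB/h per device"
      % (files_per_second, gb_per_hour))
# Around 10 GB/h, i.e. a few percent of what a single spinning disk can
# stream, which is consistent with stopping the auditors not changing much.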
<div><font color="#000000"><font><font face="courier new,monospace">#2 Our 12 proxyes are 6 physical and 6 kvm instances running on nova, checking iftop we are at an average of 15Mb/s of bandwidth usage so i dont think we are saturating the networking.</font></font></font></div>

#3 The overall idle CPU on all datanodes is 80%. I'm not sure how to check
the CPU usage per worker (a rough way to do that is sketched below, after the
configs); let me paste the config for a device for object, account and
container.

object-server.conf
------------------
[DEFAULT]
devices = /srv/node/sda3
mount_check = false
bind_port = 6010
user = swift
log_facility = LOG_LOCAL2
log_level = DEBUG
workers = 48
disable_fallocate = true

[pipeline:main]
pipeline = object-server

[app:object-server]
use = egg:swift#object

[object-replicator]
vm_test_mode = yes
concurrency = 8
run_pause = 600

[object-updater]
concurrency = 8

[object-auditor]
files_per_second = 5
zero_byte_files_per_second = 5
bytes_per_second = 3000000
<font color="#000000"><font><font face="courier new,monospace"><b>account-server.conf</b></font></font></font></div><div><font color="#000000"><font><font face="courier new,monospace"><b>-------------------</b></font></font></font></div>
</div><div><font color="#000000"><font><font face="courier new,monospace"><div>[DEFAULT]</div><div>devices = /srv/node/sda3</div><div>mount_check = false</div><div>bind_port = 6012</div><div>user = swift</div><div>log_facility = LOG_LOCAL2</div>
<div>log_level = DEBUG</div><div>workers = 48</div><div>db_preallocation = on</div><div>disable_fallocate = true</div><div> </div><div>[pipeline:main]</div><div>pipeline = account-server</div><div> </div><div>[app:account-server]</div>
<div>use = egg:swift#account</div><div> </div><div>[account-replicator]</div><div>vm_test_mode = yes</div><div>concurrency = 8</div><div>run_pause = 600</div><div> </div><div>[account-auditor]</div><div> </div><div>[account-reaper]</div>
<div><br></div><div><div style="color:rgb(34,34,34);font-family:arial"><font color="#000000"><font><font face="courier new,monospace"><b>container-server.conf</b></font></font></font></div><div style="color:rgb(34,34,34);font-family:arial">
<font color="#000000"><font><font face="courier new,monospace"><b>---------------------</b></font></font></font></div></div><div><font color="#000000"><font><font face="courier new,monospace"><div>[DEFAULT]</div><div>devices = /srv/node/sda3</div>
<div>mount_check = false</div><div>bind_port = 6011</div><div>user = swift</div><div>workers = 48</div><div>log_facility = LOG_LOCAL2</div><div>allow_versions = True</div><div>disable_fallocate = true</div><div> </div><div>
[pipeline:main]</div><div>pipeline = container-server</div><div> </div><div>[app:container-server]</div><div>use = egg:swift#container</div><div>allow_versions = True</div><div> </div><div>[container-replicator]</div><div>
vm_test_mode = yes
concurrency = 8
run_pause = 500

[container-updater]
concurrency = 8

[container-auditor]

#4 We don't use SSL for Swift, so no latency there.
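
About not knowing how to check per-worker CPU (point #3): a rough sketch of
one way to do it, assuming the psutil Python package is installed on the
datanodes (filtering top/htop on swift processes gives much the same picture):

# Summarize CPU usage of swift processes with psutil (run as root so all
# workers are visible). Purely illustrative; psutil being available is an
# assumption.
import time
import psutil

procs = []
for p in psutil.process_iter():
    try:
        if 'swift' in ' '.join(p.cmdline()):
            p.cpu_percent(None)      # first call only primes the counter
            procs.append(p)
    except psutil.Error:
        pass                         # process went away or access denied

time.sleep(5)                        # measurement window
usage = []
for p in procs:
    try:
        usage.append((p.cpu_percent(None), p.pid, ' '.join(p.cmdline()[:2])))
    except psutil.Error:
        pass
for cpu, pid, cmd in sorted(usage, reverse=True)[:20]:
    print("%5.1f%%  %6d  %s" % (cpu, pid, cmd))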

Hope you guys can shed some light.
</div><div class="gmail_extra"><br clear="all"><div><div><font><b><br></b></font></div><div><font><b><img src="http://s14.postimage.org/sg1lztqep/cloudbuilders_Logo_last_small.png" width="96" height="58"><br></b></font></div>
<font><b>Alejandro Comisario <br>
#melicloud CloudBuilders</b></font><br>
<font color="#666666"><span style="font-size:6pt;color:gray" lang="ES">Arias 3751, Piso 7 (C1430CRG) <br>
Ciudad de Buenos Aires -
Argentina<br>
Cel: +549(11) 15-3770-1857<br>
Tel : +54(11) 4640-8443</span></font></div>
<br><br><div class="gmail_quote">On Mon, Jan 14, 2013 at 1:23 PM, Chuck Thier <span dir="ltr"><<a href="mailto:cthier@gmail.com" target="_blank">cthier@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi Alejandro,

I really doubt that partition size is causing these issues.  It can be
difficult to debug these types of issues without access to the
cluster, but I can think of a couple of things to look at.

1. Check your disk io usage and io wait on the storage nodes.  If
that seems abnormally high, then that could be one of the sources of
problems.  If this is the case, then the first things that I would
look at are the auditors, as they can use up a lot of disk io if not
properly configured.  I would try turning them off for a bit
(swift-*-auditor) and see if that makes any difference.

2. Check your network io usage.  You haven't described what type of
network you have going to the proxies, but if they share a single GigE
interface, if my quick calculations are correct, you could be
saturating the network.

3. Check your CPU usage.  I listed this one last as you have said
that you have already worked at tuning the number of workers (though I
would be interested to hear how many workers you have running for each
service).  The main thing to look for is whether all of your workers
are maxed out on CPU; if so, then you may need to bump the number of
workers.

4. SSL termination?  Where are you terminating the SSL connection?
If you are terminating SSL in Swift directly with the swift proxy,
then that could also be a source of issues.  This was only meant for
dev and testing, and you should use an SSL-terminating load balancer
in front of the swift proxies.

That's what I could think of right off the top of my head.

--
Chuck
<div class="HOEnZb"><div class="h5"><br>
On Mon, Jan 14, 2013 at 5:45 AM, Alejandro Comisario<br>
<<a href="mailto:alejandro.comisario@mercadolibre.com">alejandro.comisario@mercadolibre.com</a>> wrote:<br>
> Chuck / John.
> We are having 50,000 requests per minute (where 10,000+ are PUTs of small
> objects, from 10KB to 150KB).
>
> We are using swift 1.7.4 with keystone token caching, so no latency over
> there.
> We have 12 proxies and 24 datanodes divided into 4 zones (each datanode
> has 48GB of RAM, 2 hexa-core CPUs and 4 devices of 3TB each).
>
> The workers that are putting objects into swift are seeing awful
> performance, and so are we, with peaks of 2 to 15 seconds per PUT
> operation coming from the datanodes.
> We tuned db_preallocation, disable_fallocate, workers and concurrency, but
> we can't reach the request rate that we need (we need 24,000 PUTs per
> minute of small objects) and we don't seem to find where the problem is,
> other than in the datanodes.
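
Putting the stated target next to the hardware, as a rough illustration only
(3 replicas and the 24 x 4-drive layout above are assumed):

# Back-of-envelope write load for the 24,000 PUT/min target (3 replicas assumed).
target_puts_per_min = 24000
replicas = 3
drives = 24 * 4

puts_per_sec = target_puts_per_min / 60.0          # ~400 PUT/s cluster-wide
object_writes_per_sec = puts_per_sec * replicas    # ~1200 object writes/s
print("~%.0f object writes/s, ~%.1f per drive"
      % (object_writes_per_sec, object_writes_per_sec / drives))
# Around 12.5 object writes/s per drive, before counting the async container
# updates and directory work each PUT also triggers; small-object PUTs on
# spinning disks tend to be seek-bound rather than bandwidth-bound.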
>
> Maybe worth pasting our config over here?
> Thanks in advance.
>
> alejandro
>
> On 12 Jan 2013 02:01, "Chuck Thier" <cthier@gmail.com> wrote:
>>
>> Looking at this from a different perspective.  Having 2500 partitions
>> per drive shouldn't be an absolutely horrible thing either.  Do you
>> know how many objects you have per partition?  What types of problems
>> are you seeing?
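
For reference on that number: with evenly weighted drives, partitions per
drive come out to roughly replicas * 2**part_power / drive_count. A quick
illustration using the cluster size mentioned above (24 datanodes x 4 drives,
3 replicas assumed):

# Approximate partitions per drive for an evenly weighted ring.
replicas = 3
drives = 24 * 4

for part_power in (15, 16, 17, 18):
    per_drive = replicas * 2 ** part_power / float(drives)
    print("part_power %2d -> ~%5d partitions per drive" % (part_power, per_drive))
# part_power 16 gives ~2048 per drive and 17 gives ~4096, so 2500+ per drive
# is in that ballpark; dropping the partition power by one roughly halves it.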
>>
>> --
>> Chuck
>>
>> On Fri, Jan 11, 2013 at 3:28 PM, John Dickinson <me@not.mn> wrote:
>> > In effect, this would be a complete replacement of your rings, and that
>> > is essentially a whole new cluster.  All of the existing data would need
>> > to be rehashed into the new ring before it is available.
>> >
>> > There is no process that rehashes the data to ensure that it is still in
>> > the correct partition.  Replication only ensures that the partitions are
>> > on the right drives.
>> >
>> > To change the number of partitions, you will need to GET all of the data
>> > from the old ring and PUT it to the new ring.  A more complicated (but
>> > perhaps more efficient) solution may include something like walking each
>> > drive and rehashing+moving the data to the right partition and then
>> > letting replication settle it down.
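
To make the GET-and-re-PUT idea concrete, a very rough sketch of what such a
copy job could look like, assuming python-swiftclient and placeholder
endpoints/credentials (the old and new rings would need to be reachable
through separate proxies; retries, concurrency and large objects are ignored):

# Copy every object from the cluster serving the old ring to the one serving
# the new ring via plain GET + PUT. Endpoints and credentials are placeholders.
from swiftclient.client import Connection

old = Connection(authurl='http://old-proxy:8080/auth/v1.0',
                 user='account:user', key='SECRET')
new = Connection(authurl='http://new-proxy:8080/auth/v1.0',
                 user='account:user', key='SECRET')

_, containers = old.get_account(full_listing=True)
for c in containers:
    new.put_container(c['name'])
    _, objects = old.get_container(c['name'], full_listing=True)
    for o in objects:
        hdrs, body = old.get_object(c['name'], o['name'])
        new.put_object(c['name'], o['name'], body,
                       content_type=hdrs.get('content-type'))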
>> >
>> > Either way, 100% of your existing data will need to at least be rehashed
>> > (and probably moved).  Your CPU (hashing), disks (read+write), RAM
>> > (directory walking), and network (replication) may all be limiting
>> > factors in how long it will take to do this.  Your per-disk free space
>> > may also determine what method you choose.
>> >
>> > I would not expect any data loss while doing this, but you will probably
>> > have availability issues, depending on the data access patterns.
>> >
>> > I'd like to eventually see something in swift that allows for changing
>> > the partition power in existing rings, but that will be
>> > hard/tricky/non-trivial.
>> >
>> > Good luck.
>> >
>> > --John
>> >
>> >
>> > On Jan 11, 2013, at 1:17 PM, Alejandro Comisario
>> > <alejandro.comisario@mercadolibre.com> wrote:
>> >
>> >> Hi guys.
>> >> We created a swift cluster several months ago; the thing is that right
>> >> now we can't add hardware, and we configured lots of partitions thinking
>> >> about the final picture of the cluster.
>> >>
>> >> Today each datanode has 2500+ partitions per device, and even after
>> >> tuning the background processes (replicator, auditor & updater) we
>> >> really want to try to lower the partition power.
>> >>
>> >> Since it's not possible to do that without recreating the ring, we do
>> >> have the luxury of recreating it with a much lower partition power and
>> >> rebalancing / deploying the new ring.
>> >>
>> >> The question is: having a working cluster with *existing data*, is it
>> >> possible to do this and wait for the data to move around *without data
>> >> loss*?
>> >> If so, would it be reasonable to expect an improvement in the overall
>> >> cluster performance?
>> >>
>> >> We have no problem having a non-working cluster (while moving the data),
>> >> even for an entire weekend.
>> >>
>> >> Cheers.