<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
span.E-MailFormatvorlage17
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri","sans-serif";
mso-fareast-language:EN-US;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:70.85pt 70.85pt 2.0cm 70.85pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="DE" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">Hi,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">After adding more disks and moving the account and container servers to SSDs, performance is much better:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">Before:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">GETs average 620 ms<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">PUTs average 1900 ms<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">After:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">GETs average 280 ms<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">PUTs average 1100 ms<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p> </o:p></span></p>
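<p class="MsoNormal">The following Python sketch expresses the before/after averages in relative terms. It uses only the four averages quoted above, so it says nothing about tail latency. The names are illustrative.</p>
<pre>
```python
# Average latencies reported above, in milliseconds.
BEFORE_MS = {"GET": 620, "PUT": 1900}
AFTER_MS = {"GET": 280, "PUT": 1100}


def latency_change(before_ms, after_ms):
    """Return (percent reduction, speedup factor) for one request type."""
    reduction_pct = 100.0 * (before_ms - after_ms) / before_ms
    speedup = before_ms / after_ms
    return reduction_pct, speedup


for op in ("GET", "PUT"):
    pct, factor = latency_change(BEFORE_MS[op], AFTER_MS[op])
    print(f"{op}: {pct:.1f}% lower average latency ({factor:.2f}x faster)")
```
</pre>
<p class="MsoNormal">Run as-is, it reports a 54.8% reduction for GETs (2.21x faster) and a 42.1% reduction for PUTs (1.73x faster).</p>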
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">The only drawback was the rebalance, which took days to sync all the data to the five additional disks (previously each storage node had 3 disks). I used a concurrency of 4. Initially, one replication pass over all partitions took over 24 hours; after five days, a pass takes only 300 seconds.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">Each additional disk now holds 300 GB of data. Is it normal for the sync to take this long?<o:p></o:p></span></p>
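<p class="MsoNormal">The numbers above give a rough per-disk sync rate. This sketch makes two assumptions: "300 GB" means 300 x 10^9 bytes, and the whole five days were spent syncing. If the transfer actually finished sooner, the real rate was higher, so the result is a lower bound rather than a measurement.</p>
<pre>
```python
SECONDS_PER_DAY = 86_400


def average_sync_rate_mb_s(bytes_synced, elapsed_seconds):
    """Average transfer rate in MB/s (decimal megabytes, 10**6 bytes)."""
    return bytes_synced / elapsed_seconds / 1e6


# Assumption: "300 GB" is decimal and all five days were spent syncing,
# which makes this a lower bound on the real per-disk rate.
per_disk_bytes = 300 * 10**9
elapsed = 5 * SECONDS_PER_DAY
rate = average_sync_rate_mb_s(per_disk_bytes, elapsed)
print(f"~{rate:.2f} MB/s per new disk, averaged over five days")
```
</pre>
<p class="MsoNormal">Under those assumptions, it comes to roughly 0.69 MB/s per new disk.</p>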
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">Thanks<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">Klaus<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><b><span lang="EN-US" style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;">From:</span></b><span lang="EN-US" style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;"> Maximiliano Venesio [mailto:maximiliano.venesio@mercadolibre.com]
<br>
</span><b><span style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;">Sent:</span></b><span style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;"> Thursday, 8 August 2013 17:26<br>
<b>To:</b> Robert van Leeuwen<br>
<b>Cc:</b> openstack@lists.openstack.org<br>
<b>Subject:</b> Re: [Openstack] [SWIFT] PUTs and GETs getting slower<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">Hi Robert, <o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<div>
<p class="MsoNormal">I was reading your post, and it is interesting because we have a similar Swift deployment and use case. <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">We are storing millions of small images in our Swift cluster: 32 storage nodes, each with 12 x 2 TB HDDs plus 2 SSDs. Across the whole cluster we average a total of 200k requests per minute (rpm).<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">In terms of disk utilization, all our disks average 50% util, yet we are using only 15% of their total capacity.<o:p></o:p></p>
</div>
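<p class="MsoNormal">Spreading the quoted request rate over the hardware gives a feel for the per-disk load. This back-of-the-envelope sketch assumes requests are spread evenly. It counts client requests only: the extra replica writes behind each PUT and the background IO from the auditor and replicator are not included. The raw capacity figure is before replication.</p>
<pre>
```python
NODES = 32
HDDS_PER_NODE = 12
HDD_TB = 2
REQUESTS_PER_MINUTE = 200_000
USED_FRACTION = 0.15


def per_unit_rate(requests_per_minute, units):
    """Client requests per second per unit, assuming an even spread."""
    return requests_per_minute / 60 / units


raw_tb = NODES * HDDS_PER_NODE * HDD_TB  # before replication
print(f"cluster: {REQUESTS_PER_MINUTE / 60:.0f} req/s")
print(f"per node: {per_unit_rate(REQUESTS_PER_MINUTE, NODES):.0f} req/s")
print(f"per HDD: {per_unit_rate(REQUESTS_PER_MINUTE, NODES * HDDS_PER_NODE):.1f} req/s")
print(f"raw HDD capacity: {raw_tb} TB, about {raw_tb * USED_FRACTION:.0f} TB used")
```
</pre>
<p class="MsoNormal">That works out to about 3333 requests/s cluster-wide, roughly 104 per node and 8.7 per HDD. About 115 TB of the 768 TB raw HDD capacity is in use.</p>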
<div>
<p class="MsoNormal">When I look at used inodes on our object nodes with "df -i" we hit about 17 million inodes per disk.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">That seems like a large number of inodes, considering that we are using only 15% of the total capacity. One difference in our setup is that we use an inode size of 512 bytes, and we have a large amount of memory.<o:p></o:p></p>
</div>
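<p class="MsoNormal">This sketch estimates how much memory it would take to keep those inodes cached, assuming 12 object disks per node as described above. It counts on-disk inode bytes only. The kernel's in-memory inode and dentry structures have their own sizes, so treat the result as an order of magnitude.</p>
<pre>
```python
GIB = 2**30


def inode_table_bytes(inode_count, inode_size_bytes):
    """On-disk bytes of the inodes alone (no directory blocks, no dentries)."""
    return inode_count * inode_size_bytes


# Figures from the message above: ~17 million inodes per disk,
# 512-byte inodes, 12 object disks per storage node.
per_disk = inode_table_bytes(17_000_000, 512)
per_node = per_disk * 12
print(f"per disk: {per_disk / GIB:.1f} GiB, per node: {per_node / GIB:.1f} GiB")
```
</pre>
<p class="MsoNormal">That is about 8.1 GiB per disk and roughly 97 GiB per storage node.</p>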
<div>
<p class="MsoNormal">Also, one of our disks is always close to 100% util; this is caused by the object-auditor, which scans all our disks continuously.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">So we were also considering changing the kind of disks we use, to smaller and faster ones.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">It would be really useful to know what kind of disks you are using in your old and new storage nodes, so we can compare them with our case.<o:p></o:p></p>
</div>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;">Cheers,</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:&quot;Arial&quot;,&quot;sans-serif&quot;">Max</span><o:p></o:p></p>
</div>
</div>
<div>
<p class="MsoNormal"><br clear="all">
<o:p></o:p></p>
<div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal"><b><span style="background:white"><o:p> </o:p></span></b></p>
</div>
<p class="MsoNormal"><b><span style="font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:#333333;background:white">Maximiliano Venesio</span></b><b><span style="font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:#888888;background:white"> </span></b><span style="background:white"><br>
</span><b><span style="font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:#888888;background:white">#melicloud CloudBuilders</span></b><span style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:#888888"><br>
</span><span lang="ES" style="font-size:6.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;color:gray;background:white">Arias 3751, Piso 7 (C1430CRG) <br>
Ciudad de Buenos Aires - Argentina<br>
Mobile: +549(11) 15-3770-1853<br>
Tel : +54(11) 4640-8411</span><o:p></o:p></p>
</div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><o:p> </o:p></p>
<div>
<p class="MsoNormal">On Tue, Aug 6, 2013 at 11:54 AM, Robert van Leeuwen &lt;<a href="mailto:Robert.vanLeeuwen@spilgames.com" target="_blank">Robert.vanLeeuwen@spilgames.com</a>&gt; wrote:<o:p></o:p></p>
<p class="MsoNormal">Could you check your disk IO on the container/object nodes?<br>
<br>
We have quite a lot of files in Swift, and for comparison purposes I played a bit with COSBench to see where we hit the limits.<br>
We currently max out at about 200-300 PUT requests/second, and the bottleneck is the disk IO on the object nodes.<br>
Our account/container nodes are on SSDs and are not a limiting factor.<br>
<br>
You can look for IO bottlenecks with e.g. "iostat -x 10" (this refreshes the view every 10 seconds).<br>
During the benchmark I see some of the disks hitting 100% utilization.<br>
That it hits the IO limits with just 200 PUTs a second has to do with the number of files on the disks.<br>
When I look at used inodes on our object nodes with "df -i", we hit about 60 million inodes per disk.<br>
(A significant part of that is actually directories; based on the number of files in Swift, I calculated about 30 million files.)<br>
We use flashcache in front of those disks and it is still REALLY slow: just doing an "ls" can take up to 30 seconds.<br>
Adding lots of memory should probably help by caching the inodes in memory, but that is quite challenging:<br>
I am not sure how big a directory is in the XFS inode tree, but just for the files:<br>
30 million x 1 KB inodes = ~30 GB<br>
And that is just one disk :)<br>
<br>
We still use the old recommended inode size of 1 KB; the default of 256 bytes can now be used with recent kernels:<br>
<a href="https://lists.launchpad.net/openstack/msg24784.html" target="_blank">https://lists.launchpad.net/openstack/msg24784.html</a><br>
<br>
So some time ago we decided to go for nodes with more, smaller, faster disks and more memory.<br>
Those machines are not even close to their limits; however, we still have more "old" nodes, so performance is limited by those machines.<br>
At the moment this is sufficient for our use case, but I am pretty confident we could significantly improve performance by adding more of the new machines and rebalancing the load.<br>
<br>
Cheers,<br>
Robert van Leeuwen<o:p></o:p></p>
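<p class="MsoNormal">Both messages refer to the %util column, which can also be computed directly from the kernel's disk counters. This is a minimal sketch, assuming a Linux host whose /proc/diskstats uses the standard layout (major, minor, device name, then the counters, with busy milliseconds in the tenth counter). It takes the busy time during an interval as a percentage of that interval, which is roughly how iostat's %util is derived. The function names are hypothetical.</p>
<pre>
```python
import time

# 0-based column of "milliseconds spent doing I/Os" (io_ticks) in /proc/diskstats:
# major, minor, device name, then the per-device counters.
IO_TICKS_FIELD = 12


def parse_io_ticks(diskstats_text):
    """Map each device name to its cumulative busy time in milliseconds."""
    ticks = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) > IO_TICKS_FIELD:
            ticks[fields[2]] = int(fields[IO_TICKS_FIELD])
    return ticks


def utilization_percent(before, after, interval_s):
    """Busy time as a percentage of the sampling interval, per device."""
    interval_ms = interval_s * 1000.0
    return {
        dev: min(100.0, 100.0 * (after[dev] - before[dev]) / interval_ms)
        for dev in after
        if dev in before
    }


def sample(interval_s=10, path="/proc/diskstats"):
    """Take two snapshots interval_s apart and return %util per device (Linux only)."""
    with open(path) as f:
        first = parse_io_ticks(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        second = parse_io_ticks(f.read())
    return utilization_percent(first, second, interval_s)
```
</pre>
<p class="MsoNormal">On a storage node, calling sample(10) returns a dictionary mapping each device name to its busy percentage over a 10-second window, comparable to one refresh of "iostat -x 10". The cap at 100 only guards against timer jitter between the two reads.</p>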
</div>
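<p class="MsoNormal">Robert's inode arithmetic can be extended to compare the two inode sizes he mentions. This sketch counts on-disk inode bytes only and uses his figures of roughly 60 million used inodes and 30 million files per disk. Swift keeps object metadata in extended attributes, which is why a larger inode size was originally recommended. This arithmetic does not decide whether 256-byte inodes suit a given cluster.</p>
<pre>
```python
GIB = 2**30

# Figures from Robert's message: ~60 million used inodes per disk, of which
# roughly 30 million are files (the rest mostly directories).
INODE_COUNTS = {"files only": 30_000_000, "all used inodes": 60_000_000}
INODE_SIZES = (1024, 256)  # old recommended size vs. the default


def inode_footprint_gib(count, inode_size_bytes):
    """On-disk inode bytes for `count` inodes, in GiB."""
    return count * inode_size_bytes / GIB


for label, count in INODE_COUNTS.items():
    for size in INODE_SIZES:
        gib = inode_footprint_gib(count, size)
        print(f"{label}, {size}-byte inodes: {gib:.1f} GiB per disk")
```
</pre>
<p class="MsoNormal">With 1 KB (1024-byte) inodes, the 30 million files alone come to about 28.6 GiB, which is the "~30 GB" above. Moving to 256-byte inodes cuts every figure by a factor of four.</p>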
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
</body>
</html>