<div dir="ltr">One of my colleagues here at Comcast just returned from the Operators Summit and mentioned that multiple folks experienced Ceph instability with larger clusters. I wanted to send out a note and save some folks a headache. <br clear="all">
<div><div><br></div><div><span style="font-family:arial,helvetica,sans-serif">If you raise the number of threads per OSD, there are situations where many threads can be spawned very quickly. On Linux every thread takes an ID from the same pool as PIDs, so you must also raise the max number of PIDs available to the OS, otherwise you essentially get fork bombed. Every single Ceph process will crash, and you might see a message in your shell about "Cannot allocate memory".<code><br>
</code></span></div><div><span style="font-family:arial,helvetica,sans-serif"><code><br><font>In your sysctl.conf:</font><br><br># For Ceph<br>kernel.pid_max=4194303<br><br></code></span></div><span style="font-family:arial,helvetica,sans-serif">Then run "sysctl -p" to apply it. In 5 days on a lab Ceph box, we have mowed through nearly 2 million PIDs. There's a tracker issue to get this added to the <a href="http://ceph.com">ceph.com</a> docs.<br>
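<br>If you want to see how close a box is to the limit before changing anything, here is a rough sketch. It assumes a Linux host with the usual /proc layout, and the function names (threads_from_loadavg, pid_headroom) are made up for illustration, not part of Ceph or any standard tool.<br>

```shell
#!/bin/sh
# Sketch: report how much PID headroom is left on a Linux host.
# Assumption: /proc/loadavg and /proc/sys/kernel/pid_max are readable.

# threads_from_loadavg LINE
# Print the total number of scheduling entities (processes plus threads)
# from a /proc/loadavg line. The fourth field has the form "runnable/total".
threads_from_loadavg() {
    set -- $1            # split the line into positional parameters
    echo "${4#*/}"       # keep only the part after the slash
}

# pid_headroom PID_MAX THREADS
# Print how many more IDs could be handed out at the same time.
pid_headroom() {
    echo $(( $1 - $2 ))
}

if [ -r /proc/sys/kernel/pid_max ]; then
    if [ -r /proc/loadavg ]; then
        pid_max=$(cat /proc/sys/kernel/pid_max)
        threads=$(threads_from_loadavg "$(cat /proc/loadavg)")
        echo "pid_max=$pid_max threads=$threads headroom=$(pid_headroom "$pid_max" "$threads")"
    fi
fi
```

The fifth field of /proc/loadavg is the most recently allocated PID, so checking it now and then gives a rough idea of how fast a box is going through PIDs.<br>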
</span></div><div><div><span style="font-family:arial,helvetica,sans-serif"><br>Warren<br></span></div><div><span style="font-family:arial,helvetica,sans-serif">@comcastwarren<br></span></div>
</div></div>