[Openstack-operators] Ceph crashes with larger clusters and denser hardware

Warren Wang warren at wangspeed.com
Thu Aug 28 16:38:37 UTC 2014


One of my colleagues here at Comcast just returned from the Operators
Summit and mentioned that multiple folks experienced Ceph instability with
larger clusters. I wanted to send out a note to save some folks a
headache.

If you up the number of threads per OSD, there are situations where many
threads could be quickly spawned. You must up the max number of PIDs
available to the OS, otherwise you essentially get fork bombed. Every
single Ceph process will crash, and you might see a message in your shell
about "Cannot allocate memory".
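
To see how many threads your OSDs are holding right now, something like the following should work. It is only a rough sketch: it assumes the procps version of ps (for -C and the nlwp column) and that the daemons are named ceph-osd. The helper name sum_threads is made up for illustration.

```shell
#!/bin/sh
# Sketch: total the thread counts of all ceph-osd processes on this host.
# sum_threads is an illustrative helper, not part of Ceph or procps.

# Read "PID NLWP" lines on stdin and print the sum of the NLWP column.
sum_threads() {
    awk '{ total += $2 } END { print total + 0 }'
}

# Typical use on an OSD host (assumes procps ps):
#   ps -C ceph-osd -o pid=,nlwp= | sum_threads
```

Each thread takes a PID on Linux, so this total counts directly against kernel.pid_max.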

In your sysctl.conf:

# For Ceph
kernel.pid_max=4194303

Then run "sysctl -p". In 5 days on a lab Ceph box, we have mowed through
nearly 2 million PIDs. There's a tracker issue open to add this to the
ceph.com docs.
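
If you want to confirm the new limit took effect and see how close a box is to it, here is a minimal sketch. It assumes a Linux /proc layout where the fourth field of /proc/loadavg is "runnable/total" scheduling entities. The function names usage_pct and pid_report are hypothetical, just for this example.

```shell
#!/bin/sh
# Sketch: report how much of the PID space is in use on a Linux host.
# Function names are illustrative, not part of Ceph or procps.

# Integer percentage of the PID space consumed: usage_pct TASKS PID_MAX
usage_pct() {
    echo $(( $1 * 100 / $2 ))
}

# Print the current limit, the live task count, and the percentage used.
pid_report() {
    pid_max=$(cat /proc/sys/kernel/pid_max)
    # Field 4 of /proc/loadavg is "runnable/total"; keep the total.
    tasks=$(awk '{ split($4, a, "/"); print a[2] }' /proc/loadavg)
    echo "pid_max=$pid_max tasks=$tasks used=$(usage_pct "$tasks" "$pid_max")%"
}
```

Calling pid_report after "sysctl -p" should show pid_max=4194303. Note that the task count is threads alive at that moment, not PIDs handed out over time, so it will be far smaller than the 2 million figure above.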

Warren
@comcastwarren