[Openstack-operators] Ceph crashes with larger clusters and denser hardware

Fischer, Matt matthew.fischer at twcable.com
Thu Aug 28 20:51:18 UTC 2014


What version of ceph was this seen on?

From: Warren Wang <warren at wangspeed.com>
Date: Thursday, August 28, 2014 10:38 AM
To: "openstack-operators at lists.openstack.org<mailto:openstack-operators at lists.openstack.org>" <openstack-operators at lists.openstack.org<mailto:openstack-operators at lists.openstack.org>>
Subject: [Openstack-operators] Ceph crashes with larger clusters and denser hardware

One of my colleagues here at Comcast just returned from the Operators Summit and mentioned that multiple folks experienced Ceph instability with larger clusters. I wanted to send out a note and save some folks a headache.

If you up the number of threads per OSD, there are situations where a large number of threads can be spawned very quickly. You must also raise the maximum number of PIDs available to the OS, otherwise you essentially get fork bombed. Every single Ceph process will crash, and you may see a message in your shell about "Cannot allocate memory".
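For context, the per-OSD thread counts in question live in ceph.conf. A rough sketch of that kind of tuning follows; the option names are the standard ones from the docs of that era, but the values are purely illustrative, not a recommendation:

# In the [osd] section of ceph.conf -- illustrative values only.
# Raising these multiplies the threads each OSD daemon may spawn.
osd op threads = 8
filestore op threads = 8
osd disk threads = 2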

In your sysctl.conf:

# For Ceph
kernel.pid_max=4194303

Then run "sysctl -p". In 5 days on a lab Ceph box, we have mowed through nearly 2 million PIDs. There's a tracker about this to add it to the ceph.com<http://ceph.com> docs.

Warren
@comcastwarren

________________________________

