[Openstack-operators] Architecture Opinions

Jesse Pretorius jesse.pretorius at gmail.com
Wed Oct 1 08:08:15 UTC 2014


I'd like to clarify a few things, specifically related to Ceph usage, in
less of a rushed response. :)

Note - my production experience has only been with Ceph Dumpling. Plenty of
great patches that resolve many of the issues I've experienced have landed
since, so YMMV.

On 30 September 2014 15:06, Jesse Pretorius <jesse.pretorius at gmail.com>
wrote:

> I would recommend ensuring that:
>
> 1) ceph-mon's and ceph-osd's are not hosted on the same server - they both
> demand plenty of cpu cycles
>

The ceph-mon will generally not use much CPU. If a whole chassis is lost
you'll see it spike heavily, but it'll drop off again once the rebuild is
complete. I would still recommend keeping at least one ceph-mon on a host
that isn't hosting OSDs. The mons are where all clients get the cluster
maps (i.e. the data location details) from, so at least one really needs to
be available no matter what happens.
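
For example, something along these lines in ceph.conf, with mon.c living on
a host that carries no OSDs (the hostnames and addresses here are just
placeholders):

    [mon.a]
        host = ceph-osd-01            ; shares a chassis with OSDs
        mon addr = 192.168.10.11:6789
    [mon.b]
        host = ceph-osd-02            ; shares a chassis with OSDs
        mon addr = 192.168.10.12:6789
    [mon.c]
        host = ceph-mon-01            ; dedicated host, no OSDs here
        mon addr = 192.168.10.13:6789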

And, FYI, I would definitely recommend implementing separate networks for
client access and the storage back-end. This keeps your storage replication
traffic separated from client traffic and lets you tune the QoS for each
differently.
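
In ceph.conf that roughly looks like the below - the OSDs then use the
cluster network for replication and heartbeats, while clients talk over the
public network (the subnets are just examples):

    [global]
        public network = 10.10.0.0/24
        cluster network = 10.20.0.0/24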


> 5) instance storage on ceph doesn't work very well if you're trying to use
> the kernel module or cephfs - make sure you're using ceph volumes as the
> underlying storage (I believe this has been patched in for Juno)
>

CephFS, certainly in Dumpling, is not production-ready - our experiment
with using it in production was quickly rolled back when one of the client
servers lost its connection to the ceph-mds for some reason and the storage
on it became inaccessible. The client connection to the MDS in Dumpling
isn't as resilient as the client connection for the block device.

By 'use the kernel module' I mean creating an image, mapping it to the
server through the Ceph block device (RBD) kernel module, then building a
file system on it and using it like you would any other network-based
storage.
We found that when using one image as shared storage between servers,
updates from one server weren't always visible quickly enough (within a
minute) on the other server - an ordinary file system on a shared block
device isn't cluster-aware, so this is to be expected. If you choose to use
a single image per server, and only mount server2's image on server1 in a
disaster recovery situation, then it should be just fine.
We did find that mounting a file system using the kernel module would tend
to cause a kernel panic when trying to disconnect (unmap) the storage. Note
that there have been several improvements in the releases after Dumpling,
including some bug fixes for issues that look similar to what we
experienced.
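
To make the kernel module workflow concrete, it is roughly the below (the
image, pool and mount point names are just examples, and you'll also need a
keyring for whichever client id you map with):

    # create a 100GB image and map it through the rbd kernel module
    rbd create shared-disk --size 102400 --pool rbd
    rbd map rbd/shared-disk --id admin
    # put an ordinary file system on it and mount it
    mkfs.xfs /dev/rbd/rbd/shared-disk
    mount /dev/rbd/rbd/shared-disk /mnt/shared
    # tearing it down again is where we saw the panics on Dumpling
    umount /mnt/shared
    rbd unmap /dev/rbd/rbd/shared-disk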

By "make sure you're using ceph volumes as the underlying storage" I meant
that each instance root disk should be stored as its own Ceph Image in a
storage pool. This can be facilitated directly from nova by using
'images_type=rbd' in nova.conf which became available in OpenStack Havana.
Support for using RBD for Ephemeral disks as well finally landed in Juno
(see https://bugs.launchpad.net/nova/+bug/1226351), as did support for
copy-on-write cloning (see
https://blueprints.launchpad.net/nova/+spec/rbd-clone-image-handler) which
rounds out the feature set for using an RBD back-end quite nicely. :)
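
For reference, a minimal Juno-style nova.conf sketch for this (the pool and
user names are just examples, adjust to your deployment; if I remember
correctly, the Havana equivalent was the libvirt_-prefixed option in
[DEFAULT]):

    [libvirt]
    images_type = rbd
    images_rbd_pool = vms
    images_rbd_ceph_conf = /etc/ceph/ceph.conf
    rbd_user = cinder
    rbd_secret_uuid = <uuid of the libvirt secret holding the cephx key>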