[Openstack-operators] Architecture Opinions

Erik McCormick emccormick at cirrusseven.com
Wed Oct 1 14:09:13 UTC 2014


On Wed, Oct 1, 2014 at 4:08 AM, Jesse Pretorius <jesse.pretorius at gmail.com>
wrote:

> I'd like to clarify a few things, specifically related to Ceph usage, in
> less of a rushed response. :)
>
> Note - my production experience has only been with Ceph Dumpling. Plenty
> of great patches which resolve many of the issues I've experienced have
> landed, so YMMV.
>
> On 30 September 2014 15:06, Jesse Pretorius <jesse.pretorius at gmail.com>
> wrote:
>
>> I would recommend ensuring that:
>>
>> 1) ceph-mons and ceph-osds are not hosted on the same server - they
>> both demand plenty of CPU cycles
>>
>
> The ceph-mon will generally not use much CPU. If a whole chassis is lost,
> you'll see it spike heavily, but it'll drop off again after the rebuild is
> complete. I would still recommend keeping at least one ceph-mon on a host
> that isn't hosting OSDs. The mons are where all clients get the data
> location details from, so at least one really needs to be available no
> matter what happens.
>
At the beginning, when things are small (few OSDs), I'm intending to run mons
on the OSD nodes. When I start to grow it, my plan is to start deploying
separate monitors and eventually disable the mons on the OSD nodes
entirely.
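
For illustration, a minimal ceph.conf sketch of that end state - a few
dedicated monitor hosts enumerated in [global] so clients can always reach a
mon even if an OSD chassis dies. Hostnames and addresses below are
placeholders, not taken from this thread:

    [global]
    # Dedicated monitor hosts, none of which also carry OSDs
    mon initial members = mon1, mon2, mon3
    mon host = 10.0.0.11, 10.0.0.12, 10.0.0.13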


> And, FYI, I would definitely recommend implementing separate networks for
> client access and the storage back-end. This keeps your storage
> replication traffic separate and lets you tune QoS for each network
> differently.
>

I've got an isolated 10 GbE network between the Ceph nodes dedicated purely
to replication traffic. Another interface (also 10 GbE) will handle traffic
from OpenStack, and a third (1 GbE) will deal with RadosGW traffic from the
public side.
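
In ceph.conf terms that split looks roughly like the sketch below - the
public network carries client/OpenStack traffic and the cluster network
carries OSD replication and recovery traffic. The subnets are placeholders:

    [global]
    # Client-facing traffic (OpenStack, RadosGW)
    public network = 192.168.10.0/24
    # OSD-to-OSD replication and recovery traffic
    cluster network = 192.168.20.0/24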

>
>
>> 5) instance storage on ceph doesn't work very well if you're trying to
>> use the kernel module or cephfs - make sure you're using ceph volumes as
>> the underlying storage (I believe this has been patched in for Juno)
>>
>
> cephfs, certainly in Dumpling, is not production ready - our experiment
> with using it in production was quickly rolled back when one of the client
> servers lost connection to the ceph-mds for some reason and the storage on
> it became inaccessible. The client connection to the mds in Dumpling isn't
> as resilient as the client connection for the block device.
>
> By 'use the kernel module' I mean creating an image and mapping it to the
> server through the Ceph block device kernel module, then building a file
> system on it and using it like you would any network-based storage.
> We found that when using one image as shared storage between servers,
> updates from one server weren't always visible quickly enough (within a
> minute) on the other server. If you choose to use a single image per
> server, and only mount server2's image on server1 in a disaster-recovery
> situation, then it should be just fine.
> We did find that mounting a file system using the kernel module would tend
> to cause a kernel panic when trying to disconnect the storage. Note that
> there have been several improvements in the revisions after Dumpling,
> including some bug fixes for issues that look similar to what we
> experienced.
>
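
For anyone unfamiliar with the approach described above, a rough sketch of
that workflow - pool, image, and mount point names are placeholders:

    # Create an image and map it through the kernel RBD module
    rbd create shared-disk --pool mypool --size 102400   # size in MB
    rbd map shared-disk --pool mypool --id admin
    # Build a local file system on the mapped device and mount it
    mkfs.xfs /dev/rbd0        # device name as reported by 'rbd map'
    mkdir -p /mnt/shared
    mount /dev/rbd0 /mnt/shared
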
> By "make sure you're using ceph volumes as the underlying storage" I meant
> that each instance root disk should be stored as its own Ceph Image in a
> storage pool. This can be facilitated directly from nova by using
> 'images_type=rbd' in nova.conf which became available in OpenStack Havana.
> Support for using RBD for Ephemeral disks as well finally landed in Juno
> (see https://bugs.launchpad.net/nova/+bug/1226351), as did support for
> copy-on-write cloning (see
> https://blueprints.launchpad.net/nova/+spec/rbd-clone-image-handler)
> which rounds out the feature set for using an RBD back-end quite nicely. :)
>
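
To put that in nova.conf terms, a rough sketch of the Juno-style settings
under [libvirt] - the pool name, user, and secret UUID are placeholders, and
in Havana the equivalent option was libvirt_images_type under [DEFAULT]:

    [libvirt]
    # Store instance disks as RBD images in Ceph instead of local files
    images_type = rbd
    images_rbd_pool = vms
    images_rbd_ceph_conf = /etc/ceph/ceph.conf
    rbd_user = cinder
    rbd_secret_uuid = <libvirt secret uuid>
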
I was originally planning on doing what you say and using images_type=rbd,
my main wish being the ability to live-migrate instances off a compute node.
I discovered yesterday that block migration works just fine with kvm/libvirt
now, despite assertions to the contrary in the OpenStack documentation, and
I can live with that for now. The last time I tried the RBD backend was in
Havana and it had some goofy behavior, so I think I'll let this idea sit for
a while and maybe try again in Kilo once the new copy-on-write code has had
a chance to age a bit ;).
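
For what it's worth, the block-migration path mentioned above is just the
standard call with the block-migrate flag - instance and host names here are
placeholders:

    # Live-migrate an instance without shared storage by copying its
    # local disks over to the target compute node
    nova live-migration --block-migrate <instance-uuid> <target-host>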
