[Openstack-operators] [openstack-operators][ceph][nova] How do you handle Nova on Ceph?

Blair Bethwaite blair.bethwaite at gmail.com
Fri Oct 14 10:08:28 UTC 2016


Hi Adam,

I agree somewhat: capacity management and growth at scale are something
of a pain. Ceph gives you a hugely powerful and flexible way to manage
data placement through CRUSH, but there is very little quality
information about, or examples of, non-naive crushmap configurations.

I think I understand what you are getting at with regard to
failure domains, e.g., a large cluster of 1000+ drives may require a
single storage pool (e.g., for Nova) spanning most or all of that
storage. The chances of overlapping drive failures (overlapping meaning
a second failure before recovery from the first has completed) across
multiple nodes are higher the more drives there are in the pool, unless
you design your crushmap to limit the size of any replica-domain (i.e.,
the leaf CRUSH bucket that a single copy of an object may end up in).
And in the RBD use case, if you are unlucky and lose even just a tiny
fraction of objects, then thanks to random placement there is a good
chance you have lost a handful of objects from most or all RBD volumes
in the cluster, which could make for many unhappy users with
potentially unrecoverable filesystems in those RBDs.
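
To put some rough numbers on that, here's a quick back-of-envelope in
Python. All the inputs are illustrative assumptions I've picked (4 MiB
RBD objects, 100 GiB volumes, lost objects placed uniformly at random),
not measurements from any real cluster:

# Back-of-envelope: probability that an individual RBD volume is hit
# when a cluster loses a small fraction of randomly placed objects.
# Object/volume sizes below are assumptions for illustration only.
OBJECT_SIZE_MIB = 4
VOLUME_SIZE_GIB = 100
objects_per_volume = VOLUME_SIZE_GIB * 1024 // OBJECT_SIZE_MIB  # 25600

for lost_fraction in (1e-5, 1e-4, 1e-3):
    # volume is untouched only if every one of its objects survived
    p_untouched = (1 - lost_fraction) ** objects_per_volume
    print("lose %.0e of objects -> %.1f%% of volumes affected"
          % (lost_fraction, (1 - p_untouched) * 100))

Under those assumptions, losing even one object in ten thousand touches
roughly 90% of the 100 GiB volumes, which is why a small per-object loss
can still mean a very large blast radius for RBD users.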

The guys at UnitedStack did a nice presentation that touched on this a
while back (http://www.slideshare.net/kioecn/build-an-highperformance-and-highdurable-block-storage-service-based-on-ceph),
but I'm not sure I follow their durability model just from the slides,
and if you're going to play with this you really do want a tool to
calculate/simulate the impact of the changes.
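
For want of a proper tool, even a crude Monte Carlo gets the shape of it
across. Here's a minimal sketch along those lines; every input (drive
count, AFR, recovery time, domain sizes) is an assumption I've made up
for illustration, and it treats any two time-overlapping failures inside
one replica-domain as an "event" rather than tracking real PG mappings:

import random

N_OSDS = 1000             # drives in the pool (assumed)
AFR = 0.03                # assumed annual failure rate per drive
RECOVERY_HOURS = 6.0      # assumed time to re-replicate a failed drive
HOURS_PER_YEAR = 24 * 365
YEARS = 5000              # simulated cluster-years

def overlap_fraction(domain_size):
    """Fraction of simulated years containing at least one pair of
    drive failures that overlap in time within one replica-domain."""
    bad_years = 0
    for _ in range(YEARS):
        # approximation: at most one failure per drive per year (fine
        # for small AFR); record (failure time, replica-domain index)
        failures = sorted(
            (random.uniform(0, HOURS_PER_YEAR), osd // domain_size)
            for osd in range(N_OSDS) if random.random() < AFR)
        for i, (t0, dom0) in enumerate(failures):
            if any(t - t0 < RECOVERY_HOURS and dom == dom0
                   for t, dom in failures[i + 1:]):
                bad_years += 1
                break
    return bad_years / YEARS

for size in (50, N_OSDS):   # small replica-domains vs. one big domain
    print("replica-domain of %4d drives: P(overlap in a year) ~ %.1e"
          % (size, overlap_fraction(size)))

With 3x replication you'd also need all copies of some PG to land on the
overlapping drives, so this overstates actual data loss, but the
relative effect of shrinking the replica-domain comes through clearly.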

Interesting discussion - maybe loop in ceph-users?

Cheers,

On 14 October 2016 at 19:53, Adam Kijak <adam.kijak at corp.ovh.com> wrote:
>> ________________________________________
>> From: Clint Byrum <clint at fewbar.com>
>> Sent: Wednesday, October 12, 2016 10:46 PM
>> To: openstack-operators
>> Subject: Re: [Openstack-operators] [openstack-operators][ceph][nova] How do you handle Nova on Ceph?
>>
>> Excerpts from Adam Kijak's message of 2016-10-12 12:23:41 +0000:
>> > > ________________________________________
>> > > From: Xav Paice <xavpaice at gmail.com>
>> > > Sent: Monday, October 10, 2016 8:41 PM
>> > > To: openstack-operators at lists.openstack.org
>> > > Subject: Re: [Openstack-operators] [openstack-operators][ceph][nova] How do you handle Nova on Ceph?
>> > >
>> > > I'm really keen to hear more about those limitations.
>> >
>> > Basically it's all related to the failure domain ("blast radius") and risk management.
>> > Bigger Ceph cluster means more users.
>>
>> Are these risks well documented? Since Ceph is specifically designed
>> _not_ to have the kind of large blast radius that one might see with
>> say, a centralized SAN, I'm curious to hear what events trigger
>> cluster-wide blasts.
>
> In theory yes, Ceph is designed to be fault tolerant,
> but in our experience it doesn't always work out that way.
> I don't think it's well documented, but I know of this case:
> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg32804.html
>
>> > Growing the Ceph cluster temporarily slows it down, so many users will be affected.
>> One might say that a Ceph cluster that can't be grown without the users
>> noticing is an over-subscribed Ceph cluster. My understanding is that
>> one is always advised to provision a certain amount of cluster capacity
>> for growing and replicating to replaced drives.
>
> I agree that provisioning a fixed-size cluster would solve some problems, but capacity planning is not always easy.
> Predicting the size and making it cost effective (a big, mostly empty Ceph cluster costs a lot at the beginning) is quite difficult.
> Also, adding a new Ceph cluster will always be more transparent to users than manipulating an existing one (especially when growing pool PGs).
>



-- 
Cheers,
~Blairo


