[Openstack-operators] [openstack-operators][ceph][nova] How do you handle Nova on Ceph?

Clint Byrum clint at fewbar.com
Wed Oct 12 20:46:01 UTC 2016


Excerpts from Adam Kijak's message of 2016-10-12 12:23:41 +0000:
> > ________________________________________
> > From: Xav Paice <xavpaice at gmail.com>
> > Sent: Monday, October 10, 2016 8:41 PM
> > To: openstack-operators at lists.openstack.org
> > Subject: Re: [Openstack-operators] [openstack-operators][ceph][nova] How do you handle Nova on Ceph?
> > 
> > On Mon, 2016-10-10 at 13:29 +0000, Adam Kijak wrote:
> > > Hello,
> > >
> > > We use a Ceph cluster for Nova (and Glance and Cinder as well), and
> > > over time more and more data is stored there. We can't let the
> > > cluster keep growing because of Ceph's limitations; sooner or later
> > > it will have to be closed to new instances, images and volumes. Not
> > > to mention it's a big failure domain.
> > 
> > I'm really keen to hear more about those limitations.
> 
> Basically it's all related to the failure domain ("blast radius") and risk management.
> A bigger Ceph cluster means more users affected when something goes wrong.

Are these risks well documented? Since Ceph is specifically designed
_not_ to have the kind of large blast radius that one might see with,
say, a centralized SAN, I'm curious to hear what events trigger
cluster-wide blasts.
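
For anyone following along: the reason I don't expect a single host or
rack to take out the whole cluster is that CRUSH spreads each PG's
replicas across separate failure domains. Here's a rough sketch of how
one might check that from a box with the ceph CLI and a working keyring
(the pool and object names below are made up, not from Adam's setup):

#!/usr/bin/env python
# Illustration only: look up where the replicas of one object live and
# list the CRUSH hosts, to confirm the acting set never lands on a
# single failure domain.
import json
import subprocess

POOL = "volumes"          # hypothetical RBD pool (e.g. Cinder volumes)
OBJECT = "rbd_data.1234"  # hypothetical RADOS object name

def ceph(*args):
    """Run a ceph CLI subcommand and return its parsed JSON output."""
    out = subprocess.check_output(["ceph"] + list(args) +
                                  ["--format", "json"])
    return json.loads(out.decode("utf-8"))

# "ceph osd map <pool> <object>" reports the placement group and the
# acting set of OSDs holding that object's replicas.
mapping = ceph("osd", "map", POOL, OBJECT)
print("PG %s is served by OSDs %s" % (mapping["pgid"], mapping["acting"]))

# "ceph osd tree" shows the CRUSH hierarchy (root/rack/host/osd); with a
# chooseleaf rule at host or rack level, the OSDs above should all sit
# in different hosts/racks.
tree = ceph("osd", "tree")
hosts = [n["name"] for n in tree["nodes"] if n["type"] == "host"]
print("CRUSH hosts: %s" % ", ".join(hosts))

If the acting set ever does map onto a single host or rack, that's a
CRUSH rule problem rather than an inherent property of a large cluster.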

> Growing the Ceph cluster temporarily slows it down, so many users will be affected.

One might say that a Ceph cluster that can't be grown without users
noticing is an over-subscribed Ceph cluster. My understanding is that
one is always advised to keep a certain amount of spare cluster
capacity for growth and for re-replicating data onto replaced drives.
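
FWIW, the usual way to grow without clients noticing is to throttle
backfill/recovery while the new OSDs come in. A rough sketch follows
(the option names are standard OSD settings; the values are only an
example, not a recommendation, and I'm not claiming this is what
Adam's team is missing):

#!/usr/bin/env python
# Illustration only: slow down rebalancing so client I/O keeps priority
# while the cluster is being expanded. Assumes the ceph CLI and an
# admin keyring are available on the host running this.
import subprocess

# Injected into every running OSD at runtime; a persistent change would
# go into ceph.conf instead.
THROTTLE_ARGS = ("--osd-max-backfills 1 "
                 "--osd-recovery-max-active 1 "
                 "--osd-recovery-op-priority 1")

subprocess.check_call(["ceph", "tell", "osd.*", "injectargs", THROTTLE_ARGS])

# ... add the new OSDs / adjust CRUSH weights here ...

# Watch the rebalance; "ceph -s" shows how many PGs are still
# backfilling/recovering and at what rate.
subprocess.check_call(["ceph", "-s"])

That trades a longer rebalance for less impact on tenants, which is
usually the right trade when the cluster is shared.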

> There are bugs in Ceph which can cause data corruption. It's rare, but when it happens 
> it can affect many (maybe all) users of the Ceph cluster.
> 

:(


