[Openstack-operators] Hyper-converged OpenStack with Ceph

Warren Wang warren at wangspeed.com
Thu Mar 19 19:07:47 UTC 2015


I would avoid co-locating Ceph and compute processes. Memory on compute
nodes is a scarce resource if you're not running with any overcommit,
which you shouldn't be. Ceph requires a fair amount of guaranteed memory
(2GB per OSD to be safe) to deal with recovery. You can certainly set
memory aside and reserve it for the OSDs, but that is just going to make
things difficult to manage and troubleshoot. I'll give an example. I have
2 Ceph clusters that were experiencing aggressive page scanning and page
cache reclamation under a moderate workload, enough to drive the load on
an OSD server to 4 digits. If that had occurred on a box also running
compute resources, we would have had tickets rolling in. As it was, all
we did was slow down some of the storage, so it largely went unnoticed.
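
If you do go down the reservation road anyway, the knobs exist; here is a
rough sketch only (it assumes a hypothetical node carrying 10 OSDs, uses
Nova's reserved_host_memory_mb, and the numbers are made-up per-host
tuning, not a recommendation):

    # /etc/nova/nova.conf on the hypervisor
    # 10 OSDs x 2GB each plus ~4GB headroom, held back from guest placement
    [DEFAULT]
    reserved_host_memory_mb = 24576

    # /etc/sysctl.d/99-ceph-colocate.conf
    # keep a bigger free-memory cushion so allocations are less likely to
    # stall in direct reclaim (value is in KB, tune per box)
    vm.min_free_kbytes = 1048576

Even then, everything on the box still shares one page cache, which is
exactly where the trouble above started.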

There may also come a time when package dependencies cause conflicts that
are difficult to reconcile: OVS, the kernel, Ceph, etc. It's possible to
try to dedicate resources on a single host to the various processes, but I
personally don't think it's worth the effort.
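
If someone really wants to try that carving-up, the usual mechanism is
cgroups, e.g. a systemd drop-in for the OSD units like the sketch below.
Treat it as illustration only: it assumes systemd-managed ceph-osd@ units
(not every distro ships those yet), and the core list and memory cap are
invented for the example.

    # /etc/systemd/system/ceph-osd@.service.d/colocate.conf
    # hypothetical drop-in: pin the OSDs to a few cores and cap their
    # memory so a recovery storm can't starve the guests
    [Service]
    CPUAffinity=0 1 2 3
    MemoryLimit=24G

After a daemon-reload and an OSD restart you get some isolation, but now
you're hand-managing a resource split that separate hardware gives you
for free.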

Warren

On Thu, Mar 19, 2015 at 12:33 PM, Fox, Kevin M <Kevin.Fox at pnnl.gov> wrote:

> We're running it both ways. We have clouds with dedicated storage nodes,
> and clouds sharing storage/compute.
>
> The storage/compute solution with Ceph is working OK for us. But that
> particular cloud has only a 1-gigabit interconnect and seems very slow
> compared to our other clouds, which are 40-gigabit, so it's not clear
> whether it's slow because of running storage/compute together or simply
> because of the slower interconnect. Could be some of both.
>
> I'd be very curious if anyone else had a feeling for storage/compute
> together on a faster interconnect.
>
> Thanks,
> Kevin
>
> ________________________________________
> From: Jesse Keating [jlk at bluebox.net]
> Sent: Thursday, March 19, 2015 9:20 AM
> To: openstack-operators at lists.openstack.org
> Subject: Re: [Openstack-operators] Hyper-converged OpenStack with Ceph
>
> On 3/19/15 9:08 AM, Jared Cook wrote:
> > Hi, I'm starting to see a number of vendors push hyper-converged
> > OpenStack solutions where compute and Ceph OSD nodes are one and the
> > same.  In addition, Ceph monitors are placed on OpenStack controller
> > nodes in these architectures.
> >
> > Recommendations I have read in the past have been to keep these things
> > separate, but some vendors are now saying that this actually works out
> > OK in practice.
> >
> > The biggest concern I have is that the compute node functions will
> > compete with Ceph functions, and one overutilized node will slow down
> > the entire Ceph cluster, which will slow down the entire cloud.  Is this
> > an unfounded concern?
> >
> > Does anyone have experience running in this mode?  Experience at scale?
> >
> >
>
> Not Ceph related, but it's a known tradeoff that putting compute resources
> on control nodes can cause resource competition. It's a tradeoff driven by
> the total cost of the cluster and the expected use case. If the use case
> plans to scale out to many compute nodes, we suggest upgrading to
> dedicated control nodes. That is higher cost, but somewhat necessary for
> matching performance to capacity.
>
> We may start small, but we can scale up to match the (growing) needs.
>
>
> --
> -jlk
>
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>