[Openstack-operators] Converged infrastructure

Blair Bethwaite blair.bethwaite at gmail.com
Thu Sep 1 04:37:00 UTC 2016


Following on from Edmund's issues... People talking about doing this
typically seem to cite cgroups as the way to avoid CPU- and memory-related
contention - has anyone been successful in, e.g., setting up cgroups on a
nova qemu+kvm hypervisor to limit how much of the machine nova and its
guests can use?
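
For concreteness, the sort of thing I have in mind (untested, so treat it
as a sketch rather than a recipe) assumes systemd cgroups, with libvirt
placing all qemu guests under machine.slice, and then caps that slice so
the OS and any co-located Ceph OSDs keep a guaranteed share - e.g. on a
64-core, 256GB host:

    # cap all libvirt-managed guests at 48 cores and ~192GB,
    # leaving the remainder for the OS, OSDs and pagecache
    systemctl set-property machine.slice CPUQuota=4800%
    systemctl set-property machine.slice MemoryLimit=192G

The numbers are made up for illustration. What I don't know is how well
MemoryLimit behaves when qemu actually pushes up against it (as opposed to
just inviting the oom-killer into the slice), hence the question.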

On 1 September 2016 at 04:15, Edmund Rhudy (BLOOMBERG/ 120 PARK)
<erhudy at bloomberg.net> wrote:
> We currently run converged at Bloomberg with Ceph (all SSD) and I strongly
> dislike it. OSDs and VMs battle for CPU time and memory, VMs steal memory
> that would go to the HV pagecache, and it puts a real dent in any plans to
> be able to deploy hypervisors (mostly) statelessly. Ceph on our largest
> compute cluster spews an endless litany of deep-scrub-related HEALTH_WARNs
> because of memory steal from the VMs depleting available pagecache memory.
> We're going to increase the OS memory reservation in nova.conf to try to
> alleviate some of the worst of the memory steal, but it's been one hack
> after another to keep it going. I hope to be able to re-architect our design
> at some point to de-converge Ceph from the compute nodes so that the two
> sides can evolve separately once more.
>
> From: matt.jarvis at datacentred.co.uk
> Subject: Re:[Openstack-operators] Converged infrastructure
>
> Time once again to dredge this topic up and see what the wider operators
> community thinks this time :) There were a fair number of summit submissions
> for Barcelona talking about converged and hyper-converged infrastructure; it
> seems to be the topic du jour from vendors at the minute, despite feeling
> like we've been round this before with Nebula, Piston Cloud etc.
>
> Like a lot of others we run Ceph, and we absolutely don't converge our
> storage and compute nodes, for a variety of performance- and
> management-related reasons. In our experience the hardware and tuning
> characteristics of the two node types are quite different, Ceph eats memory
> in any kind of recovery scenario, and converging feels like creating a SPOF.
>
> Having said that, with pure SSD clusters becoming more common, some of those
> issues may well be mitigated, so is anyone doing this in production now? If
> so, what does your hardware platform look like, and are there issues with
> these kinds of architectures?
>
> Matt
>
> DataCentred Limited registered in England and Wales no. 05611763
>
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
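
Edmund - when you say increasing the OS memory reservation in nova.conf, I
assume you mean reserved_host_memory_mb, i.e. something like:

    [DEFAULT]
    # keep the scheduler from promising this memory to guests
    # (per-host value in MB; 32GB here purely as an example)
    reserved_host_memory_mb = 32768

If so, my understanding is that this only changes what nova-scheduler thinks
is free on the host - nothing actually stops qemu or the pagecache from
eating into it - which is why I'm curious whether anyone is enforcing the
split with cgroups as above.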



-- 
Cheers,
~Blairo


