[Openstack-operators] Converged infrastructure

Mohammed Naser mnaser at vexxhost.com
Thu Sep 1 16:58:34 UTC 2016


I proposed a talk for the Summit which unfortunately did not make it.

We overprovision compute nodes with enough memory for Ceph to run, and
we isolate N cores dedicated to the OSD processes.  That way, there is
no competition between VMs and OSDs.  We run all-SSD and it has been
quite successful for us.
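
As a rough sketch of that isolation (everything here is illustrative, not our exact configuration: it assumes systemd-managed OSDs and a hypothetical 32-core node with cores 0-3 set aside for Ceph), a systemd drop-in can pin the OSD daemons:

```shell
# Illustrative only: pin every ceph-osd instance to cores 0-3 via a
# systemd drop-in, so OSDs and qemu/KVM guests never share cores.
mkdir -p /etc/systemd/system/ceph-osd@.service.d
cat > /etc/systemd/system/ceph-osd@.service.d/cpuaffinity.conf <<'EOF'
[Service]
CPUAffinity=0 1 2 3
EOF
systemctl daemon-reload
# OSDs pick up the affinity on restart, e.g. systemctl restart ceph-osd@0
```

On the Nova side, `vcpu_pin_set = 4-31` in nova.conf keeps guest vCPUs on the remaining cores, so the two sets never overlap.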

On Thu, Sep 1, 2016 at 12:37 AM, Blair Bethwaite
<blair.bethwaite at gmail.com> wrote:
> Following on from Edmund's issues... People talking about doing this
> typically seem to cite cgroups as the way to avoid CPU and memory
> related contention - has anyone been successful in e.g. setting up
> cgroups on a nova qemu+kvm hypervisor to limit how much of the machine
> nova uses?
>
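
One hedged sketch of the cgroups approach, assuming a systemd-managed host where libvirt places guests under machine.slice (the `AllowedCPUs` property requires a newer systemd with the unified cgroup hierarchy; older hosts would write cpuset.cpus under the slice directly, and the values below are placeholders, not recommendations):

```shell
# Cap the memory available to all qemu/KVM guests and keep them off
# cores 0-3 (reserved for the host and co-located OSDs).
# Values are illustrative.
systemctl set-property machine.slice MemoryMax=200G AllowedCPUs=4-31
# Verify what actually landed in the cgroup:
systemctl show machine.slice -p MemoryMax -p AllowedCPUs
```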
> On 1 September 2016 at 04:15, Edmund Rhudy (BLOOMBERG/ 120 PARK)
> <erhudy at bloomberg.net> wrote:
>> We currently run converged at Bloomberg with Ceph (all SSD) and I strongly
>> dislike it. OSDs and VMs battle for CPU time and memory, VMs steal memory
>> that would go to the HV pagecache, and it puts a real dent in any plans to
>> be able to deploy hypervisors (mostly) statelessly. Ceph on our largest
>> compute cluster spews an endless litany of deep-scrub-related HEALTH_WARNs
>> because of memory steal from the VMs depleting available pagecache memory.
>> We're going to increase the OS memory reservation in nova.conf to try to
>> alleviate some of the worst of the memory steal, but it's been one hack
>> after another to keep it going. I hope to be able to re-architect our design
>> at some point to de-converge Ceph from the compute nodes so that the two
>> sides can evolve separately once more.
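
For reference, the nova.conf knob Edmund is describing is `reserved_host_memory_mb`; a minimal sketch, with the value purely illustrative (size it to cover the host OS, pagecache headroom, and the OSD footprint):

```ini
[DEFAULT]
# Memory (in MB) that nova will never offer to guests; 64 GB here is
# an example figure, not a recommendation.
reserved_host_memory_mb = 65536
```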
>>
>> From: matt.jarvis at datacentred.co.uk
>> Subject: Re:[Openstack-operators] Converged infrastructure
>>
>> Time once again to dredge this topic up and see what the wider operators
>> community thinks this time :) There were a fair amount of summit submissions
>> for Barcelona talking about converged and hyper-converged infrastructure, it
>> seems to be the topic de jour from vendors at the minute despite feeling
>> like we've been round this before with Nebula, Piston Cloud etc.
>>
>> Like a lot of others we run Ceph, and we absolutely don't converge our
>> storage and compute nodes for a variety of performance and management
>> related reasons. In our experience, the hardware and tuning characteristics
>> of both types of nodes are pretty different, in any kind of recovery
>> scenarios Ceph eats memory, and it feels like creating a SPOF.
>>
>> Having said that, with pure SSD clusters becoming more common, some of those
>> issues may well be mitigated, so is anyone doing this in production now? If
>> so, what does your hardware platform look like, and are there issues with
>> these kinds of architectures?
>>
>> Matt
>>
>> DataCentred Limited registered in England and Wales no. 05611763
>>
>> _______________________________________________
>> OpenStack-operators mailing list
>> OpenStack-operators at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>
>>
>>
>
>
>
> --
> Cheers,
> ~Blairo
>



-- 
Mohammed Naser — vexxhost
-----------------------------------------------------
D. 514-316-8872
D. 800-910-1726 ext. 200
E. mnaser at vexxhost.com
W. http://vexxhost.com


