[Openstack-operators] KVM memory overcommit with fast swap

George Shuklin george.shuklin at gmail.com
Fri Jul 3 21:22:04 UTC 2015


One note: even on the very fastest SSD, there is huge overhead on IO.
Basically, you can't go lower than 50 us per IO, and that is 50,000 ns,
almost an eternity for a modern processor. On top of that you take a
major page fault (the page has to be read back from disk), which is not
the fastest thing in the world, plus a few context switches and the
filesystem/block device layers... And 50 us is the best case. Normally
you will see something like 150 us, which is very slow.
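
As a rough back-of-the-envelope illustration of those figures: only the
50 us and 150 us latencies come from the text above. The 3 GHz clock and
the ~100 ns DRAM access time are illustrative assumptions, not
measurements, and the function name is hypothetical.

```python
# Compare swap-in latency with CPU cycles and DRAM accesses.
# ASSUMPTIONS (illustrative only): 3 GHz core clock, ~100 ns DRAM access.
CPU_HZ = 3_000_000_000
DRAM_NS = 100


def io_cost(latency_us):
    """Return (latency in ns, CPU cycles spent waiting, equivalent DRAM accesses)."""
    latency_ns = latency_us * 1000
    cycles = latency_ns * CPU_HZ // 1_000_000_000
    dram_accesses = latency_ns // DRAM_NS
    return latency_ns, cycles, dram_accesses


for us in (50, 150):  # best case and typical case from the text
    ns, cycles, dram = io_cost(us)
    print(f"{us} us = {ns} ns, ~{cycles} cycles, ~{dram}x a DRAM access")
```

Under these assumptions, even the best-case swap-in costs a few hundred
times more than touching a page that is still in RAM.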

It's OK to push some unused or rarely used part of the guests' memory
to swap, but do not expect it to be a silver bullet. The borderline
between 'normal swap operations' and a 'thrashing system' is very
blurry, and the main symptom your guests will experience during
overswapping is an extreme rise in latency (everything: IO,
networking...). And when this happens you will have no knobs to fix
things... Even if you kill some of the guests, it can take up to 10
minutes for the thrashing to subside and the congestion on IO to ease.

In my experience, on an average compute node no more than 20% of memory
can be pushed to swap without significant consequences.
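
A minimal sketch of what that rule of thumb implies for sizing. It
assumes "20% of memory" means 20% of the node's physical RAM, which is
one reading of the sentence above, and it ignores memory reserved for
the host itself. The function name is hypothetical, and mapping the
result onto Nova's ram_allocation_ratio is an illustrative assumption.
The 256GB figure is the example node from the quoted message below.

```python
def swap_budget(ram_gb, swap_fraction=0.20):
    """Return (usable swap in GB, implied memory overcommit ratio).

    ASSUMPTION: swap_fraction is a fraction of physical RAM.
    """
    swap_gb = ram_gb * swap_fraction
    ratio = (ram_gb + swap_gb) / ram_gb
    return swap_gb, ratio


swap_gb, ratio = swap_budget(256)  # example node size from the thread
print(f"swap budget: {swap_gb:.1f} GB, implied overcommit ratio: {ratio:.2f}")
```

Read this way, the rule suggests a modest overcommit rather than using
all of a large swap device.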

... And swap in the guests is better, because a guest can throw away a
few pages from its cache if needed, whereas the host will swap out guest
page cache just like actual process memory. Allocate that SSD as an
ephemeral drive to the guests and let them swap.

On 07/03/2015 11:19 AM, Blair Bethwaite wrote:
> Damnit! So no-one has done this or has a feel for it?
> I was really hoping for the lazy option here.
>
> So next question. Ideas for convoluting a reasonable test case?
> Assuming I've got a compute node with 256GB RAM and 350GB of PCIe SSD
> for swap, what next? We've got Rally going so could potentially use
> that, but I'm not sure whether it can do different tasks in parallel
> in order to simulate a set of varied workloads... Ideally we'd want at
> least these workloads happening in parallel:
> - web servers
> - db servers
> - idle servers
> - batch processing
>
> On 30 June 2015 at 03:24, Warren Wang <warren at wangspeed.com> wrote:
>> I'm gonna forward this to my co-workers :) I've been kicking this idea
>> around for some time now, and it hasn't caught traction. I think it could
>> work for a modest overcommit, depending on the memory workload. We decided
>> that it should be possible to do this sanely, but that it needed testing.
>> I'm happy to help test this out. Sounds like the results could be part of a
>> Tokyo talk :P
>>
>> Warren
>>
>> On Mon, Jun 29, 2015 at 9:36 AM, Blair Bethwaite <blair.bethwaite at gmail.com>
>> wrote:
>>> Hi all,
>>>
>>> Question up-front:
>>>
>>> Do the performance characteristics of modern PCIe attached SSDs
>>> invalidate/challenge the old "don't overcommit memory" with KVM wisdom
>>> (recently discussed on this list and at meetups and summits)? Has
>>> anyone out there tried & tested this?
>>>
>>> Long-form:
>>>
>>> I'm currently looking at possible options for increasing virtual
>>> capacity in a public/community KVM based cloud. We started very
>>> conservatively at a 1:1 cpu allocation ratio, so perhaps predictably
>>> we have boatloads of CPU headroom to work with. We also see maybe 50%
>>> memory actually in-use on a host that is, from Nova's perspective,
>>> more-or-less full.
>>>
>>> The most obvious thing to do here is increase available memory. There
>>> are at least three ways to achieve that:
>>> 1/ physically add RAM
>>> 2/ reduce RAM per vcore (i.e., introduce lower RAM flavors)
>>> 3/ increase virtual memory capacity (i.e., add swap) and make
>>> ram_allocation_ratio > 1
>>>
>>> We're already doing a bit of #2, but at the end of the day, taking
>>> away flavors and trying to change user behaviour is actually harder
>>> than just upgrading hardware. #1 is ideal but I do wonder whether we'd
>>> be better to spend that same money on some PCIe SSD and use it for #3
>>> (at least for our 'standard' flavor classes), the advantage being that
>>> SSD is cheaper per GB (and it might also help alleviate IOPs
>>> starvation for local storage based hosts)...
>>>
>>> The question is whether the performance characteristics of modern PCIe
>>> attached SSDs invalidate the old "don't overcommit memory" with KVM
>>> wisdom (recently discussed on this list:
>>> http://www.gossamer-threads.com/lists/openstack/operators/46104 and
>>> also apparently at the Kilo mid-cycle:
>>> https://etherpad.openstack.org/p/PHL-ops-capacity-mgmt where there was
>>> an action to update the default from 1.5 to 1.0, though that doesn't
>>> seem to have happened). Has anyone out there tried this?
>>>
>>> I'm also curious if anyone has any recent info re. the state of
>>> automated memory ballooning and/or memory hotplug? Ideally a RAM
>>> overcommitted host would try to inflate guest balloons before
>>> swapping.
>>>
>>> --
>>> Cheers,
>>> ~Blairo
>>>
>>> _______________________________________________
>>> OpenStack-operators mailing list
>>> OpenStack-operators at lists.openstack.org
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>
>
>



