<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-text-plain" wrap="true" style="font-family:
-moz-fixed; font-size: 12px;" lang="x-unicode">
<pre wrap="">Hello everyone,
We're experiencing issues with running large instances (~60GB RAM) on
fairly large NUMA hosts (4 CPUs, 256GB RAM, i.e. 64GB per NUMA node)
while using CPU pinning. The problem seems to be that, in some extreme
cases, qemu/KVM can have significant memory overhead (10-15%?), which
the nova-compute service doesn't take into account when launching VMs.
Using our configuration as an example: imagine running two VMs with
30GB RAM each on one NUMA node (because we use CPU pinning), therefore
using 60GB out of the 64GB available on that NUMA node. When both VMs
consume their entire memory, the 10% KVM overhead pushes the real
usage to roughly 66GB, and the OOM killer takes action, despite there
being plenty of free RAM on the other NUMA nodes. (The numbers are
just arbitrary; the point is that nova-scheduler schedules the instance
onto the host because the memory seems 'free enough', but a specific
NUMA node can lack the memory reserve.)
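
To put the (arbitrary) numbers above into a quick sketch: the snippet
below compares the free memory on one NUMA node as seen when only the
guests' RAM is counted with the free memory once a per-guest qemu/KVM
overhead is added. The function names and the flat 10% overhead are
assumptions for illustration only, not nova code.

```python
# Illustration only (hypothetical names, not nova code): free memory on a
# single NUMA node, with and without an assumed per-guest qemu/KVM overhead.

NODE_MEMORY_GB = 64       # one NUMA node of the 256GB, 4-CPU host
GUEST_RAM_GB = [30, 30]   # two pinned guests placed on the same node
KVM_OVERHEAD_PCT = 10     # assumed overhead per guest, in percent


def free_as_scheduled(guests_gb, node_gb):
    """Free memory when only the guests' configured RAM is counted."""
    return node_gb - sum(guests_gb)


def free_in_practice(guests_gb, node_gb, overhead_pct):
    """Free memory once each guest also uses overhead_pct extra memory."""
    used = sum(g * (100 + overhead_pct) // 100 for g in guests_gb)
    return node_gb - used


print(free_as_scheduled(GUEST_RAM_GB, NODE_MEMORY_GB))                   # 4
print(free_in_practice(GUEST_RAM_GB, NODE_MEMORY_GB, KVM_OVERHEAD_PCT))  # -2
```

A negative value means the node is overcommitted by about 2GB, which is
where the OOM killer steps in.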
Our initial solution was to set ram_allocation_ratio < 1 to keep some
memory reserved, but this didn't work. After studying the nova source,
it turns out that ram_allocation_ratio is ignored when using CPU
pinning (see
<a class="moz-txt-link-freetext" href="https://github.com/openstack/nova/blob/mitaka-eol/nova/virt/hardware.py#L859">https://github.com/openstack/nova/blob/mitaka-eol/nova/virt/hardware.py#L859</a>
and
<a class="moz-txt-link-freetext" href="https://github.com/openstack/nova/blob/mitaka-eol/nova/virt/hardware.py#L821">https://github.com/openstack/nova/blob/mitaka-eol/nova/virt/hardware.py#L821</a>
). We're running Mitaka, but this piece of code is implemented the same
way in Ocata.
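
A simplified sketch of the behaviour described above, and of what our
patch would change. Everything here (function names, the patched flag,
the MB figures) is hypothetical and only mimics the effect we observed;
it is NOT the actual logic in nova/virt/hardware.py.

```python
# Hypothetical model of a per-NUMA-cell memory check. It mimics the observed
# effect (ram_allocation_ratio ignored for pinned guests); it is NOT a copy
# of nova/virt/hardware.py.


def cell_memory_limit_mb(cell_memory_mb, ram_allocation_ratio, pinned,
                         patched=False):
    """Guest RAM the cell may hold according to this simplified model."""
    if pinned and not patched:
        # Observed behaviour: with CPU pinning the ratio is not applied,
        # so the whole cell counts as usable.
        return cell_memory_mb
    # Unpinned guests, or the proposed patch: honour the ratio.
    return int(cell_memory_mb * ram_allocation_ratio)


def instance_fits(cell_memory_mb, used_mb, requested_mb,
                  ram_allocation_ratio, pinned, patched=False):
    """True if the requested guest RAM fits under the cell's limit."""
    limit_mb = cell_memory_limit_mb(cell_memory_mb, ram_allocation_ratio,
                                    pinned, patched)
    return limit_mb >= used_mb + requested_mb


# 64GB cell, one 30GB guest already on it, a second 30GB guest requested.
print(instance_fits(65536, 30720, 30720, 0.9, pinned=True))                # True
print(instance_fits(65536, 30720, 30720, 0.9, pinned=True, patched=True))  # False
```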
We're considering creating a patch that takes ram_allocation_ratio into
account.
My question is: is ram_allocation_ratio ignored on purpose when using
CPU pinning? If so, what is the reasoning behind it? And what would be
the right way to ensure there is reserved RAM on each NUMA node?
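
For comparison, one alternative to a ratio would be a fixed amount of
memory kept free on every NUMA node for qemu/KVM overhead. The sketch
below only illustrates that idea; reserved_mb is a hypothetical value,
not an existing nova option.

```python
# Hypothetical per-NUMA-node reservation check; reserved_mb is an
# illustrative value, not an existing nova configuration option.


def fits_with_reserve(node_memory_mb, used_mb, requested_mb, reserved_mb):
    """True if the guest fits once reserved_mb is held back on the node."""
    usable_mb = node_memory_mb - reserved_mb
    return usable_mb >= used_mb + requested_mb


# 64GB node with 6GB held back and one 30GB guest already placed.
print(fits_with_reserve(65536, 30720, 30720, 6144))  # False: 60GB needed, 58GB usable
print(fits_with_reserve(65536, 30720, 28672, 6144))  # True: 58GB fits exactly
```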
Thanks.
Regards,
Jakub Jursa
</pre>
</div>
</body>
</html>