[openstack-dev] [nova] Running large instances with CPU pinning and OOM

Chris Friesen chris.friesen at windriver.com
Wed Sep 27 15:03:44 UTC 2017

On 09/27/2017 08:01 AM, Blair Bethwaite wrote:
> On 27 September 2017 at 23:19, Jakub Jursa <jakub.jursa at chillisys.com> wrote:
>> 'hw:cpu_policy=dedicated' (while NOT setting 'hw:numa_nodes') results in
>> libvirt pinning CPU in 'strict' memory mode
>> (from libvirt xml for given instance)
>> ...
>>    <numatune>
>>      <memory mode='strict' nodeset='1'/>
>>      <memnode cellid='0' mode='strict' nodeset='1'/>
>>    </numatune>
>> ...
>> So yeah, the instance is not able to allocate memory from another NUMA node.
> I can't recall what the docs say on this but I wouldn't be surprised
> if that was a bug. Though I do think most users would want CPU & NUMA
> pinning together (you haven't shared your use case but perhaps you do
> too?).

Not a bug.  Once you enable CPU pinning we assume you care about performance, 
and for max performance you need NUMA affinity as well.  (And hugepages are 
beneficial too.)
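
If you want to confirm what a given instance ended up with, you can pull the 
<numatune> element out of the domain XML (for example from "virsh dumpxml"). 
Below is a rough sketch in Python, assuming the XML looks like the fragment 
quoted above.  The function name and the sample string are made up for 
illustration and are not part of nova or libvirt.

```python
import xml.etree.ElementTree as ET


def numa_memory_policy(domain_xml):
    """Return (mode, nodeset) from <numatune><memory>, or None if absent."""
    root = ET.fromstring(domain_xml)
    # Accept either a full <domain> document or a bare <numatune> fragment.
    numatune = root if root.tag == "numatune" else root.find("numatune")
    if numatune is None:
        return None
    memory = numatune.find("memory")
    if memory is None:
        return None
    return memory.get("mode"), memory.get("nodeset")


# Sample input modelled on the fragment quoted earlier in this thread.
SAMPLE = """
<domain>
  <numatune>
    <memory mode='strict' nodeset='1'/>
    <memnode cellid='0' mode='strict' nodeset='1'/>
  </numatune>
</domain>
"""

print(numa_memory_policy(SAMPLE))
```

A result of ('strict', '1') means guest memory can only come from host NUMA 
node 1, which is why the guest cannot fall back to another node when that one 
runs short.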

>> I'm not quite sure what you mean by 'memory will be locked for the
>> guest'. Also, aren't huge pages enabled in kernel by default?
> I think that suggestion was probably referring to static hugepages,
> which can be reserved (per NUMA node) at boot and then (assuming your
> host is configured correctly) QEMU will be able to back guest RAM with
> them.

One nice thing about static hugepages is that you pre-allocate them at startup, 
so you can decide on a per-NUMA-node basis how much 4K memory you want to leave 
for incidental host stuff and qemu overhead.  This lets you specify different 
amounts of "host-reserved" memory on different NUMA nodes.
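
As a sketch of the arithmetic involved: given the total memory on a node and 
the amount of 4K memory you want to keep for the host, the number of 2MB pages 
to reserve falls out directly.  The node sizes and reserve amounts below are 
invented for the example, not recommendations, and the helper name is 
hypothetical.

```python
HUGEPAGE_KIB = 2048       # one 2MB hugepage, in KiB
GIB = 1024 * 1024         # KiB per GiB


def hugepages_for_node(node_total_kib, host_reserved_kib):
    """Number of 2MB pages to reserve, leaving host_reserved_kib as 4K memory."""
    if host_reserved_kib > node_total_kib:
        raise ValueError("host reserve exceeds node memory")
    return (node_total_kib - host_reserved_kib) // HUGEPAGE_KIB


# Hypothetical two-node host with 64 GiB per node; node 0 keeps more 4K
# memory because (in this made-up example) most host processes run there.
plan = {
    0: hugepages_for_node(64 * GIB, 8 * GIB),
    1: hugepages_for_node(64 * GIB, 2 * GIB),
}

for node, pages in plan.items():
    print(f"node{node}: {pages} x 2MB pages")
```

The per-node counts are what you would then write to 
/sys/devices/system/node/node<N>/hugepages/hugepages-2048kB/nr_hugepages early 
in boot, before memory gets fragmented.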

In order to use static hugepages for the guest you need to explicitly ask for a 
page size of 2MB.  (1GB is possible as well but in most cases doesn't buy you 
much compared to 2MB.)
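
In flavor terms that means setting hw:mem_page_size alongside hw:cpu_policy.  
Here is a small Python sketch that builds the corresponding "openstack flavor 
set" command line.  The flavor name "m1.pinned" and the helper function are 
made up for illustration; the page size is given in KiB, so 2048 means 2MB.

```python
def flavor_set_command(flavor, page_size_kib=2048):
    """Build an 'openstack flavor set' argv for a pinned, hugepage-backed flavor."""
    props = {
        "hw:cpu_policy": "dedicated",            # CPU pinning
        "hw:mem_page_size": str(page_size_kib),  # explicit 2MB pages by default
    }
    cmd = ["openstack", "flavor", "set"]
    for key, value in props.items():
        cmd += ["--property", f"{key}={value}"]
    cmd.append(flavor)
    return cmd


# "m1.pinned" is a hypothetical flavor name.
print(" ".join(flavor_set_command("m1.pinned")))
```

Printing the command rather than running it keeps the sketch safe to try 
without a cloud at hand; pass the list to subprocess.run() if you actually 
want to apply it.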

Lastly, qemu has overhead that varies depending on what you're doing in the 
guest.  In particular, there are various IO queues that can consume significant 
amounts of memory.  The company that I work for put in a good bit of effort 
engineering things so that they work more reliably, and part of that was 
determining how much memory to reserve for the host.

