[nova] NUMA scheduling

Sean Mooney smooney at redhat.com
Mon Oct 19 12:02:20 UTC 2020


Sorry to top post, but I was off on Friday.
The issue is that hw:mem_page_size has not been set.

If you are using any NUMA feature you always need to set hw:mem_page_size to a value.
It does not matter which valid value you set it to, but you need to define it in the flavor
or image.
If you do not, then you have not activated per-NUMA-node memory tracking in Nova, and your
VMs will eventually be killed by the OOM reaper.


The minimum valid NUMA-aware VM to create is hw:mem_page_size=any.
That implicitly expands to hw:mem_page_size=any hw:numa_nodes=1,
since 1 NUMA node is the default if you do not set hw:numa_nodes.
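
For example (the flavor name here is just a placeholder), the extra spec can be set with the
openstack client roughly like this:

    openstack flavor set --property hw:mem_page_size=any my-numa-flavor

Nova will then treat the guest as a single NUMA node VM with per-NUMA-node memory tracking enabled.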

Today, when we generate a NUMA topology for hw:cpu_policy=dedicated, we implicitly set hw:numa_nodes=1 internally,
but we do not define hw:mem_page_size=small/any/large.

So if you simply define a flavor with hw:numa_nodes=1 or hw:cpu_policy=dedicated and no other extra specs, then technically
that is an invalid flavor for the libvirt driver.
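
To make such a flavor valid, the page-size extra spec just needs to be added alongside the
CPU policy; a rough sketch (flavor name is illustrative):

    openstack flavor set \
      --property hw:cpu_policy=dedicated \
      --property hw:mem_page_size=small \
      my-pinned-flavor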

hw:numa_nodes=1 is valid for the Hyper-V driver on its own, but not for the libvirt driver.

If you are using any NUMA feature with the libvirt driver, hw:mem_page_size in the flavor or hw_mem_page_size in the image
must be set for Nova to correctly track and allocate memory for the VM.
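
If you would rather drive this from the image instead of the flavor, the equivalent image
property can be set in much the same way (image name is just an example):

    openstack image set --property hw_mem_page_size=any my-guest-image

Either the flavor extra spec or the image property is enough; one of them has to be present
whenever a NUMA topology is requested with the libvirt driver.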

On Sat, 2020-10-17 at 13:44 -0400, Satish Patel wrote:
> or "hw:numa_nodes=2" to see if vm vcpu spreads to both zones.
> 
> On Sat, Oct 17, 2020 at 1:41 PM Satish Patel <satish.txt at gmail.com> wrote:
> > 
> > I would say try without  "hw:numa_nodes=1" in flavor properties.
> > 
> > ~S
> > 
> > On Sat, Oct 17, 2020 at 1:28 PM Eric K. Miller
> > <emiller at genesishosting.com> wrote:
> > > 
> > > > What is the error thrown by Openstack when NUMA0 is full?
> > > 
> > > 
> > > 
> > > OOM is actually killing the QEMU process, which causes Nova to report:
> > > 
> > > 
> > > 
> > > /var/log/kolla/nova/nova-compute.log.4:2020-08-25 12:31:19.812 6 WARNING nova.compute.manager [req-62bddc53-ca8b-4bdc-bf41-8690fc88076f - - - -
> > > -] [instance: 8d8a262a-6e60-4e8a-97f9-14462f09b9e5] Instance shutdown by itself. Calling the stop API. Current vm_state: active, current
> > > task_state: None, original DB power_state: 1, current VM power_state: 4
> > > 
> > > 
> > > 
> > > So, there isn't a NUMA or memory-specific error from Nova - Nova is simply scheduling a VM on a node that it thinks has enough memory, and
> > > Libvirt (or Nova?) is configuring the VM to use CPU cores on a full NUMA node.
> > > 
> > > 
> > > 
> > > NUMA Node 1 had about 240GiB of free memory with about 100GiB of buffer/cache space used, so plenty of free memory, whereas NUMA Node 0 was
> > > pretty tight on free memory.
> > > 
> > > 
> > > 
> > > These are some logs in /var/log/messages (not for the nova-compute.log entry above, but the same condition for a VM that was killed - logs were
> > > rolled, so I had to pick a different VM):
> > > 
> > > 
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: CPU 0/KVM invoked oom-killer: gfp_mask=0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0,
> > > oom_score_adj=0
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: CPU: 15 PID: 30468 Comm: CPU 0/KVM Not tainted 5.3.8-1.el7.elrepo.x86_64 #1
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: Hardware name: <redacted hardware>
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: Call Trace:
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: dump_stack+0x63/0x88
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: dump_header+0x51/0x210
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: oom_kill_process+0x105/0x130
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: out_of_memory+0x105/0x4c0
> > > 
> > > …
> > > 
> > > …
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: active_anon:108933472 inactive_anon:174036 isolated_anon:0#012 active_file:21875969
> > > inactive_file:2418794 isolated_file:32#012 unevictable:88113 dirty:0 writeback:4 unstable:0#012 slab_reclaimable:3056118
> > > slab_unreclaimable:432301#012 mapped:71768 shmem:570159 pagetables:258264 bounce:0#012 free:58924792 free_pcp:326 free_cma:0
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: Node 0 active_anon:382548916kB inactive_anon:173052kB active_file:0kB inactive_file:2272kB
> > > unevictable:289840kB isolated(anon):0kB isolated(file):128kB mapped:16696kB dirty:0kB writeback:0kB shmem:578812kB shmem_thp: 0kB
> > > shmem_pmdmapped: 0kB anon_thp: 286420992kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: Node 0 DMA free:15880kB min:0kB low:12kB high:24kB active_anon:0kB inactive_anon:0kB active_file:0kB
> > > inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15880kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB
> > > free_pcp:0kB local_pcp:0kB free_cma:0kB
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: lowmem_reserve[]: 0 1589 385604 385604 385604
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: Node 0 DMA32 free:1535904kB min:180kB low:1780kB high:3380kB active_anon:90448kB inactive_anon:0kB
> > > active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:1717888kB managed:1627512kB mlocked:0kB kernel_stack:0kB
> > > pagetables:0kB bounce:0kB free_pcp:1008kB local_pcp:248kB free_cma:0kB
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: lowmem_reserve[]: 0 0 384015 384015 384015
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: Node 0 Normal free:720756kB min:818928kB low:1212156kB high:1605384kB active_anon:382458300kB
> > > inactive_anon:173052kB active_file:0kB inactive_file:2272kB unevictable:289840kB writepending:0kB present:399507456kB managed:393231952kB
> > > mlocked:289840kB kernel_stack:58344kB pagetables:889796kB bounce:0kB free_pcp:296kB local_pcp:0kB free_cma:0kB
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: lowmem_reserve[]: 0 0 0 0 0
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U)
> > > 1*2048kB (M) 3*4096kB (M) = 15880kB
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: Node 0 DMA32: 1*4kB (U) 1*8kB (M) 0*16kB 9*32kB (UM) 11*64kB (UM) 12*128kB (UM) 12*256kB (UM)
> > > 11*512kB (UM) 11*1024kB (M) 1*2048kB (U) 369*4096kB (M) = 1535980kB
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: Node 0 Normal: 76633*4kB (UME) 30442*8kB (UME) 7998*16kB (UME) 1401*32kB (UE) 6*64kB (U) 0*128kB
> > > 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 723252kB
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: 24866489 total pagecache pages
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: 0 pages in swap cache
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: Swap cache stats: add 0, delete 0, find 0/0
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: Free swap  = 0kB
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: Total swap = 0kB
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: 200973631 pages RAM
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: 0 pages HighMem/MovableOnly
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: 3165617 pages reserved
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: 0 pages hwpoisoned
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: Tasks state (memory values in pages):
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: [   2414]     0  2414    33478    20111   315392        0             0 systemd-journal
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: [   2438]     0  2438    31851      540   143360        0             0 lvmetad
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: [   2453]     0  2453    12284     1141   131072        0         -1000 systemd-udevd
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: [   4170]     0  4170    13885      446   131072        0         -1000 auditd
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: [   4393]     0  4393     5484      526    86016        0             0 irqbalance
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: [   4394]     0  4394     6623      624   102400        0             0 systemd-logind
> > > 
> > > …
> > > 
> > > …
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: oom-
> > > kill:constraint=CONSTRAINT_MEMORY_POLICY,nodemask=0,cpuset=vcpu0,mems_allowed=0,global_oom,task_memcg=/machine.slice/machine-
> > > qemu\x2d237\x2dinstance\x2d0000fda8.scope,task=qemu-kvm,pid=25496,uid=42436
> > > 
> > > Oct 10 15:17:01 <redacted hostname> kernel: Out of memory: Killed process 25496 (qemu-kvm) total-vm:67989512kB, anon-rss:66780940kB, file-
> > > rss:11052kB, shmem-rss:4kB
> > > 
> > > Oct 10 15:17:02 <redacted hostname> kernel: oom_reaper: reaped process 25496 (qemu-kvm), now anon-rss:0kB, file-rss:36kB, shmem-rss:4kB
> 




