[nova] OOM Killed Processes
Hi,
I am having trouble with my compute node: the nova-compute and ovs processes are being killed by the OOM killer, even though I have a lot of memory available in the system.

# free -g
              total        used        free      shared  buff/cache   available
Mem:           1006         121         881           0           2         879
Swap:             7           0           7

But I am seeing processes being killed in dmesg:

[Sat Feb 19 03:46:26 2022] Memory cgroup out of memory: Killed process 2080898 (ovs-vswitchd) total-vm:9474284kB, anon-rss:1076384kB, file-rss:11700kB, shmem-rss:0kB, UID:0 pgtables:2776kB oom_score_adj:0
[Sat Feb 19 03:47:01 2022] Memory cgroup out of memory: Killed process 2081218 (ovs-vswitchd) total-vm:9475332kB, anon-rss:1096988kB, file-rss:11700kB, shmem-rss:0kB, UID:0 pgtables:2780kB oom_score_adj:0
[Sat Feb 19 03:47:06 2022] Memory cgroup out of memory: Killed process 2081616 (ovs-vswitchd) total-vm:9473252kB, anon-rss:1073052kB, file-rss:11700kB, shmem-rss:0kB, UID:0 pgtables:2784kB oom_score_adj:0
[Sat Feb 19 03:47:16 2022] Memory cgroup out of memory: Killed process 2081940 (ovs-vswitchd) total-vm:9471236kB, anon-rss:1070920kB, file-rss:11700kB, shmem-rss:0kB, UID:0 pgtables:2776kB oom_score_adj:0
[Sat Feb 19 03:47:16 2022] Memory cgroup out of memory: Killed process 6098 (nova-compute) total-vm:3428356kB, anon-rss:279920kB, file-rss:9868kB, shmem-rss:0kB, UID:64060 pgtables:1020kB oom_score_adj:0
[Mon Feb 21 11:15:08 2022] Memory cgroup out of memory: Killed process 2082296 (ovs-vswitchd) total-vm:9475372kB, anon-rss:1162636kB, file-rss:11700kB, shmem-rss:0kB, UID:0 pgtables:2864kB oom_score_adj:0

Any advice on how to fix this? Also, is there any best-practices document on configuring memory optimizations on a nova compute node?

Ammad
On Mon, 2022-02-21 at 12:24 +0500, Ammad Syed wrote:
Hi,
I am having trouble with my compute node: the nova-compute and ovs processes are being killed by the OOM killer, even though I have a lot of memory available in the system.

# free -g
              total        used        free      shared  buff/cache   available
Mem:           1006         121         881           0           2         879
Swap:             7           0           7

But I am seeing processes being killed in dmesg:

[Sat Feb 19 03:46:26 2022] Memory cgroup out of memory: Killed process 2080898 (ovs-vswitchd) total-vm:9474284kB, anon-rss:1076384kB, file-rss:11700kB, shmem-rss:0kB, UID:0 pgtables:2776kB oom_score_adj:0
[Sat Feb 19 03:47:01 2022] Memory cgroup out of memory: Killed process 2081218 (ovs-vswitchd) total-vm:9475332kB, anon-rss:1096988kB, file-rss:11700kB, shmem-rss:0kB, UID:0 pgtables:2780kB oom_score_adj:0
[Sat Feb 19 03:47:06 2022] Memory cgroup out of memory: Killed process 2081616 (ovs-vswitchd) total-vm:9473252kB, anon-rss:1073052kB, file-rss:11700kB, shmem-rss:0kB, UID:0 pgtables:2784kB oom_score_adj:0
[Sat Feb 19 03:47:16 2022] Memory cgroup out of memory: Killed process 2081940 (ovs-vswitchd) total-vm:9471236kB, anon-rss:1070920kB, file-rss:11700kB, shmem-rss:0kB, UID:0 pgtables:2776kB oom_score_adj:0
[Sat Feb 19 03:47:16 2022] Memory cgroup out of memory: Killed process 6098 (nova-compute) total-vm:3428356kB, anon-rss:279920kB, file-rss:9868kB, shmem-rss:0kB, UID:64060 pgtables:1020kB oom_score_adj:0
[Mon Feb 21 11:15:08 2022] Memory cgroup out of memory: Killed process 2082296 (ovs-vswitchd) total-vm:9475372kB, anon-rss:1162636kB, file-rss:11700kB, shmem-rss:0kB, UID:0 pgtables:2864kB oom_score_adj:0
Any advice on how to fix this? Also, is there any best-practices document on configuring memory optimizations on a nova compute node?

So one thing to note is that the OOM reaper runs per NUMA node, so the global free memory values are not really what you need to look at.
cgroups/systemd also provide ways to limit the maximum memory a process or cgroup tree can consume, so your first step should be to determine whether the OOM event was triggered by exhausting the memory of a specific NUMA node or whether you are hitting a different cgroup memory limit.

In terms of how to optimize memory, it really depends on what your end goal is here. Obviously we do not want ovs or nova-compute to be killed in general. The virtual and resident memory for ovs are in the 10 GB and 1 GB range respectively, so that is not excessively large. That also should not be anywhere near the NUMA limit, but if you were incorrectly creating NUMA-affined VMs without setting hw:mem_page_size via nova, then that could perhaps trigger out-of-memory events.

Effectively, with OpenStack, if your VM is NUMA-affined, either explicitly via the hw:numa_nodes extra spec or implicitly via CPU pinning or otherwise, then you must ensure the memory is tracked using the NUMA-aware path, which requires you to define hw:mem_page_size in the flavor or hw_mem_page_size in the image. If you do not want to use hugepages, hw:mem_page_size=small is a good default, but just be aware that if the VM has a NUMA topology then memory oversubscription is not supported in OpenStack. That is, you cannot use CPU pinning, or any other feature that requires a NUMA topology such as virtual persistent memory, and also use memory oversubscription.

Assuming these events do not correlate with VM boots, I would investigate the cgroup memory limits you set on the ovs and compute service cgroups. If they are correlated with VM boots, check whether the VM is NUMA-affined and, if it is, which page size is requested. If it is hw:mem_page_size=small, then you might need to use the badly named reserved_huge_pages config option to reserve 4k pages for the host per NUMA node:
https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.res...

e.g. reserve 4G on node 0 and 1:

reserved_huge_pages = node:0,size:4,count:1048576
reserved_huge_pages = node:1,size:4,count:1048576

The sum of all the 4k page-size reservations should equal the value of reserved_host_memory_mb:
https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.res...

This is only really needed where you are using NUMA instances, since reserved_host_memory_mb does not account for the host NUMA topology and so will not prevent a NUMA node from being exhausted.

If you are using an AMD EPYC system and the NPS (NUMA per socket) BIOS option is set to, say, 4 or 8 on a dual- or single-socket system respectively, then the 1 TB of RAM you have on the host would be divided into NUMA nodes of 128 GB each, which is very close to the 121 GB used you see when you start hitting issues. Nova currently tries to fill the NUMA nodes in order when you have NUMA instances, too, which causes the OOM issue to manifest much sooner than people often expect due to the per-NUMA nature of the OOM reaper.

That may not help you in your case, but that is how I would approach tracking down this issue.
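A quick way to check both possibilities is to look at per-NUMA-node free memory and at the limits on the relevant slices/cgroups. A minimal sketch (the cgroup v2 paths and slice names are assumptions; adjust for your distro):

# per-NUMA-node memory
numactl --hardware
grep -E 'MemFree|MemUsed' /sys/devices/system/node/node*/meminfo

# limit and current usage of the slice that contains ovs and nova-compute
systemctl show system.slice -p MemoryMax -p MemoryCurrent
cat /sys/fs/cgroup/system.slice/memory.max    # cgroup v2; "max" means unlimited

# which cgroup a given process is actually charged to
cat /proc/$(pidof ovs-vswitchd)/cgroup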
Ammad
On Mon, Feb 21, 2022 at 6:21 PM Sean Mooney <smooney@redhat.com> wrote:
On Mon, 2022-02-21 at 12:24 +0500, Ammad Syed wrote:
Hi,
I am having trouble with my compute node: the nova-compute and ovs processes are being killed by the OOM killer, even though I have a lot of memory available in the system.
# free -g
              total        used        free      shared  buff/cache   available
Mem:           1006         121         881           0           2         879
Swap:             7           0           7
But I am seeing processes being killed in dmesg:
[Sat Feb 19 03:46:26 2022] Memory cgroup out of memory: Killed process 2080898 (ovs-vswitchd) total-vm:9474284kB, anon-rss:1076384kB, file-rss:11700kB, shmem-rss:0kB, UID:0 pgtables:2776kB oom_score_adj:0
[Sat Feb 19 03:47:01 2022] Memory cgroup out of memory: Killed process 2081218 (ovs-vswitchd) total-vm:9475332kB, anon-rss:1096988kB, file-rss:11700kB, shmem-rss:0kB, UID:0 pgtables:2780kB oom_score_adj:0
[Sat Feb 19 03:47:06 2022] Memory cgroup out of memory: Killed process 2081616 (ovs-vswitchd) total-vm:9473252kB, anon-rss:1073052kB, file-rss:11700kB, shmem-rss:0kB, UID:0 pgtables:2784kB oom_score_adj:0
[Sat Feb 19 03:47:16 2022] Memory cgroup out of memory: Killed process 2081940 (ovs-vswitchd) total-vm:9471236kB, anon-rss:1070920kB, file-rss:11700kB, shmem-rss:0kB, UID:0 pgtables:2776kB oom_score_adj:0
[Sat Feb 19 03:47:16 2022] Memory cgroup out of memory: Killed process 6098 (nova-compute) total-vm:3428356kB, anon-rss:279920kB, file-rss:9868kB, shmem-rss:0kB, UID:64060 pgtables:1020kB oom_score_adj:0
[Mon Feb 21 11:15:08 2022] Memory cgroup out of memory: Killed process 2082296 (ovs-vswitchd) total-vm:9475372kB, anon-rss:1162636kB, file-rss:11700kB, shmem-rss:0kB, UID:0 pgtables:2864kB oom_score_adj:0
Any advice on how to fix this? Also, is there any best-practices document on configuring memory optimizations on a nova compute node?

So one thing to note is that the OOM reaper runs per NUMA node, so the global free memory values are not really what you need to look at.
cgroups/systemd also provide ways to limit the maximum memory a process or cgroup tree can consume, so your first step should be to determine whether the OOM event was triggered by exhausting the memory of a specific NUMA node or whether you are hitting a different cgroup memory limit.
As the logs suggest, it looks like the memory of a cgroup is being exhausted. The memory limit of the system.slice cgroup is 4G and that of user.slice is 2G by default. I have increased system.slice to 64GB.
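For reference, one way such a slice limit can be raised and verified is sketched below; the 64G figure is just the value mentioned above, and MemoryMax assumes cgroup v2 (on cgroup v1 the property is MemoryLimit):

# persistently raise the limit on system.slice (systemd writes a drop-in for you)
systemctl set-property system.slice MemoryMax=64G

# confirm the new limit and current usage
systemctl show system.slice -p MemoryMax -p MemoryCurrent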
In terms of how to optimize memory, it really depends on what your end goal is here.
Obviously we do not want ovs or nova-compute to be killed in general. The virtual and resident memory for ovs are in the 10 GB and 1 GB range respectively, so that is not excessively large.
That also should not be anywhere near the NUMA limit, but if you were incorrectly creating NUMA-affined VMs without setting hw:mem_page_size via nova, then that could perhaps trigger out-of-memory events.
Currently I am not using any page_size in my flavors.
Effectively, with OpenStack, if your VM is NUMA-affined, either explicitly via the hw:numa_nodes extra spec or implicitly via CPU pinning or otherwise, then you must ensure the memory is tracked using the NUMA-aware path, which requires you to define hw:mem_page_size in the flavor or hw_mem_page_size in the image.
I am only using CPU soft pinning (vcpu placement), i.e. cpu_shared_set. However, I have only configured hw:cpu_sockets='2' in flavors to give the VM two sockets; this helps with effective CPU utilization in Windows guests. However, in the VM I can only see one NUMA node of memory. Will this possibly cause trouble?
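For context, the setup described here would look roughly like the following sketch; the CPU range and flavor name are illustrative, not the actual values in use:

# nova.conf on the compute node: host CPUs usable by unpinned (shared) guest vCPUs
[compute]
cpu_shared_set = 2-63

# flavor extra spec giving the guest a two-socket CPU topology
openstack flavor set m1.win.large --property hw:cpu_sockets=2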
If you do not want to use hugepages, hw:mem_page_size=small is a good default, but just be aware that if the VM has a NUMA topology then memory oversubscription is not supported in OpenStack. That is, you cannot use CPU pinning, or any other feature that requires a NUMA topology such as virtual persistent memory, and also use memory oversubscription.
Got it.
Assuming these events do not correlate with VM boots, I would investigate the cgroup memory limits you set on the ovs and compute service cgroups. If they are correlated with VM boots, check whether the VM is NUMA-affined and, if it is, which page size is requested. If it is hw:mem_page_size=small, then you might need to use the badly named reserved_huge_pages config option to reserve 4k pages for the host per NUMA node.
https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.res...

e.g. reserve 4G on node 0 and 1:

reserved_huge_pages = node:0,size:4,count:1048576
reserved_huge_pages = node:1,size:4,count:1048576

The sum of all the 4k page-size reservations should equal the value of reserved_host_memory_mb:
https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.res...
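Spelling out the arithmetic in that example: each reservation is 1048576 pages x 4 KiB = 4 GiB per NUMA node, so with two nodes the matching host-level reservation would be, for example:

# 2 nodes x 1048576 x 4 KiB = 8 GiB reserved for the host in total
reserved_host_memory_mb = 8192
reserved_huge_pages = node:0,size:4,count:1048576
reserved_huge_pages = node:1,size:4,count:1048576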
Currently I have reserved_host_memory_mb (https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.reserved_host_memory_mb) reserving 64GB of memory for the host, and no oversubscription, i.e. the memory overprovisioning factor is set to 1.0 on the compute nodes.
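That would correspond roughly to the following nova.conf settings (a sketch; 65536 is simply 64 GB expressed in MB):

[DEFAULT]
reserved_host_memory_mb = 65536
ram_allocation_ratio = 1.0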
This is only really needed where you are using NUMA instances, since reserved_host_memory_mb does not account for the host NUMA topology and so will not prevent a NUMA node from being exhausted.
If you are using an AMD EPYC system and the NPS (NUMA per socket) BIOS option is set to, say, 4 or 8 on a dual- or single-socket system respectively, then the 1 TB of RAM you have on the host would be divided into NUMA nodes of 128 GB each, which is very close to the 121 GB used you see when you start hitting issues.
Yes, I am using an EPYC system, and I checked in the BIOS that NPS is set to 1.
Nova currently tries to fill the NUMA nodes in order when you have NUMA instances, too, which causes the OOM issue to manifest much sooner than people often expect due to the per-NUMA nature of the OOM reaper.
That may not help you in your case, but that is how I would approach tracking down this issue.
This indeed helped a lot. It has been 18 hours now and no OOM kills have been observed so far.
Ammad
On Mon, Feb 21, 2022 at 6:21 PM Sean Mooney <smooney@redhat.com> wrote:
On Mon, 2022-02-21 at 12:24 +0500, Ammad Syed wrote:
Hi,
I am having trouble with my compute node: the nova-compute and ovs processes are being killed by the OOM killer, even though I have a lot of memory available in the system.
# free -g
              total        used        free      shared  buff/cache   available
Mem:           1006         121         881           0           2         879
Swap:             7           0           7
But I am seeing processes being killed in dmesg:
[Sat Feb 19 03:46:26 2022] Memory cgroup out of memory: Killed process 2080898 (ovs-vswitchd) total-vm:9474284kB, anon-rss:1076384kB, file-rss:11700kB, shmem-rss:0kB, UID:0 pgtables:2776kB oom_score_adj:0
[Sat Feb 19 03:47:01 2022] Memory cgroup out of memory: Killed process 2081218 (ovs-vswitchd) total-vm:9475332kB, anon-rss:1096988kB, file-rss:11700kB, shmem-rss:0kB, UID:0 pgtables:2780kB oom_score_adj:0
[Sat Feb 19 03:47:06 2022] Memory cgroup out of memory: Killed process 2081616 (ovs-vswitchd) total-vm:9473252kB, anon-rss:1073052kB, file-rss:11700kB, shmem-rss:0kB, UID:0 pgtables:2784kB oom_score_adj:0
[Sat Feb 19 03:47:16 2022] Memory cgroup out of memory: Killed process 2081940 (ovs-vswitchd) total-vm:9471236kB, anon-rss:1070920kB, file-rss:11700kB, shmem-rss:0kB, UID:0 pgtables:2776kB oom_score_adj:0
[Sat Feb 19 03:47:16 2022] Memory cgroup out of memory: Killed process 6098 (nova-compute) total-vm:3428356kB, anon-rss:279920kB, file-rss:9868kB, shmem-rss:0kB, UID:64060 pgtables:1020kB oom_score_adj:0
[Mon Feb 21 11:15:08 2022] Memory cgroup out of memory: Killed process 2082296 (ovs-vswitchd) total-vm:9475372kB, anon-rss:1162636kB, file-rss:11700kB, shmem-rss:0kB, UID:0 pgtables:2864kB oom_score_adj:0
Any advice on how to fix this? Also, is there any best-practices document on configuring memory optimizations on a nova compute node?

So one thing to note is that the OOM reaper runs per NUMA node, so the global free memory values are not really what you need to look at.
cgroups/systemd also provide ways to limit the maximum memory a process or cgroup tree can consume, so your first step should be to determine whether the OOM event was triggered by exhausting the memory of a specific NUMA node or whether you are hitting a different cgroup memory limit.
On Tue, 2022-02-22 at 17:51 +0500, Ammad Syed wrote:
As the logs suggest, it looks like the memory of a cgroup is being exhausted. The memory limit of the system.slice cgroup is 4G and that of user.slice is 2G by default. I have increased system.slice to 64GB.

Ack, this seems to be outside the scope of nova then. Nova does not manage host cgroups; libvirt does create cgroups for the VMs, but nova has no role in any cgroup management.
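If it helps to see where memory is actually being charged, systemd can report per-slice usage directly; a small sketch, assuming cgroup v2 and that the libvirt-managed VMs land in machine.slice:

# live per-cgroup resource usage, ordered by memory
systemd-cgtop -m

# current usage of the slices of interest
systemctl show system.slice machine.slice -p MemoryCurrent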
In terms of how to optimize memory, it really depends on what your end goal is here.
Obviously we do not want ovs or nova-compute to be killed in general. The virtual and resident memory for ovs are in the 10 GB and 1 GB range respectively, so that is not excessively large.
That also should not be anywhere near the NUMA limit, but if you were incorrectly creating NUMA-affined VMs without setting hw:mem_page_size via nova, then that could perhaps trigger out-of-memory events.
Currently I am not using any page_size in my flavors.
ack
Effectively, with OpenStack, if your VM is NUMA-affined, either explicitly via the hw:numa_nodes extra spec or implicitly via CPU pinning or otherwise, then you must ensure the memory is tracked using the NUMA-aware path, which requires you to define hw:mem_page_size in the flavor or hw_mem_page_size in the image.
I am only using CPU soft pinning (vcpu placement), i.e. cpu_shared_set. However, I have only configured hw:cpu_sockets='2' in flavors to give the VM two sockets; this helps with effective CPU utilization in Windows guests. However, in the VM I can only see one NUMA node of memory. Will this possibly cause trouble?
No, it should not. hw:cpu_sockets='2' alters the CPU topology but does not modify the guest's virtual NUMA topology. By default all guests will be reported as having 1 NUMA node, but without a NUMA topology being requested, directly or indirectly, we will not provide any NUMA affinity.

Old servers (12+ years old) with a front-side-bus architecture had multiple sockets per NUMA node, since the memory controller was located on the north bridge. While this is not a common topology these days, I would not expect it to have any negative performance impact on the VM or on Windows running in the VM. Setting hw:cpu_sockets='1' would also likely improve the Windows guest's CPU utilization while being more typical of a real host topology, but I doubt you will see any meaningful performance delta. I generally recommend setting hw:cpu_sockets equal to the number of NUMA nodes, more out of consistency than anything else. If you explicitly have multiple NUMA nodes (hw:numa_nodes=2), then hw:cpu_sockets='2' can help the guest kernel make better scheduling decisions, but I don't think hw:cpu_sockets='2' when the guest has 1 NUMA node will degrade performance.
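As an illustration of the combination recommended here for multi-NUMA guests (the flavor name is hypothetical, and hw:mem_page_size is included because hw:numa_nodes makes the guest a NUMA instance, per the earlier advice):

openstack flavor set m1.numa.example \
  --property hw:numa_nodes=2 \
  --property hw:cpu_sockets=2 \
  --property hw:mem_page_size=small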
If you do not want to use hugepages, hw:mem_page_size=small is a good default, but just be aware that if the VM has a NUMA topology then memory oversubscription is not supported in OpenStack. That is, you cannot use CPU pinning, or any other feature that requires a NUMA topology such as virtual persistent memory, and also use memory oversubscription.
Got it.
Assuming these events do not correlate with VM boots, I would investigate the cgroup memory limits you set on the ovs and compute service cgroups. If they are correlated with VM boots, check whether the VM is NUMA-affined and, if it is, which page size is requested. If it is hw:mem_page_size=small, then you might need to use the badly named reserved_huge_pages config option to reserve 4k pages for the host per NUMA node.
https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.res...

e.g. reserve 4G on node 0 and 1:

reserved_huge_pages = node:0,size:4,count:1048576
reserved_huge_pages = node:1,size:4,count:1048576

The sum of all the 4k page-size reservations should equal the value of reserved_host_memory_mb:
https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.res...
Currently I have reserved_host_memory_mb (https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.reserved_host_memory_mb) reserving 64GB of memory for the host, and no oversubscription, i.e. the memory overprovisioning factor is set to 1.0 on the compute nodes.
Ack, since you are not NUMA-affining the VMs, that should be sufficient. It really does seem that this is just related to the cgroup config on the host.
This is only really needed where you are using NUMA instances, since reserved_host_memory_mb does not account for the host NUMA topology and so will not prevent a NUMA node from being exhausted.
If you are using an AMD EPYC system and the NPS (NUMA per socket) BIOS option is set to, say, 4 or 8 on a dual- or single-socket system respectively, then the 1 TB of RAM you have on the host would be divided into NUMA nodes of 128 GB each, which is very close to the 121 GB used you see when you start hitting issues.
Yes, I am using an EPYC system, and I checked in the BIOS that NPS is set to 1.
Ack, so in that case you likely are not exhausting the NUMA node.
Nova currently tries to fill the NUMA nodes in order when you have NUMA instances, too, which causes the OOM issue to manifest much sooner than people often expect due to the per-NUMA nature of the OOM reaper.
That may not help you in your case, but that is how I would approach tracking down this issue.
This indeed helped a lot. It has been 18 hours now and no OOM kills have been observed so far.
Based on what you have said, tweaking the system and user slices is probably the way to address this. It sounds like your nova config is fine for how you are creating VMs. I'm not sure how you have deployed OpenStack/Open vSwitch in this case, but I suspect the cgroup limits that the installer (or you) applied as part of the installation are just a little too low, and if you increase them it should work OK.
Ammad