When role separation was introduced in the Newton release, we split the memory-hungry processes into 4 different VMs on 3 physical boxes:
1) Networker: all Neutron agent processes (network throughput)
2) Systemd: all services started by systemd (Neutron)
3) pcs: all services controlled by pcs (Galera + RabbitMQ)
4) Horizon
I'm not sure how to do it now. I think I will go for VMs again, and those VMs will include containers. That makes it easier to recover and rebuild the whole OpenStack.

Gregory >
do you have local storage for the Swift and Cinder backend?
if local; then
  do you use RAID?
  if yes; then which RAID?; fi
  do you use SSD?
  do you use Ceph as the backend for Cinder and Swift?
fi

Also double-check where the _base images are located: are they in /var/lib/nova/instances/_base/*? And are the flavor disks stored in /var/lib/nova/instances? (You can check on the compute node with: virsh domblklist instance-00000##)

On Thu, 1 Aug 2019 at 09:25, Gregory Orange <gregory.orange@pawsey.org.au> wrote:
Hi again everyone,
On 1/8/19 11:12 am, Gregory Orange wrote:
We have a Queens/Rocky environment with haproxy in front of most services. Recently we've found a problem when creating multiple instances (2 VCPUs, 6GB RAM) from large images. The behaviour is the same whether we use Horizon or Terraform, so I've continued on with Terraform since it's easier to repeat attempts.
As a follow-up, I found a Neutron server with one core stuck at 100% and its RAM and swap exhausted. After rebooting that server, everything worked fine. Over the next hour, RAM and swap were exhausted again by a large number of spawned processes (a few hundred neutron-rootwrap-daemon instances), and oom-killer cleaned them up. This resulted in a loop where RAM fills and empties every 20-60 minutes. We have some other Neutron changes planned, so for now we have left that server turned off, and the other two (which have less RAM) are working fine without these symptoms.
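To watch the rootwrap buildup described above before oom-killer steps in, a minimal Python sketch like the following could count matching processes by scanning /proc/<pid>/cmdline. It assumes a Linux host with a readable /proc. The function name count_processes is hypothetical, and the process-name fragment is simply taken from the symptoms above.

```python
import os


def count_processes(name_fragment, proc_root="/proc"):
    """Count processes whose command line contains name_fragment.

    Hypothetical helper; assumes a Linux-style /proc layout.
    """
    count = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                # cmdline arguments are NUL-separated; join them with spaces
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            continue  # process exited meanwhile, or cmdline is unreadable
        if name_fragment in cmdline:
            count += 1
    return count


if __name__ == "__main__":
    print(count_processes("neutron-rootwrap-daemon"))
```

Running this periodically (e.g. from cron) and logging the count next to free memory would show whether the number of daemons climbs steadily between oom-killer runs.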
Strange, but I'm glad to have found something, and that it's working for now.
Regards, Greg.
-- Ruslanas Gžibovskis +370 6030 7030