[openstack-dev] [infra] [gate] [all] openstack services footprint lead to oom-kill in the gate
iwamoto at valinux.co.jp
Fri Feb 3 06:55:51 UTC 2017
At Wed, 1 Feb 2017 16:24:54 -0800,
Armando M. wrote:
> [TL;DR]: OpenStack services have steadily increased their memory
> footprints. We need a concerted way to address the oom-kills experienced in
> the openstack gate, as we may have reached a ceiling.
> Now the longer version:
> We have been experiencing some instability in the gate lately due to a
> number of reasons. When everything adds up, this means it's rather
> difficult to merge anything and knowing we're in feature freeze, that adds
> to stress. One culprit was identified to be the oom-killer.
> We initially tried to increase the swappiness, but that didn't seem to
> help. Then we looked at the resident memory in use. Going back
> over the past three releases, we noticed that the aggregated memory
> footprint of some openstack projects has grown steadily. [...]
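As an aside, the aggregated footprint can be measured locally by summing per-process RSS from /proc. A rough sketch, assuming Linux; the process-name pattern is purely illustrative:

```python
import os
import re

def total_rss_kb(pattern):
    """Sum VmRSS (in kB) across processes whose cmdline matches `pattern`.

    Returns 0 where /proc is unavailable (non-Linux systems).
    """
    total = 0
    if not os.path.isdir("/proc"):
        return total
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open("/proc/%s/cmdline" % pid, "rb") as f:
                cmd = f.read().replace(b"\0", b" ").decode(errors="replace")
            if not re.search(pattern, cmd):
                continue
            with open("/proc/%s/status" % pid) as f:
                for line in f:
                    if line.startswith("VmRSS:"):
                        total += int(line.split()[1])
        except OSError:
            continue  # process exited or is inaccessible; skip it
    return total

# Illustrative pattern; adjust to whichever services you want to track.
print(total_rss_kb(r"nova|neutron|cinder"), "kB")
```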
I am not sure whether it is due to memory shortage, but the VMs running
CI jobs are experiencing sluggishness, which may be the cause of the
ovs-related timeouts. Tempest jobs run dstat to collect system info
every second. When timeouts happen, dstat output is also often missing
for several seconds, which means the VM is having trouble scheduling
both the ovs-related processes and the dstat process.
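A quick way to spot those missing seconds is to look for gaps between consecutive dstat timestamps. A minimal sketch; the sample timestamps below are made up:

```python
from datetime import datetime

def find_stalls(timestamps, threshold=2.0):
    """Return (prev, cur) pairs where consecutive dstat samples are
    further apart than `threshold` seconds -- a sign the VM could not
    even schedule the dstat process in time."""
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        if (cur - prev).total_seconds() > threshold:
            stalls.append((prev, cur))
    return stalls

# Hypothetical sample: dstat logs one line per second, but several
# seconds are missing after 12:00:05.
fmt = "%H:%M:%S"
samples = [datetime.strptime(t, fmt) for t in
           ["12:00:01", "12:00:02", "12:00:03", "12:00:04",
            "12:00:05", "12:00:13", "12:00:14"]]
print(find_stalls(samples))  # one 8-second gap
```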
Those ovs timeouts affect every project and happen frequently.
Some details are on the lp bug page.
The correlation between such sluggishness and VM paging activity is not
clear. I wonder if the VM hosts are under high load, or if increasing
VM memory would help. These VMs have no free ram left for file cache,
so file pages are read again and again, leading to extra IO load on the
VM hosts and adversely affecting other VMs on the same host.
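To illustrate the "no free ram for file cache" situation, here is a rough sketch that parses /proc/meminfo-style output and estimates the page-cache headroom. The sample numbers are invented, and the headroom formula is my own approximation, not an official kernel metric:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of kB values."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        info[key.strip()] = int(rest.split()[0])
    return info

def cache_headroom_kb(info):
    """Rough estimate of memory usable as file cache: free pages plus
    the current cache, minus tmpfs/shmem pages that cannot be dropped."""
    return info["MemFree"] + info["Cached"] - info.get("Shmem", 0)

# Invented sample resembling a memory-starved 8 GB CI VM.
sample = """MemTotal:        8388608 kB
MemFree:          102400 kB
Cached:           204800 kB
Shmem:             51200 kB
SwapTotal:       8388608 kB
SwapFree:        4194304 kB"""

info = parse_meminfo(sample)
print(cache_headroom_kb(info), "kB")  # 256000 kB, only ~250 MB of cache
```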