Hi,
Neutron's coverage gate job has been triggering the OOM killer since last week [0]; I just confirmed this by holding a node and looking in the logs. It started happening after the sqlalchemy 2.0 bump [1], but that might just be exposing an underlying issue.
Running locally I can see via /proc/meminfo that memory is getting consumed:
MemTotal:        8123628 kB
MemFree:         1108404 kB
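For anyone wanting to track this over the course of a run, here is a minimal sketch of turning those two fields into a usage figure. The helper names (parse_meminfo, used_fraction) are mine, not from any existing tool, and the sample text is just the two values quoted above; on a real node you would pass in the contents of /proc/meminfo instead.

```python
def parse_meminfo(text):
    """Return {field_name: integer value} for 'Name:   12345 kB' style lines."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def used_fraction(fields):
    """Fraction of MemTotal that is not free.

    MemFree ignores reclaimable page cache, so it overstates pressure;
    MemAvailable is used instead when the kernel reports it.
    """
    free = fields.get("MemAvailable", fields["MemFree"])
    return 1 - free / fields["MemTotal"]


# Sample input: the two values quoted in this email.
sample = "MemTotal:        8123628 kB\nMemFree:         1108404 kB\n"
info = parse_meminfo(sample)
print(f"{used_fraction(info):.1%} of {info['MemTotal']} kB in use")
```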
And via ps it's the coverage processes doing it:
 PID %MEM     RSS PPID     TIME NLWP WCHAN COMMAND
4315 30.9 2516348 4314 01:29:07    1 -     /opt/stack/neutron/.tox/cover/bin/python /opt/stack/neutron/.tox/cover/bin/coverage run --source neutron --parallel-mode -m stestr.subunit_runner.run discover -t ./ ./neutron/tests/unit --load-list /tmp/tmp0rhqfwhz
4313 30.0 2437500 4312 01:28:50    1 -     /opt/stack/neutron/.tox/cover/bin/python /opt/stack/neutron/.tox/cover/bin/coverage run --source neutron --parallel-mode -m stestr.subunit_runner.run discover -t ./ ./neutron/tests/unit --load-list /tmp/tmpfzmqyuub
(and the test hasn't even finished yet)
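To put a single number on it, a rough sketch that sums RSS over the coverage workers in ps output with the column layout shown above (PID, %MEM, RSS, PPID, TIME, NLWP, WCHAN, COMMAND). The function name coverage_rss_kb and the "coverage run" match string are my own choices, and the sample rows have the trailing test-runner arguments trimmed for brevity.

```python
def coverage_rss_kb(ps_text, needle="coverage run"):
    """Sum the RSS column (kB) of rows whose command contains `needle`.

    Assumes eight whitespace-separated columns with RSS third and the
    full command line last, as in the listing in this email.
    """
    total = 0
    for line in ps_text.splitlines()[1:]:  # skip the header row
        cols = line.split(None, 7)  # keep the command line in one piece
        if len(cols) == 8 and needle in cols[7]:
            total += int(cols[2])
    return total


# Sample input: the two rows from this email, commands shortened.
ps_sample = (
    "PID %MEM RSS PPID TIME NLWP WCHAN COMMAND\n"
    "4315 30.9 2516348 4314 01:29:07 1 - "
    "/opt/stack/neutron/.tox/cover/bin/coverage run --source neutron\n"
    "4313 30.0 2437500 4312 01:28:50 1 - "
    "/opt/stack/neutron/.tox/cover/bin/coverage run --source neutron\n"
)
total_kb = coverage_rss_kb(ps_sample)
print(f"coverage workers: {total_kb} kB ({total_kb / 1024 ** 2:.2f} GiB)")
```

For the two workers listed that is 4953848 kB, so well over half of the 8 GB node.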
Only workaround seems to be reducing concurrency [2].
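As a back-of-the-envelope way to pick a value, here is a sketch that estimates how many workers fit in memory. Everything in it is an assumption on my part: the function name max_workers, the 1 GiB reserve for the rest of the system, and the idea that per-worker RSS stays near the figure measured above (it was still growing, so the true limit may be lower). The result would be handed to stestr via its --concurrency option, however the job ends up wiring that through.

```python
def max_workers(mem_total_kb, per_worker_rss_kb, reserve_kb=1024 * 1024):
    """Estimate how many test workers fit in RAM.

    reserve_kb is a hypothetical allowance for the OS and other
    processes; the result is never lower than one worker.
    """
    usable_kb = mem_total_kb - reserve_kb
    return max(1, usable_kb // per_worker_rss_kb)


# Numbers from this email: MemTotal and the larger worker's RSS.
workers = max_workers(8123628, 2516348)
print(f"suggested concurrency: {workers}")
```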
Have any other projects seen anything similar?
(and sorry for the html email)
-Brian
[0] https://bugs.launchpad.net/neutron/+bug/2065821
[1] https://review.opendev.org/c/openstack/requirements/+/879743