On Wed, May 29, 2024, at 1:50 PM, Brian Haley wrote:
Hi,
Neutron has been having issues with our coverage gate job triggering the OOM killer since last week [0], which I just confirmed by holding a node and looking in the logs. It started happening after the sqlalchemy 2.0 bump [1], but that just might be exposing the underlying issue.
Running locally I can see via /proc/meminfo that memory is getting consumed:
MemTotal: 8123628 kB
MemFree: 1108404 kB
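For anyone wanting to watch this while the job runs, here is a minimal sketch of reading those two fields programmatically. It parses the sample values quoted above rather than the live file; the helper names `parse_meminfo` and `free_fraction` are made up for illustration, and on a real node you would pass in the contents of /proc/meminfo instead.

```python
# Sample taken from the figures quoted above; on a live system, read
# the text from /proc/meminfo instead.
SAMPLE = """\
MemTotal: 8123628 kB
MemFree: 1108404 kB
"""


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def free_fraction(fields):
    """Fraction of total RAM reported as completely free."""
    return fields["MemFree"] / fields["MemTotal"]


info = parse_meminfo(SAMPLE)
print(f"{free_fraction(info):.1%} of RAM free")
```

With the numbers above this comes out at roughly 13.6% free. Note that MemFree excludes reclaimable page cache, so MemAvailable (when the kernel reports it) is usually the better field to watch.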
And via ps it's the coverage processes doing it:
PID %MEM RSS PPID TIME NLWP WCHAN COMMAND
4315 30.9 2516348 4314 01:29:07 1 - /opt/stack/neutron/.tox/cover/bin/python /opt/stack/neutron/.tox/cover/bin/coverage run --source neutron --parallel-mode -m stestr.subunit_runner.run discover -t ./ ./neutron/tests/unit --load-list /tmp/tmp0rhqfwhz
4313 30.0 2437500 4312 01:28:50 1 - /opt/stack/neutron/.tox/cover/bin/python /opt/stack/neutron/.tox/cover/bin/coverage run --source neutron --parallel-mode -m stestr.subunit_runner.run discover -t ./ ./neutron/tests/unit --load-list /tmp/tmpfzmqyuub
(and the test hasn't even finished yet)
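To put a single number on the ps output above, a rough sketch like the following sums the RSS column for the coverage workers. It assumes the eight-column layout shown (RSS in kB in the third column, the command last); the sample commands are abbreviated, and the function name and the "coverage run" match string are illustrative choices, not anything from the job itself.

```python
# Two rows in the same column layout as the ps output above, with the
# long command lines abbreviated for readability.
SAMPLE_PS = """\
PID %MEM RSS PPID TIME NLWP WCHAN COMMAND
4315 30.9 2516348 4314 01:29:07 1 - /opt/stack/neutron/.tox/cover/bin/python /opt/stack/neutron/.tox/cover/bin/coverage run --source neutron --parallel-mode
4313 30.0 2437500 4312 01:28:50 1 - /opt/stack/neutron/.tox/cover/bin/python /opt/stack/neutron/.tox/cover/bin/coverage run --source neutron --parallel-mode
"""


def coverage_rss_kb(ps_text, needle="coverage run"):
    """Sum the RSS (kB) of every process whose command contains needle."""
    total = 0
    for line in ps_text.splitlines()[1:]:  # skip the header row
        cols = line.split(None, 7)  # the command keeps its own spaces
        if len(cols) < 8:
            continue  # malformed or empty line
        if needle in cols[7]:
            total += int(cols[2])
    return total


total_kb = coverage_rss_kb(SAMPLE_PS)
print(f"coverage workers: {total_kb} kB ({total_kb / 1024 ** 2:.2f} GiB)")
```

For the two workers shown, that is 4953848 kB, about 4.7 GiB of the 8 GB on the node, which fits with the OOM killer firing before the run completes.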
Only workaround seems to be reducing concurrency [2].
Other things that came to mind are that maybe you are gathering coverage info for more files than necessary. This isn't the case; --source neutron is passed, and looking at the coverage reports we can see no other sources are included.
I also notice that upper-constraints for coverage is set to 7.5.1, but there is a (very recent) 7.5.3 release which claims to have some memory improvements [3] that may be worth trying. The code that was modified to improve memory use was introduced in 7.5.0 as well (if I've read the git history properly, anyway). Looking at requirements, we jumped from 7.4.4 to 7.5.1 less than a week ago [4]. Depending on the timing of this new issue, this may be more than coincidence.
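As a quick way to check which side of that window a given environment sits on, something like the sketch below could be run inside the tox venv. The 7.5.0 to 7.5.3 range is only the hypothesis from the paragraph above, not a confirmed regression window, and the helper names are invented for this example.

```python
import re
from importlib import metadata

# Hypothesis from the discussion above, not a confirmed fact: the code in
# question arrived in 7.5.0 and the memory improvements landed in 7.5.3.
SUSPECT_FROM = (7, 5, 0)
FIXED_IN = (7, 5, 3)


def parse_version(version):
    """Return the leading numeric (major, minor, patch) of a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def in_suspect_range(version):
    """True if version falls in the half-open range [7.5.0, 7.5.3)."""
    return SUSPECT_FROM <= parse_version(version) < FIXED_IN


def installed_coverage_version():
    """Return the installed coverage version string, or None if absent."""
    try:
        return metadata.version("coverage")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_coverage_version()
    if version is None:
        print("coverage is not installed in this environment")
    else:
        state = "inside" if in_suspect_range(version) else "outside"
        print(f"coverage {version} is {state} the suspect range")
```

The versions mentioned above behave as expected here: 7.5.1 falls inside the range, while 7.4.4 and 7.5.3 fall outside it.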
Have any other projects seen anything similar?
(and sorry for the html email)
-Brian
[0] https://bugs.launchpad.net/neutron/+bug/2065821
[1] https://review.opendev.org/c/openstack/requirements/+/879743
[3] https://coverage.readthedocs.io/en/7.5.3/changes.html
[4] https://review.opendev.org/c/openstack/requirements/+/920283/3/upper-constra...