[nova] Nasty new gate regression as of 4/15 - bug 1825435

Matt Riedemann mriedemos at gmail.com
Fri Apr 19 13:22:04 UTC 2019

I spotted this yesterday [1], and according to logstash it first showed
up around 4/15. It only hits nova unit tests, and I think it is somehow
related to the TestRPC unit tests, or maybe those just stall out as a
result when we hit the stack overflow.

I don't think it's due to any new oslo.config or oslo.messaging versions 
because there haven't been any, and it hits on both the 
lower-constraints and py36 jobs (so different versions of those packages).

I've looked through the nova changes that merged since around 4/14, but
nothing is jumping out at me that might be causing this stack overflow
in the oslo.config code. The cells v1 removal patches are pretty big,
though, and I'm wondering if something snuck through there. The cells v1
unit tests were doing some RPC stuff as I recall, so maybe that's related.

We need all eyes on this since the failure rate is high and we're
already experiencing really slow turnaround times in the gate.

[1] https://bugs.launchpad.net/nova/+bug/1825435




