Thanks a lot, Wes + team, for the hard work every day to keep CI stable.
Greetings,
Status update...
Master: GREEN
Stable Branches impacted by:
We are now promoting each branch in turn so that the pacemaker version on the node and in the containers lines up. Queens is promoting now.
Train: GREEN
Queens: RED ( current priority )
In addition to the pacemaker issue, which has been resolved in our periodic testing jobs, we're hitting issues w/ instances timing out in tempest.
Stein: RED
Also seems to have the same issue as Queens
Rocky: RED
Also seems to have the same issue as Queens
I will be promoting Rocky to level out pacemaker next.
Additional notes may be found:
--
Status Update:
Master: Green, but we are seeing several jobs fail on random tempest or container start issues atm. No pattern yet.
Train: Green
Stein: Green
Rocky: Green
Queens: Green, but the coverage here on scenario jobs is terrible as they are all failing. There is a discrepancy between periodic and check: periodic jobs only went red on April 26th [1], whereas check jobs started going red on March 24th [2].
Improvements in progress:
At the moment we test CentOS CR [3] in our RDO periodic pipelines, which is NOT sufficient to protect upstream jobs. Containers are rebuilt for each test in RDO, so those pipelines would not catch a pacemaker mismatch between the nodepool node and the containers.
To help catch issues w/ the latest CentOS packages in CR we are making the following changes to our upstream periodic jobs. Gabriele Cerami and I are working through the design now. Input is welcome.
TLDR: In upstream zuul the containers and nodes can actually end up on different versions, so running these jobs there lets us catch pacemaker mismatches in advance.
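To make the mismatch idea concrete, here is a rough sketch of the kind of comparison such a job could do. This is only an illustration, not what the periodic jobs actually run: the function name, the dict-based package inventory, and the version strings are all made up for the example.

```python
# Hypothetical sketch: compare package versions reported by the nodepool
# node against those reported from inside a container. How the two
# inventories are collected is out of scope here; plain dicts stand in.

def find_version_mismatches(node_pkgs, container_pkgs, watched=("pacemaker",)):
    """Return {name: (node_version, container_version)} for watched
    packages that are present on both sides with different versions."""
    mismatches = {}
    for name in watched:
        node_ver = node_pkgs.get(name)
        container_ver = container_pkgs.get(name)
        # Skip packages missing on either side; only a real difference counts.
        if node_ver is None or container_ver is None:
            continue
        if node_ver != container_ver:
            mismatches[name] = (node_ver, container_ver)
    return mismatches


# Made-up version strings, purely for illustration.
EXAMPLE_NODE = {"pacemaker": "example-version-a", "corosync": "example-version-x"}
EXAMPLE_CONTAINER = {"pacemaker": "example-version-b", "corosync": "example-version-x"}

if __name__ == "__main__":
    for pkg, (node_ver, cont_ver) in find_version_mismatches(
        EXAMPLE_NODE, EXAMPLE_CONTAINER, watched=("pacemaker", "corosync")
    ).items():
        print(f"{pkg}: node={node_ver} container={cont_ver}")
```

A check along these lines only has something to find when the node and the containers come from different builds, which is the case upstream and not in RDO where containers are rebuilt for each test.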
Thanks all
Thanks Emilien!