---- On Thu, 11 Apr 2019 11:50:06 -0500 Matt Riedemann <mriedemos@gmail.com> wrote ----
On 12/12/2018 2:00 PM, Matt Riedemann wrote:
I wanted to send this separately from the latest gate status update [1] since it's primarily about latent cinder bugs that are causing failures in the gate and that no one is really investigating.
Running down our tracked gate bugs [2] there are several related to cinder-backup testing:
* http://status.openstack.org/elastic-recheck/#1483434
* http://status.openstack.org/elastic-recheck/#1745168
* http://status.openstack.org/elastic-recheck/#1739482
* http://status.openstack.org/elastic-recheck/#1635643
All of those bugs were reported a long time ago. I've done some investigation into them (at least at the time of reporting) and some are simply due to cinder-api using synchronous RPC calls to cinder-volume (or cinder-backup), which doesn't scale. This bug isn't a backup issue, but it's definitely related to using an RPC call rather than a cast:
http://status.openstack.org/elastic-recheck/#1763712
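To illustrate the call-versus-cast distinction mentioned above, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: it does not use oslo.messaging or any cinder code, and the names `Worker`, `rpc_call`, `rpc_cast` and `timed` are hypothetical. A "call" blocks the caller until the backend replies, while a "cast" queues the request and returns immediately.

```python
import queue
import threading
import time


class Worker:
    """Hypothetical stand-in for a backend service; not cinder code."""

    def __init__(self):
        self.requests = queue.Queue()
        self.done = []
        # Single consumer thread, so requests are handled one at a time.
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            name, delay, reply = self.requests.get()
            time.sleep(delay)  # simulate slow backend work
            self.done.append(name)
            if reply is not None:
                reply.put(name + ":ok")
            self.requests.task_done()


def rpc_call(worker, name, delay):
    """Synchronous style: the caller waits for the worker's reply."""
    reply = queue.Queue(maxsize=1)
    worker.requests.put((name, delay, reply))
    return reply.get()  # blocks until the worker has finished


def rpc_cast(worker, name, delay):
    """Fire-and-forget style: queue the request and return at once."""
    worker.requests.put((name, delay, None))


def timed(fn, *args):
    """Return (result, elapsed_seconds) for a single invocation."""
    start = time.monotonic()
    result = fn(*args)
    return result, time.monotonic() - start


if __name__ == "__main__":
    worker = Worker()
    _, call_time = timed(rpc_call, worker, "backup-1", 0.2)
    _, cast_time = timed(rpc_cast, worker, "backup-2", 0.2)
    worker.requests.join()  # wait for the cast to be processed
    print(f"call blocked the caller for {call_time:.3f}s")
    print(f"cast returned after {cast_time:.3f}s")
```

In the call case the caller is held for as long as the backend is busy, so a slow or overloaded backend shows up directly as a slow or timed-out API request. With a cast, the caller has to learn the outcome some other way, for example by polling a status.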
Regarding the backup tests specifically, I don't see a reason why they need to be run in the integrated gate jobs, e.g. tempest-full(-py3). They don't involve other services, so in my opinion we should move the backup tests to a separate job which only runs on cinder changes, so that these latent bugs stop failing jobs for unrelated changes and resetting the entire gate.
I would need someone from the cinder team that is more involved in knowing what their job setup looks like to identify a candidate job for these tests if this is something everyone can agree on doing.
[1] http://lists.openstack.org/pipermail/openstack-discuss/2018-December/000867....
This is an old thread but gmann recently skipping a cinder backup test which was failing a lot [1] prompted me to revisit this.
As such I've proposed a change [2] which will disable the cinder-backup service in the tempest-full job which is in the integrated-gate project template and run by most projects.
I agree with this as the end goal, but it will skip all the backup tests that are running fine, so let's wait until we move those tests to the cinder tempest plugin or run them in another integrated job, etc.
There is a voting job running against cinder changes named "cinder-tempest-dsvm-lvm-lio-barbican" which will still test the backup service but it's not gating - it's up to the cinder team if they want to make that job gating. The other thing is it doesn't look like that job runs on glance (or swift) changes so if the cinder team is interested in co-gating changes between at least cinder and glance, they could add cinder-tempest-dsvm-lvm-lio-barbican to glance so it runs there and/or create a new cinder-backup job which just runs backup tests and gate on that in both cinder and glance.
Initially, I was in favor of testing and running everything together, but on second thought, and after seeing how unstable tempest-full is, I agree with you that we should find some way to make the integrated-gate template testing (tempest-full) more efficient and stable for each service. Neutron also faces a lot of test failures due to volume backup or image tests, which are definitely not related to neutron, and it's not worth blocking neutron development for that. I have added this topic to the QA PTG etherpad to find the best possible solution: https://etherpad.openstack.org/p/qa-train-ptg
-gmann
[1] https://review.openstack.org/#/c/651660/
[2] https://review.openstack.org/#/c/651865/
--
Thanks,
Matt