I wanted to send this separate from the latest gate status update [1]
since it's primarily about latent cinder bugs causing failures in the
gate which no one is really investigating.

Running down our tracked gate bugs [2], there are several related to
cinder-backup testing:

* http://status.openstack.org/elastic-recheck/#1483434
* http://status.openstack.org/elastic-recheck/#1745168
* http://status.openstack.org/elastic-recheck/#1739482
* http://status.openstack.org/elastic-recheck/#1635643

All of those bugs were reported a long time ago. I've done some
investigation into them (at least at the time of reporting), and some are
simply due to cinder-api making synchronous RPC calls to cinder-volume
(or cinder-backup), which doesn't scale.

This bug isn't a backup issue, but it's definitely related to using an
RPC call rather than a cast (there's a minimal sketch of the difference
at the end of this mail):

http://status.openstack.org/elastic-recheck/#1763712

Regarding the backup tests specifically, I don't see a reason why they
need to run in the integrated gate jobs, e.g. tempest-full(-py3). They
don't involve other services, so in my opinion we should move the backup
tests to a separate job which only runs on cinder changes, so that these
latent bugs stop failing jobs for unrelated changes and resetting the
entire gate.

I would need someone from the cinder team who is more familiar with
their job setup to identify a candidate job for these tests, if this is
something everyone can agree on doing.

[1] http://lists.openstack.org/pipermail/openstack-discuss/2018-December/000867....
[2] http://status.openstack.org/elastic-recheck/

--

Thanks,

Matt
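
P.S. For anyone not steeped in oslo.messaging, here's a minimal sketch of
the call-vs-cast difference mentioned above. This isn't cinder's actual
RPC code; the topic, method name and arguments are just illustrative:

    import oslo_messaging as messaging
    from oslo_config import cfg

    transport = messaging.get_rpc_transport(cfg.CONF)
    target = messaging.Target(topic='cinder-backup', version='2.0')
    client = messaging.RPCClient(transport, target)
    ctxt = {}  # request context; a plain dict is enough for a sketch

    # call(): synchronous. The caller (e.g. cinder-api) blocks until the
    # service replies or rpc_response_timeout expires, so a slow or
    # overloaded backup service surfaces as API-side failures in the gate.
    client.call(ctxt, 'create_backup', backup_id='some-uuid')

    # cast(): asynchronous fire-and-forget. The caller returns immediately
    # and the service picks the work up from the queue when it can.
    client.cast(ctxt, 'create_backup', backup_id='some-uuid')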