Re: Proposal to move cinder backup tests out of the integrated gate

11 Apr 2019

      ---- On Thu, 11 Apr 2019 11:50:06 -0500 Matt Riedemann <mriedemos@gmail.com> wrote ----
...
On 12/12/2018 2:00 PM, Matt Riedemann wrote:
...
I wanted to send this separate from the latest gate status update [1]  
since it's primarily about latent cinder bugs causing failures in the  
gate that no one is really investigating.
Running down our tracked gate bugs [2] there are several related to  
cinder-backup testing:
* http://status.openstack.org/elastic-recheck/#1483434 
* http://status.openstack.org/elastic-recheck/#1745168 
* http://status.openstack.org/elastic-recheck/#1739482 
* http://status.openstack.org/elastic-recheck/#1635643
All of those bugs were reported a long time ago. I've done some  
investigation into them (at least at the time of reporting) and some are  
simply due to cinder-api using synchronous RPC calls to cinder-volume  
(or cinder-backup) and that doesn't scale. This bug isn't a backup  
issue, but it's definitely related to using RPC call rather than cast:
http://status.openstack.org/elastic-recheck/#1763712
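
To illustrate the call-versus-cast distinction mentioned above, here is a minimal sketch using only the Python standard library. It is not cinder code and does not use oslo.messaging; the `Worker` class, its `call`/`cast` methods, and the `timed` helper are hypothetical names chosen for this example. The point is only that a synchronous call ties up the caller for as long as the backend takes (and can time out under load), while a cast returns as soon as the request is queued.

```python
import queue
import threading
import time


class Worker:
    """Hypothetical stand-in for a backend service reading requests off a queue."""

    def __init__(self, delay=0.05):
        self.delay = delay
        self.inbox = queue.Queue()
        self.handled = []
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            request, reply_box = self.inbox.get()
            time.sleep(self.delay)  # simulate slow backend work
            self.handled.append(request)
            if reply_box is not None:
                reply_box.put("done: " + request)
            self.inbox.task_done()

    def call(self, request, timeout=1.0):
        """Synchronous: block the caller until the worker replies."""
        reply_box = queue.Queue(maxsize=1)
        self.inbox.put((request, reply_box))
        # Raises queue.Empty if no reply arrives within the timeout.
        return reply_box.get(timeout=timeout)

    def cast(self, request):
        """Asynchronous: enqueue the request and return immediately."""
        self.inbox.put((request, None))


def timed(fn, *args):
    """Return (result, elapsed_seconds) for a single invocation of fn."""
    start = time.monotonic()
    result = fn(*args)
    return result, time.monotonic() - start
```

With a slow worker, `call` waits for the full processing delay and raises `queue.Empty` once the timeout is shorter than the backlog, whereas `cast` returns right away and the caller has to learn the outcome some other way, for example by polling a status field.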
Regarding the backup tests specifically, I don't see a reason why they  
need to be run in the integrated gate jobs, e.g. tempest-full(-py3).  
They don't involve other services, so in my opinion we should move the  
backup tests to a separate job which only runs on cinder changes to  
alleviate these latent bugs failing jobs for unrelated changes and  
resetting the entire gate.
I would need someone from the cinder team who knows their job setup  
better to identify a candidate job for these tests, if this is  
something everyone can agree on doing.
[1]  
http://lists.openstack.org/pipermail/openstack-discuss/2018-December/000867....
[2] http://status.openstack.org/elastic-recheck/
This is an old thread but gmann recently skipping a cinder backup test  
which was failing a lot [1] prompted me to revisit this.
As such I've proposed a change [2] which will disable the cinder-backup  
service in the tempest-full job which is in the integrated-gate project  
template and run by most projects.
I agree with this as the end goal, but it will also skip all the backup tests which are running fine,
so let's wait until we move those tests to the cinder tempest plugin or run them in another integrated
job, etc.
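
The concern above, that turning the service off skips the stable backup tests along with the flaky one, can be sketched with the standard-library `unittest` module. This is only an illustration under stated assumptions: the `CONFIG` flag, the test class names, and `run_suite` are hypothetical, and this is not how tempest itself is implemented.

```python
import unittest

# Hypothetical stand-in for "is the backup service deployed in this job?".
CONFIG = {"backup_enabled": True}


class BackupTests(unittest.TestCase):
    def setUp(self):
        # Every test in this class is skipped when the service is off,
        # regardless of whether that test is flaky or stable.
        if not CONFIG["backup_enabled"]:
            self.skipTest("backup service is disabled in this job")

    def test_backup_known_flaky(self):
        pass

    def test_backup_stable(self):
        pass


class VolumeTests(unittest.TestCase):
    def test_create_volume(self):
        pass


def run_suite(backup_enabled):
    """Run both test classes with the flag set and return the TestResult."""
    CONFIG["backup_enabled"] = backup_enabled
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for case in (BackupTests, VolumeTests):
        suite.addTests(loader.loadTestsFromTestCase(case))
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Calling `run_suite(False)` reports both backup tests as skipped while the volume test still runs, which is the coverage loss being weighed against gate stability.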
...
There is a voting job running against cinder changes named  
"cinder-tempest-dsvm-lvm-lio-barbican" which will still test the backup  
service but it's not gating - it's up to the cinder team if they want to  
make that job gating. The other thing is it doesn't look like that job runs 
on glance (or swift) changes so if the cinder team is interested in  
co-gating changes between at least cinder and glance, they could add  
cinder-tempest-dsvm-lvm-lio-barbican to glance so it runs there and/or  
create a new cinder-backup job which just runs backup tests and gate on  
that in both cinder and glance.
Initially, I was in favor of testing/running everything together, but on second thought,
and seeing how unstable tempest-full is, I agree with you that we should find a solution to make
the integrated-gate template testing (tempest-full) more efficient and stable for each service.

Neutron also faces a lot of test failures due to volume backup or image tests, which are definitely
not related to neutron, and it is not worth blocking neutron development for that.

I have added this topic to the QA PTG etherpad to find the best possible solution.
- https://etherpad.openstack.org/p/qa-train-ptg 

-gmann
...
[1] https://review.openstack.org/#/c/651660/ 
[2] https://review.openstack.org/#/c/651865/
--
Thanks,
Matt

Ghanshyam Mann