Proposal to move cinder backup tests out of the integrated gate

Jay Bryant jungleboyj at gmail.com
Thu Dec 13 15:03:18 UTC 2018


On 12/12/2018 2:00 PM, Matt Riedemann wrote:
> I wanted to send this separate from the latest gate status update [1] 
> since it's primarily about latent cinder bugs causing failures in the 
> gate for which no one is really investigating.
>
Matt, thank you for putting together this information.  I am sorry that 
these issues with Cinder are impacting Nova's ability to merge code.  I 
don't think we knew that this was having an impact on Nova.
> Running down our tracked gate bugs [2] there are several related to 
> cinder-backup testing:
>
> * http://status.openstack.org/elastic-recheck/#1483434
> * http://status.openstack.org/elastic-recheck/#1745168
> * http://status.openstack.org/elastic-recheck/#1739482
> * http://status.openstack.org/elastic-recheck/#1635643
>
> All of those bugs were reported a long time ago. I've done some 
> investigation into them (at least at the time of reporting) and some 
> are simply due to cinder-api using synchronous RPC calls to 
> cinder-volume (or cinder-backup) and that doesn't scale. This bug 
> isn't a backup issue, but it's definitely related to using RPC call 
> rather than cast:
>
> http://status.openstack.org/elastic-recheck/#1763712
>
Thanks to you bringing this up, Dan Smith has proposed a patch that may 
help with the timeouts: https://review.openstack.org/#/c/624809/  The 
thought is that concurrent LVM processes might be the source of the 
timeouts.  We will continue to work with Dan on that patch.
> Regarding the backup tests specifically, I don't see a reason why they 
> need to be run in the integrated gate jobs, e.g. tempest-full(-py3). 
> They don't involve other services, so in my opinion we should move the 
> backup tests to a separate job which only runs on cinder changes to 
> alleviate these latent bugs failing jobs for unrelated changes and 
> resetting the entire gate.
>
> I would need someone from the cinder team that is more involved in 
> knowing what their job setup looks like to identify a candidate job 
> for these tests if this is something everyone can agree on doing.
>
We have a member of the team who might have some bandwidth to start 
working on check/gate issues.  I have added this issue to our meeting 
agenda for next week.  We should be able to get attention from the team 
members who can help at that point in time.
> [1] 
> http://lists.openstack.org/pipermail/openstack-discuss/2018-December/000867.html
> [2] http://status.openstack.org/elastic-recheck/
>


