Update on gate status for the new year

Matt Riedemann mriedemos at gmail.com
Fri Jan 4 19:59:11 UTC 2019


On 1/4/2019 12:12 PM, Clark Boylan wrote:
> Overall things look pretty good based on elastic-recheck data. That said I think this is mostly due to low test volume over the holidays and our 10 day index window. We should revisit this next week or the week after to get a more accurate view of things.
> 
> On the infra team side of things we've got quota issues in a cloud region that has decreased our test node capacity. Waiting on people to return from holidays to take a look at that. We also started tracking hypervisor IDs for our test instances (thank you pabelanger) to try and help identify when specific hypervisors might be the cause of some of our issues. https://review.openstack.org/628642 is a followup to index that data with our job log data in Elasticsearch.
> 
> We've seen some ssh failures in tripleo jobs on limestone [0] and neutron and zuul report constrained IOPS there resulting in failed database migrations. I think the idea with 628642 is to see if we can narrow that down to specific hypervisors.
> 
> On the project side of things our categorization rates are quite low [1][2]. If your changes are evicted from the gate due to failures it would be helpful if you could spend a few minutes to try and identify and fingerprint those failures.

On a side note, I've noticed tempest jobs failing without
elastic-recheck commenting on the changes. It turns out that's because
we use a very limited regex to select the jobs that e-r will process in
order to comment on a change in Gerrit. The following patch should help
with that:

https://review.openstack.org/#/c/628669/

But since "dsvm" is no longer standard in job names, it's clear that e-r
is going to skip a lot of project-specific jobs even when their
failures are already categorized.
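
To illustrate the effect, here is a minimal sketch of how a name-based
filter of this kind skips jobs. The patterns, the helper function, and
the job names are all hypothetical stand-ins chosen for illustration;
they are not the actual regex or code in elastic-recheck, nor what the
patch above changes it to.

```python
import re

# Hypothetical narrow filter: only job names containing "dsvm" qualify.
NARROW_JOB_RE = re.compile(r"dsvm")

# Hypothetical broader filter, shown only for comparison.
BROAD_JOB_RE = re.compile(r"dsvm|tempest|grenade")

# Illustrative job names, not taken from any real gate configuration.
EXAMPLE_JOBS = [
    "gate-tempest-dsvm-full",
    "tempest-full",
    "nova-next",
    "openstack-tox-py27",
]


def jobs_processed(job_names, pattern):
    """Return the job names that the filter pattern would let through."""
    return [name for name in job_names if pattern.search(name)]


narrow = jobs_processed(EXAMPLE_JOBS, NARROW_JOB_RE)
broad = jobs_processed(EXAMPLE_JOBS, BROAD_JOB_RE)

# Jobs that only the broader pattern picks up.
skipped_by_narrow = [name for name in broad if name not in narrow]

print("narrow filter processes:", narrow)
print("skipped by narrow filter:", skipped_by_narrow)
```

Under these assumptions, a failing job whose name lacks "dsvm" never
reaches the commenting step, even if its failure matches a known
fingerprint.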

-- 

Thanks,

Matt


