State of the Gate (placement?)
Matt Riedemann
mriedemos at gmail.com
Wed Nov 6 17:14:04 UTC 2019
On 11/4/2019 6:58 PM, Clark Boylan wrote:
> Typically we try to work with the clouds to properly root cause the issue. Then from there we can figure out what the best fix may be. They are running our software after all and there is a good chance the problems are in openstack.
>
> I'm in Shanghai at the moment, but if others want to reach out, feel free. benj_ and mgagne are at inap and amorin has been helpful at ovh. The test node logs include a hostid in them somewhere which can be used to identify hypervisors if necessary.
I noticed this today [1]. It doesn't always result in failed jobs, but
I correlated it with a timeout failure in a nova functional job [2],
and those normally don't have these types of problems.
Note the timing of the spikes: midnight and noon, it looks like.
The dip on 11/2 and 11/3 was the weekend. And it's mostly OVH nodes. So
maybe they have some kind of cron job or something that runs at those times?
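
A rough sketch of how one might check that time-of-day pattern, assuming
the logstash hits were exported as (timestamp, provider) pairs. The
function name, the export format and the sample rows are all made up for
illustration; none of it is real data from the dashboard in [1].

```python
from collections import Counter
from datetime import datetime


def hits_by_hour(events):
    """Count events per (provider, hour-of-day) bucket.

    `events` is an iterable of (timestamp, provider) pairs, where the
    timestamp is a string like "2019-11-05T00:03:10" (assumed format).
    """
    counts = Counter()
    for timestamp, provider in events:
        hour = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S").hour
        counts[(provider, hour)] += 1
    return counts


# Hypothetical sample rows, not taken from the real logs.
sample = [
    ("2019-11-05T00:03:10", "ovh"),
    ("2019-11-05T00:07:42", "ovh"),
    ("2019-11-05T12:01:55", "ovh"),
    ("2019-11-05T15:30:00", "inap"),
]

counts = hits_by_hour(sample)
for (provider, hour), n in sorted(counts.items()):
    print(f"{provider} {hour:02d}:00 {n}")
```

If the buckets for hours 0 and 12 dominate for one provider, that would
back up the cron-like guess; it wouldn't say what is actually running.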
Anecdotally, I'll also note that it seems like the gate is much more
stable this week while the summit is happening. We're actually able to
merge some changes in nova which is kind of amazing given the last month
or so of rechecks we've had to do.
[1]
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Function%20'nova.servicegroup.drivers.db.DbDriver._report_state'%20run%20outlasted%20interval%20by%5C%22&from=7d
[2]
https://zuul.opendev.org/t/openstack/build/63001bbd58c244cea70c995f1ebf61fb/log/job-output.txt#3092
--
Thanks,
Matt
More information about the openstack-discuss mailing list