Time for a quick update on gate status.

* There were some shelve tests failing ssh pretty badly in the tempest-slow job due to a neutron issue: https://launchpad.net/bugs/1812552. It seems https://review.openstack.org/#/c/631944/ might have squashed that bug.

* Probably our biggest issue right now is test_subnet_details failing: http://status.openstack.org/elastic-recheck/#1813198. I suspect that is somehow related to devstack using cirros 0.4.0 as of Jan 20. I have a tempest patch up for review to help debug that when it fails, https://review.openstack.org/#/c/633225, since it seems we're not parsing nic names properly, which is how we get the mangled udhcpc..pid file name (see the sketch at the end of this mail [1]).

* Another nasty one affecting unit/functional tests (the bug is filed against nova but the query hits other projects as well) is http://status.openstack.org/elastic-recheck/#1813147, where subunit parsing fails. It seems cinder had to deal with something like this recently too, so the nova team needs to figure out what cinder did to resolve it. I'm not sure whether this is a recent regression, but the logstash trends start around Jan 17, so it could be.

* https://bugs.launchpad.net/cinder/+bug/1810526 is a cinder bug where etcd intermittently drops connections and the cinder services then hit ToozConnectionErrors, which cause other things to fail; for example, volume status updates are lost during a delete and tempest then times out waiting for the volume to be deleted. I have a fingerprint in the bug, but it shows up in successful jobs too, which is frustrating. I would expect that for grenade while services are being restarted (although do we restart etcd in grenade?), but it also shows up in non-grenade jobs. I believe cinder is just using tooz+etcd as a distributed lock manager, so I'm not sure how valid it would be to add retries to that locking code when the service is unavailable (see the sketch at the end of this mail [2]). One suggestion in IRC was to not use tooz/etcd for DLM in single-node jobs. That kind of side-steps the issue, but if etcd is lagging because lots of services are eating up resources on the single node, it might not be a bad option.

-- 

Thanks,

Matt
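[1] A hypothetical sketch of the udhcpc..pid failure mode, assuming the pid file name is built as udhcpc.<nic>.pid from whatever nic name we parsed from the guest. The parse_nics helper and the regex here are made up for illustration, not tempest's actual code:

    # Illustration only: an empty/unparsed nic name yields "udhcpc..pid".
    # parse_nics and the pid file template are assumptions, not tempest code.
    import re

    def parse_nics(ip_addr_output):
        # Expect "ip addr" lines like "2: eth0: <BROADCAST,MULTICAST,UP> ...".
        return re.findall(r'^\d+: ([^:@]+)[:@]', ip_addr_output, re.MULTILINE)

    def udhcpc_pid_file(nic):
        return '/var/run/udhcpc.%s.pid' % nic

    nics = parse_nics('')          # parsing fails -> no nic names found
    nic = nics[0] if nics else ''  # falls back to an empty string
    print(udhcpc_pid_file(nic))    # -> /var/run/udhcpc..pid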
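[2] The kind of retry I mean for the tooz locking code, as a minimal sketch; the backend URL, member id, lock name, and retry/backoff values are made up, and this is not cinder's actual code:

    # Minimal sketch: retry tooz lock acquisition when etcd drops connections.
    import time

    from tooz import coordination

    # Backend URL and member id are made-up examples.
    coordinator = coordination.get_coordinator(
        'etcd3+http://127.0.0.1:2379', b'cinder-volume-host1')
    coordinator.start()

    def locked_call(lock_name, func, retries=3, backoff=1.0):
        """Run func under a tooz lock, retrying on ToozConnectionError."""
        for attempt in range(retries):
            try:
                with coordinator.get_lock(lock_name):
                    return func()
            except coordination.ToozConnectionError:
                # etcd dropped the connection; back off and retry rather
                # than immediately failing the volume operation.
                if attempt == retries - 1:
                    raise
                time.sleep(backoff * (attempt + 1))

    # e.g. locked_call(b'volume-<uuid>', do_delete)

The open question from the bug still stands: whether retrying like this is the right call, or whether it just papers over etcd being starved for resources on the node.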