mriedemos at gmail.com
Sat Jan 26 00:47:06 UTC 2019
Time for a quick update on gate status.
* Some shelve tests in the tempest-slow job were failing ssh pretty badly
due to a neutron issue: https://launchpad.net/bugs/1812552. It seems
https://review.openstack.org/#/c/631944/ might have squashed that bug.
* Probably our biggest issue right now is test_subnet_details failing:
http://status.openstack.org/elastic-recheck/#1813198. I suspect that is
somehow related to devstack using cirros 0.4.0 as of Jan 20. It seems
we're not parsing NIC names properly, which is how we end up with the
mangled udhcpc..pid file name, so I have a tempest patch up for review
to help debug the failure when it happens:
https://review.openstack.org/#/c/633225
* Another nasty one affecting unit/functional tests (the bug is filed
against nova, but the query hits other projects as well) is
http://status.openstack.org/elastic-recheck/#1813147, where subunit
parsing fails. It seems cinder had to deal with something like this
recently too, so the nova team needs to figure out what cinder did to
resolve it. I'm not sure whether this is a recent regression, but the
logstash trends start around Jan 17, so it could be.
* https://bugs.launchpad.net/cinder/+bug/1810526 is a cinder bug where
etcd intermittently drops connections, cinder services then hit
ToozConnectionErrors, and that causes other things to fail. For example,
volume status updates are lost during delete, and tempest then times out
waiting for the volume to be deleted. I have a fingerprint in the bug,
but it shows up in successful jobs too, which is frustrating. I would
expect that in grenade while services are being restarted (although do
we restart etcd in grenade?), but it also shows up in non-grenade jobs.
I believe cinder is just using tooz+etcd as a distributed lock manager
(DLM), so I'm not sure how valid it would be to add retries to that
locking code when the service is unavailable. One suggestion in IRC was
to not use tooz/etcd for DLM in single-node jobs. That kind of
side-steps the issue, but if etcd is lagging because lots of services
are eating up resources on the single node, it might not be a bad
option.
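To illustrate how a NIC name parsing miss could turn into the mangled udhcpc..pid file name from the test_subnet_details failure, here is a hypothetical sketch. It is not tempest's code: the helper names parse_nic_name and udhcpc_pid_file, the /var/run/udhcpc.<nic>.pid path pattern, and the iproute2-style "2: eth0: <...>" header format the parser expects are all assumptions made for illustration.

```python
import re


def parse_nic_name(ip_addr_output: str, mac: str) -> str:
    """Hypothetical parser: return the interface whose block contains `mac`.

    Assumes iproute2-style output where each interface block starts with a
    header line like '2: eth0: <BROADCAST,...>'. Returns '' when nothing
    matches, e.g. because the guest printed a format we don't recognise.
    """
    current = ""
    for line in ip_addr_output.splitlines():
        header = re.match(r"^\d+:\s+([^:@\s]+)", line)
        if header:
            # Start of a new interface block; remember its name.
            current = header.group(1)
        elif mac.lower() in line.lower():
            # MAC found inside the current block (or before any header).
            return current
    return ""


def udhcpc_pid_file(nic: str) -> str:
    # Assumed naming pattern for the udhcpc pid file.
    return "/var/run/udhcpc.%s.pid" % nic
```

With iproute2-style input this returns something like eth0 and the path /var/run/udhcpc.eth0.pid. If the output format isn't recognised, for instance busybox ifconfig-style "eth0 Link encap:Ethernet HWaddr ..." lines, it returns an empty string and the path collapses to /var/run/udhcpc..pid.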
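For the open question about adding retries to cinder's locking code, a retry-with-backoff wrapper might look roughly like the following. This is a sketch under stated assumptions, not cinder or tooz code. LockServiceUnavailable is a hypothetical stand-in for the ToozConnectionError mentioned above, and acquire is any zero-argument callable that tries to take the lock.

```python
import time


class LockServiceUnavailable(Exception):
    """Hypothetical stand-in for a DLM connection error such as ToozConnectionError."""


def acquire_with_retry(acquire, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call `acquire()` until it succeeds, retrying on LockServiceUnavailable.

    Sleeps base_delay * 2**attempt between tries (exponential backoff) and
    re-raises the error once the final attempt fails.
    """
    for attempt in range(attempts):
        try:
            return acquire()
        except LockServiceUnavailable:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

The sleep parameter is injectable so the backoff can be tested without waiting. Note that a wrapper like this only papers over short connection drops. If etcd stays unavailable or lagging, it just delays the failure, which is part of why the validity of retrying here is an open question.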
More information about the openstack-discuss