[openstack-dev] [gate][neutron][infra] tempest jobs timing out due to general sluggishness of the node?

Ihar Hrachyshka ihrachys at redhat.com
Fri Feb 10 04:59:09 UTC 2017

Hi all,

Lately I have noticed a number of job failures in the neutron gate
that all result in job timeouts. I describe the
gate-tempest-dsvm-neutron-dvr-ubuntu-xenial job below, though I see
timeouts happening in other jobs too.

The failure mode is that all operations, both ./stack.sh and each
tempest test, take significantly more time (roughly 50% to 150% more),
which triggers the job timeout. An example of what I mean can be found
in [1].

A good run usually takes ~20 minutes to stack up devstack, then ~40
minutes to pass the full suite. A bad run usually takes ~30 minutes
for ./stack.sh, and then 1:20h+ until it is killed due to timeout.
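
To put rough numbers on that, here is a minimal sketch that computes
the relative slowdown per phase from the approximate timings above.
The function name slowdown_pct is made up for illustration, and the
tempest figure is only a lower bound, since the bad run was killed
before it finished.

```python
def slowdown_pct(good_minutes, bad_minutes):
    """Return how much longer the bad run took, as a percentage of the good run."""
    return (bad_minutes - good_minutes) / good_minutes * 100.0


# Approximate figures taken from the runs described in this message.
stack_slowdown = slowdown_pct(20, 30)    # ./stack.sh phase
tempest_slowdown = slowdown_pct(40, 80)  # tempest phase; lower bound, run was killed at 1:20h+

print(f"stack.sh: +{stack_slowdown:.0f}%")
print(f"tempest:  +{tempest_slowdown:.0f}% or more")
```

Both phases come out slower, which fits a node-wide slowdown rather
than a problem in one phase.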

It affects different clouds (we see rax, internap, infracloud-vanilla,
and ovh jobs affected; we haven't seen osic though). It can't be,
e.g., slow PyPI or apt mirrors, because then we would see the slowdown
in the ./stack.sh phase only.

We can't be sure that CPUs are the same, and devstack does not seem to
dump /proc/cpuinfo anywhere (in the end, it's all virtual, so not sure
if it would help anyway). Nor do we have a way to learn whether the
slowness could be a result of adherence to RFC 1149. ;)
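
If we did want CPU details in the job logs, something along these
lines could summarize /proc/cpuinfo. This is only a sketch, not
something devstack does today: summarize_cpuinfo is a hypothetical
helper, it assumes the x86-style "model name" field is present, and
inside a VM the hypervisor may report a generic model, so the output
may not tell us much.

```python
from collections import Counter


def summarize_cpuinfo(text):
    """Count logical CPUs per 'model name' in /proc/cpuinfo-style text."""
    models = Counter()
    for line in text.splitlines():
        # Lines look like "model name\t: Some CPU"; split on the first colon.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            models[value.strip()] += 1
    return dict(models)


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the raw cpuinfo text (Linux only)."""
    with open(path) as f:
        return f.read()
```

On a Linux node, summarize_cpuinfo(read_cpuinfo()) returns a dict
mapping each reported CPU model to its number of logical CPUs, which
could be printed early in the job for comparison between good and bad
runs.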

We discussed the matter in the neutron channel [2] but couldn't figure
out the culprit, or where to go next. At this point we assume it's not
neutron's fault, and we hope others (infra?) may have suggestions on
where to look.

[1] http://logs.openstack.org/95/429095/2/check/gate-tempest-dsvm-neutron-dvr-ubuntu-xenial/35aa22f/console.html#_2017-02-09_04_47_12_874550
[2] http://eavesdrop.openstack.org/irclogs/%23openstack-neutron/%23openstack-neutron.2017-02-10.log.html#t2017-02-10T04:06:01


