[openstack-dev] [gate][neutron][infra] tempest jobs timing out due to general sluggishness of the node?

Clark Boylan cboylan at sapwetik.org
Fri Feb 10 16:39:55 UTC 2017


On Fri, Feb 10, 2017, at 08:21 AM, Morales, Victor wrote:
> 
> On 2/9/17, 10:59 PM, "Ihar Hrachyshka" <ihrachys at redhat.com> wrote:
> 
> >Hi all,
> >
> >I noticed lately a number of job failures in neutron gate that all
> >result in job timeouts. I describe
> >gate-tempest-dsvm-neutron-dvr-ubuntu-xenial job below, though I see
> >timeouts happening in other jobs too.
> >
> >The failure mode is that all operations, both ./stack.sh and each
> >tempest test, take significantly more time (roughly 50% to 150% more,
> >which triggers the job timeout). An example of what I mean can be
> >found in [1].
> >
> >A good run usually takes ~20 minutes to stack up devstack, then ~40
> >minutes to pass the full suite; a bad run usually takes ~30 minutes
> >for ./stack.sh, and then 1:20h+ until it is killed due to timeout.
> >
> >It affects different clouds (we see rax, internap, infracloud-vanilla,
> >ovh jobs affected; we haven't seen osic though). It can't be e.g. slow
> >pypi or apt mirrors because then we would see slowdown in ./stack.sh
> >phase only.
> >
> >We can't be sure that CPUs are the same, and devstack does not seem to
> >dump /proc/cpuinfo anywhere (in the end, it's all virtual, so not sure
> 
> I don’t think logging this information would be useful, mainly because
> it depends on *host-passthrough*[3] being enabled in the nova-compute
> configuration of the public cloud providers

While this is true, we do log it anyway (it was useful for sorting out
live migration CPU flag inconsistencies). For example:
http://logs.openstack.org/95/429095/2/check/gate-tempest-dsvm-neutron-dvr-ubuntu-xenial/35aa22f/logs/devstack-gate-setup-host.txt.gz
and grep for 'cpu'.
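
For anyone who would rather do that grep from a script, here is a
minimal Python sketch. It assumes the log has already been downloaded
to a local file; the function name grep_log is made up for
illustration. The download may or may not still be gzip-compressed
depending on how it was fetched, so the sketch checks the gzip magic
bytes instead of trusting the .gz extension. Matching is
case-insensitive, i.e. closer to `grep -i cpu` than plain grep.

```python
import gzip


def grep_log(path, needle="cpu"):
    """Return the lines of a (possibly gzipped) log that contain needle.

    Hypothetical helper: matching is case-insensitive.
    """
    # Detect gzip by its two-byte magic number rather than by file name.
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"

    opener = gzip.open if is_gzip else open
    wanted = needle.lower()
    with opener(path, "rt", encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\n") for line in fh if wanted in line.lower()]
```

Calling grep_log("devstack-gate-setup-host.txt.gz") on a local copy
would return the matching lines as a list of strings.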

Note that we used to grab the proper /proc/cpuinfo contents, but now
it's just whatever Ansible reports back in its fact list there.
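
If someone wants the raw /proc/cpuinfo view back for comparing nodes,
a rough sketch of summarizing it is below. This is not what
devstack-gate does; parse_cpuinfo and summarize are hypothetical
names. It relies only on the usual /proc/cpuinfo layout of
"key : value" lines with a blank line between processors, and the
"model name" and "flags" keys are the x86 ones (other architectures
use different keys).

```python
def parse_cpuinfo(text):
    """Split /proc/cpuinfo-style text into one dict per processor block."""
    cpus = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current processor block.
            if current:
                cpus.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        cpus.append(current)
    return cpus


def summarize(cpus):
    """Reduce parsed blocks to a count, the distinct models, and the flags
    of the first processor (x86 key names assumed)."""
    models = sorted({c.get("model name", "unknown") for c in cpus})
    flags = set(cpus[0].get("flags", "").split()) if cpus else set()
    return {"count": len(cpus), "models": models, "flags": flags}
```

On a Linux node, passing the contents of /proc/cpuinfo through
parse_cpuinfo and then summarize gives something small enough to diff
between a fast run and a slow one.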

Clark


