[openstack-dev] [gate][neutron][infra] tempest jobs timing out due to general sluggishness of the node?
cboylan at sapwetik.org
Fri Feb 10 16:39:55 UTC 2017
On Fri, Feb 10, 2017, at 08:21 AM, Morales, Victor wrote:
> On 2/9/17, 10:59 PM, "Ihar Hrachyshka" <ihrachys at redhat.com> wrote:
> >Hi all,
> >I noticed lately a number of job failures in neutron gate that all
> >result in job timeouts. I describe
> >gate-tempest-dsvm-neutron-dvr-ubuntu-xenial job below, though I see
> >timeouts happening in other jobs too.
> >The failure mode is that all operations, ./stack.sh and each tempest
> >test, take significantly more time (like 50% to 150% more), which
> >triggers the job timeout. An example of what I mean can be found in .
> >A good run usually takes ~20 minutes to stack up devstack; then ~40
> >minutes to pass full suite; a bad run usually takes ~30 minutes for
> >./stack.sh; and then 1:20h+ until it is killed due to timeout.
> >It affects different clouds (we see rax, internap, infracloud-vanilla,
> >ovh jobs affected; we haven't seen osic though). It can't be e.g. slow
> >pypi or apt mirrors because then we would see slowdown in ./stack.sh
> >phase only.
> >We can't be sure that CPUs are the same, and devstack does not seem to
> >dump /proc/cpuinfo anywhere (in the end, it's all virtual, so not sure
> I don’t think logging this information would be useful, mainly
> because it depends on *host-passthrough* being enabled in the
> nova-compute configuration of the public cloud providers
While this is true, we do log it anyway (it was useful for sorting out
live migration CPU flag inconsistencies). For example:
and grep for 'cpu'.
Note that we used to grab the proper /proc/cpuinfo contents, but now it's
just whatever Ansible reports back in its fact list there.
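
If you would rather not grep by hand, here is a rough sketch of the same
search in Python. It assumes the facts have been saved as JSON; the helper
name find_cpu_facts and the sample values are made up for illustration,
and the real log may be laid out differently.

```python
import json


def find_cpu_facts(facts, needle="cpu"):
    """Return {path: value} for every leaf whose key path or value mentions needle."""
    matches = {}

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{path}.{key}" if path else str(key))
        elif isinstance(node, list):
            for index, value in enumerate(node):
                walk(value, f"{path}[{index}]")
        elif needle in path.lower() or needle in str(node).lower():
            # Case-insensitive match, roughly what `grep -i cpu` would catch.
            matches[path] = node

    walk(facts, "")
    return matches


# Hypothetical sample, shaped like a few Ansible facts; values are invented.
sample = json.loads("""
{
  "ansible_processor": ["0", "GenuineIntel", "Intel(R) Xeon(R) CPU @ 2.60GHz"],
  "ansible_processor_vcpus": 8,
  "ansible_memtotal_mb": 7980
}
""")

for path, value in sorted(find_cpu_facts(sample).items()):
    print(f"{path}: {value}")
```

Comparing that output between a good run and a slow run should show
quickly whether the two nodes reported different processors.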