Open Stack

Fri Feb 10 18:54:59 UTC 2017

Oh nice, I haven't seen that. It does give (virtualized) CPU model
types. I don't see a clear correlation between models and
failures/test times though. We are of course missing some details, such
as which flags are emulated, but I doubt that would give us a clue.

It would be interesting to know the overcommit/system load for each
hypervisor affected. But I assume we don't have access to that info,
right?

Ihar

On Fri, Feb 10, 2017 at 8:39 AM, Clark Boylan <cboylan at sapwetik.org> wrote:
> On Fri, Feb 10, 2017, at 08:21 AM, Morales, Victor wrote:
>>
>> On 2/9/17, 10:59 PM, "Ihar Hrachyshka" <ihrachys at redhat.com> wrote:
>>
>> >Hi all,
>> >
>> >I noticed lately a number of job failures in neutron gate that all
>> >result in job timeouts. I describe
>> >gate-tempest-dsvm-neutron-dvr-ubuntu-xenial job below, though I see
>> >timeouts happening in other jobs too.
>> >
>> >The failure mode is that all operations, ./stack.sh and each tempest
>> >test, take significantly more time (about 50% to 150% more, which
>> >triggers the job timeout). An example of what I mean can be found in [1].
>> >
>> >A good run usually takes ~20 minutes to stack up devstack; then ~40
>> >minutes to pass full suite; a bad run usually takes ~30 minutes for
>> >./stack.sh; and then 1:20h+ until it is killed due to timeout.
>> >
>> >It affects different clouds (we see rax, internap, infracloud-vanilla,
>> >ovh jobs affected; we haven't seen osic though). It can't be e.g. slow
>> >pypi or apt mirrors because then we would see slowdown in ./stack.sh
>> >phase only.
>> >
>> >We can't be sure that CPUs are the same, and devstack does not seem to
>> >dump /proc/cpuinfo anywhere (in the end, it's all virtual, so not sure
>>
>> I don't think logging this information would be useful, mainly
>> because it depends on *host-passthrough*[3] being enabled in the
>> nova-compute configuration of the public cloud providers.
>
> While this is true we do log it anyways (was useful for sorting out live
> migration cpu flag inconsistencies). For example:
> http://logs.openstack.org/95/429095/2/check/gate-tempest-dsvm-neutron-dvr-ubuntu-xenial/35aa22f/logs/devstack-gate-setup-host.txt.gz
> and grep for 'cpu'.
>
> Note that we used to grab proper /proc/cpuinfo contents, but now it's just
> whatever ansible is reporting back in its fact list there.
>
> Clark
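
For anyone who wants to repeat the "grep for 'cpu'" step without a shell,
here is a minimal Python sketch. It assumes the log has already been
downloaded as gzip-compressed bytes; the function name grep_cpu_lines and
the sample log lines are made up for illustration and are not the real
contents of devstack-gate-setup-host.txt.gz.

```python
import gzip
import io


def grep_cpu_lines(raw_bytes, needle="cpu"):
    """Return the lines of a gzip-compressed text log that contain `needle`.

    Matching is case-insensitive, similar to `zcat file | grep -i cpu`.
    """
    wanted = needle.lower()
    with gzip.open(io.BytesIO(raw_bytes), mode="rt",
                   encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\n") for line in fh if wanted in line.lower()]


# Hypothetical sample data standing in for a downloaded log file.
sample_log = "\n".join([
    "example_processor: Example CPU model",
    "example_memtotal_mb: 8192",
    "example_processor_vcpus: 8",
]) + "\n"
sample_bytes = gzip.compress(sample_log.encode("utf-8"))

for line in grep_cpu_lines(sample_bytes):
    print(line)
```

Comparing the matched lines across a few good and bad runs is one way to
check whether a CPU model lines up with the slow jobs.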

[openstack-dev] [gate][neutron][infra] tempest jobs timing out due to general sluggishness of the node?
