[openstack-dev] [nova] Intel NFV CI failing all shelve/unshelve tests

Jay Pipes jaypipes at gmail.com
Sun May 22 23:41:09 UTC 2016


Hello Novaites,

I've noticed that the Intel NFV CI has been failing all of its test runs 
for quite some time now (at least a few days), always on the same 
shelve/unshelve tests.

The shelve/unshelve Tempest tests consistently fail with a timeout 
exception similar to the following, taken from [1]:

2016-05-22 22:25:30.697 13974 ERROR tempest.api.compute.base
Traceback (most recent call last):
  File "tempest/api/compute/base.py", line 166, in server_check_teardown
    cls.server_id, 'ACTIVE')
  File "tempest/common/waiters.py", line 95, in wait_for_server_status
    raise exceptions.TimeoutException(message)
TimeoutException: Request timed out
Details: (ServerActionsTestJSON:tearDown) Server
cae6fd47-0968-4922-a03e-3f2872e4eb52 failed to reach ACTIVE status and
task state "None" within the required time (196 s). Current status:
SHELVED_OFFLOADED. Current task state: None.
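
For reference, the waiter that raises this is just a polling loop. Here 
is a rough, simplified sketch of what wait_for_server_status() in 
tempest/common/waiters.py does (paraphrased, not the exact upstream 
code):

# Rough, paraphrased sketch of the tempest waiter; 'client' stands in
# for any servers client exposing show_server().
import time


class TimeoutException(Exception):
    pass


def wait_for_server_status(client, server_id, status, timeout=196,
                           interval=1):
    """Poll the compute API until the server reaches 'status' with no
    task state, or raise TimeoutException after 'timeout' seconds."""
    body = client.show_server(server_id)['server']
    start = time.time()
    while body['status'] != status or body.get('OS-EXT-STS:task_state'):
        if time.time() - start >= timeout:
            raise TimeoutException(
                'Server %s failed to reach %s status and task state '
                '"None" within the required time (%s s). '
                'Current status: %s.'
                % (server_id, status, timeout, body['status']))
        time.sleep(interval)
        body = client.show_server(server_id)['server']

So the unshelved server simply never leaves SHELVED_OFFLOADED within 
the 196 s window, and the test teardown gives up.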

I looked through the conductor and compute logs for possible causes and 
found a number of occurrences of the following error in the compute logs:

2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]
Traceback (most recent call last):
  File "/opt/stack/new/nova/nova/compute/manager.py", line 4230, in _unshelve_instance
    with rt.instance_claim(context, instance, limits):
  File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 271, in inner
    return f(*args, **kwargs)
  File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 151, in instance_claim
    self._update_usage_from_instance(context, instance_ref)
  File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 827, in _update_usage_from_instance
    self._update_usage(instance, sign=sign)
  File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 666, in _update_usage
    self.compute_node, usage, free)
  File "/opt/stack/new/nova/nova/virt/hardware.py", line 1482, in get_host_numa_usage_from_instance
    host_numa_topology, instance_numa_topology, free=free))
  File "/opt/stack/new/nova/nova/virt/hardware.py", line 1348, in numa_usage_from_instances
    newcell.unpin_cpus(pinned_cpus)
  File "/opt/stack/new/nova/nova/objects/numa.py", line 94, in unpin_cpus
    pinned=list(self.pinned_cpus))
CPUPinningInvalid: Cannot pin/unpin cpus [6] from the following pinned
set [0, 2, 4]

These errors appear on or around the time of the Tempest failures.
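
For what it's worth, the check that raises this lives in 
NUMACell.unpin_cpus() in nova/objects/numa.py; paraphrased (not the 
exact upstream code), it does something like:

# Paraphrased sketch of the NUMACell.unpin_cpus() check that produces
# the error above; not the exact upstream code.

class CPUPinningInvalid(Exception):
    pass


class NUMACell(object):
    def __init__(self, pinned_cpus):
        self.pinned_cpus = set(pinned_cpus)

    def unpin_cpus(self, cpus):
        cpus = set(cpus)
        # Every CPU being unpinned must currently be marked as pinned on
        # the cell; otherwise the tracker's accounting is inconsistent.
        if not cpus.issubset(self.pinned_cpus):
            raise CPUPinningInvalid(
                'Cannot pin/unpin cpus %s from the following pinned set '
                '%s' % (sorted(cpus), sorted(self.pinned_cpus)))
        self.pinned_cpus -= cpus


# The failure above corresponds to something like:
#   NUMACell(pinned_cpus=[0, 2, 4]).unpin_cpus([6])

In other words, while re-claiming resources for the unshelve, the 
resource tracker tries to unpin CPU 6 for the instance while the host 
cell only has CPUs 0, 2 and 4 marked as pinned, so it looks like the 
tracker's pinned-CPU accounting has drifted out of sync with the 
instance's NUMA topology.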

Perhaps tomorrow morning we can look into handling the above exception 
properly in the compute manager, since we clearly shouldn't be letting 
CPUPinningInvalid escape from the resource tracker's _update_usage() 
call.
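
To make the idea concrete, here is a toy sketch of the kind of handling 
I mean. Every name below is a hypothetical stand-in, not nova's actual 
API, and this is not a proposed patch:

# Toy sketch: wrap the claim during unshelve so the pinning error
# becomes a clean failure instead of propagating.  All names here are
# hypothetical stand-ins for the real nova code paths.
from contextlib import contextmanager


class CPUPinningInvalid(Exception):
    pass


@contextmanager
def instance_claim(context, instance, limits):
    # Stand-in for ResourceTracker.instance_claim(); here it simply
    # reproduces the accounting error seen in the compute logs.
    raise CPUPinningInvalid('Cannot pin/unpin cpus [6] from the '
                            'following pinned set [0, 2, 4]')
    yield  # never reached in this toy


def unshelve_instance(context, instance, limits=None):
    try:
        with instance_claim(context, instance, limits):
            instance['vm_state'] = 'active'
    except CPUPinningInvalid:
        # Instead of leaking the exception and leaving the server stuck
        # in SHELVED_OFFLOADED until the Tempest waiter times out,
        # record the failure and put the instance into an error state so
        # the problem is visible immediately.
        instance['vm_state'] = 'error'


instance = {'uuid': 'cae6fd47-0968-4922-a03e-3f2872e4eb52',
            'vm_state': None}
unshelve_instance(None, instance)
print(instance['vm_state'])  # 'error' rather than an unhandled traceback

Exactly what we do on that except branch (error out the instance, retry 
the claim, or fix the accounting so it never drifts) is the part worth 
discussing tomorrow.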

Anyway, see you on IRC tomorrow morning and let's try to fix this.

Best,
-jay

[1] 
http://intel-openstack-ci-logs.ovh/86/319686/1/check/tempest-dsvm-full-nfv/b463722/testr_results.html.gz


