[openstack-dev] [nova] Intel NFV CI failing all shelve/unshelve tests
Jay Pipes
jaypipes at gmail.com
Sun May 22 23:41:09 UTC 2016
Hello Novaites,
I've noticed that the Intel NFV CI has been failing all of its test runs for
quite some time (at least a few days), always on the same tests around
shelve/unshelve operations.
The shelve/unshelve Tempest tests consistently fail with a timeout exception
that looks similar to the following, from [1]:
2016-05-22 22:25:30.697 13974 ERROR tempest.api.compute.base Traceback (most recent call last):
2016-05-22 22:25:30.697 13974 ERROR tempest.api.compute.base   File "tempest/api/compute/base.py", line 166, in server_check_teardown
2016-05-22 22:25:30.697 13974 ERROR tempest.api.compute.base     cls.server_id, 'ACTIVE')
2016-05-22 22:25:30.697 13974 ERROR tempest.api.compute.base   File "tempest/common/waiters.py", line 95, in wait_for_server_status
2016-05-22 22:25:30.697 13974 ERROR tempest.api.compute.base     raise exceptions.TimeoutException(message)
2016-05-22 22:25:30.697 13974 ERROR tempest.api.compute.base TimeoutException: Request timed out
2016-05-22 22:25:30.697 13974 ERROR tempest.api.compute.base Details: (ServerActionsTestJSON:tearDown) Server cae6fd47-0968-4922-a03e-3f2872e4eb52 failed to reach ACTIVE status and task state "None" within the required time (196 s). Current status: SHELVED_OFFLOADED. Current task state: None.
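For context, the waiter here just polls the server until it reports the
expected status (ACTIVE, with no task state) and gives up after the
configured build timeout; since the server never leaves SHELVED_OFFLOADED,
the waiter times out. A simplified sketch of that polling loop (not the
actual Tempest code; get_server_status is a hypothetical callable standing
in for the compute servers client):

    import time

    def wait_for_status(get_server_status, server_id, expected='ACTIVE',
                        timeout=196, interval=1):
        # Poll until the server reaches the expected status or the timeout
        # elapses, mirroring the shape of the failure above.
        start = time.time()
        status = None
        while time.time() - start < timeout:
            status = get_server_status(server_id)
            if status == expected:
                return
            time.sleep(interval)
        raise RuntimeError('Server %s failed to reach %s within %ds; '
                           'current status: %s'
                           % (server_id, expected, timeout, status))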
I looked through the conductor and compute logs to see if I could find
any possible cause for the failures, and found a number of errors like
the following in the compute logs:
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52] Traceback (most recent call last):
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]   File "/opt/stack/new/nova/nova/compute/manager.py", line 4230, in _unshelve_instance
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]     with rt.instance_claim(context, instance, limits):
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]   File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 271, in inner
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]     return f(*args, **kwargs)
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]   File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 151, in instance_claim
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]     self._update_usage_from_instance(context, instance_ref)
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]   File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 827, in _update_usage_from_instance
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]     self._update_usage(instance, sign=sign)
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]   File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 666, in _update_usage
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]     self.compute_node, usage, free)
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]   File "/opt/stack/new/nova/nova/virt/hardware.py", line 1482, in get_host_numa_usage_from_instance
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]     host_numa_topology, instance_numa_topology, free=free))
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]   File "/opt/stack/new/nova/nova/virt/hardware.py", line 1348, in numa_usage_from_instances
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]     newcell.unpin_cpus(pinned_cpus)
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]   File "/opt/stack/new/nova/nova/objects/numa.py", line 94, in unpin_cpus
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]     pinned=list(self.pinned_cpus))
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52] CPUPinningInvalid: Cannot pin/unpin cpus [6] from the following pinned set [0, 2, 4]
These errors appear on or around the time of the Tempest failures.
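To make the failure mode concrete, here is a small, self-contained sketch
of the check that's tripping; this is an illustration only, not the exact
code in nova/objects/numa.py. A cell will only unpin CPUs it currently
records as pinned, so asking it to unpin CPU 6 when it only tracks
[0, 2, 4] raises exactly the error above:

    class CPUPinningInvalid(Exception):
        """Stand-in for nova.exception.CPUPinningInvalid."""

    class Cell(object):
        """Minimal stand-in for a NUMA cell object tracking pinned CPUs."""
        def __init__(self, pinned_cpus):
            self.pinned_cpus = set(pinned_cpus)

        def unpin_cpus(self, cpus):
            cpus = set(cpus)
            if not cpus.issubset(self.pinned_cpus):
                raise CPUPinningInvalid(
                    'Cannot pin/unpin cpus %s from the following pinned set %s'
                    % (sorted(cpus), sorted(self.pinned_cpus)))
            self.pinned_cpus -= cpus

    cell = Cell(pinned_cpus=[0, 2, 4])
    cell.unpin_cpus([6])  # raises CPUPinningInvalid, matching the log above

In other words, the host's pinned-CPU accounting and the unshelved
instance's NUMA topology disagree about which CPUs are actually pinned.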
Perhaps tomorrow morning we can look into handling the above exception
properly from the compute manager, since clearly we shouldn't be
allowing CPUPinningInvalid to be raised in the resource tracker's
_update_usage() call....
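As a straw man for that discussion, the rough shape I have in mind looks
something like the following; this is an illustrative sketch only, not the
actual nova.compute.manager code, and spawn/set_error_state are
hypothetical stand-ins:

    class CPUPinningInvalid(Exception):
        """Stand-in for nova.exception.CPUPinningInvalid (as above)."""

    def unshelve_with_claim(rt, context, instance, limits,
                            spawn, set_error_state):
        # rt, spawn and set_error_state are hypothetical stand-ins for the
        # resource tracker, the driver spawn call and an instance-state helper.
        try:
            with rt.instance_claim(context, instance, limits):
                spawn(instance)
        except CPUPinningInvalid:
            # The host's pinned-CPU accounting disagrees with the instance's
            # NUMA topology; fail fast and mark the instance errored instead
            # of leaving it stuck in SHELVED_OFFLOADED until Tempest times out.
            set_error_state(instance)
            raise

Whether failing fast like that is the right behaviour, or whether the usage
accounting feeding the claim is what actually needs fixing, is the part
worth hashing out.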
Anyway, see you on IRC tomorrow morning and let's try to fix this.
Best,
-jay
[1]
http://intel-openstack-ci-logs.ovh/86/319686/1/check/tempest-dsvm-full-nfv/b463722/testr_results.html.gz