[openstack-dev] [nova] Intel NFV CI failing all shelve/unshelve tests

Chris Friesen chris.friesen at windriver.com
Wed May 25 16:39:37 UTC 2016


On 05/22/2016 05:41 PM, Jay Pipes wrote:
> Hello Novaites,
>
> I've noticed that the Intel NFV CI has been failing all test runs for quite some
> time (at least a few days), always failing the same tests around shelve/unshelve
> operations.

<snip>

> I looked through the conductor and compute logs to see if I could find any
> possible reasons for the errors and found a number of the following errors in
> the compute logs:
>
> 2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance:
> cae6fd47-0968-4922-a03e-3f2872e4eb52] Traceback (most recent call last):
> 2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance:
> cae6fd47-0968-4922-a03e-3f2872e4eb52]   File
> "/opt/stack/new/nova/nova/compute/manager.py", line 4230, in _unshelve_instance
> 2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance:
> cae6fd47-0968-4922-a03e-3f2872e4eb52]     with rt.instance_claim(context,
> instance, limits):

<snip>

> 2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance:
> cae6fd47-0968-4922-a03e-3f2872e4eb52]     newcell.unpin_cpus(pinned_cpus)
> 2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance:
> cae6fd47-0968-4922-a03e-3f2872e4eb52]   File
> "/opt/stack/new/nova/nova/objects/numa.py", line 94, in unpin_cpus
> 2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance:
> cae6fd47-0968-4922-a03e-3f2872e4eb52]     pinned=list(self.pinned_cpus))
> 2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance:
> cae6fd47-0968-4922-a03e-3f2872e4eb52] CPUPinningInvalid: Cannot pin/unpin cpus
> [6] from the following pinned set [0, 2, 4]
>
> on or around the time of the failures in Tempest.
>
> Perhaps tomorrow morning we can look into handling the above exception properly
> from the compute manager, since clearly we shouldn't be allowing
> CPUPinningInvalid to be raised in the resource tracker's _update_usage() call....

First, it seems wrong to me that an _unshelve_instance() call would result in 
unpinning any CPUs.  If the instance was using pinned CPUs then I would expect 
the CPUs to be unpinned when doing the "shelve" operation.  When we do an 
instance claim as part of the "unshelve" operation we should be pinning CPUs, 
not unpinning them.

Second, CPUPinningInvalid gets raised in _update_usage() because the resource
tracker has discovered an inconsistency in its view of resources.  In this case,
it's trying to unpin CPU 6 from a set of pinned CPUs that doesn't include CPU 6.
I think this is a valid concern and should result in an error log.  Whether it
should cause the unshelve operation to fail is a separate question, but it's
definitely a symptom that something is wrong with resource tracking on this
compute node.
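
To make the two points above concrete, here is a rough sketch of the kind of
bookkeeping check involved.  It is a toy model, not nova's actual code:
ToyNUMACell and its methods are hypothetical stand-ins, and the double unpin at
the end is just one assumed way the tracked set could drift, not a diagnosis of
what happened in this CI run.

```python
class CPUPinningInvalid(Exception):
    """Raised when a pin/unpin request conflicts with the tracked pinned set."""

    def __init__(self, requested, pinned):
        super().__init__(
            "Cannot pin/unpin cpus %s from the following pinned set %s"
            % (sorted(requested), sorted(pinned)))


class ToyNUMACell:
    """Hypothetical, simplified stand-in for a host NUMA cell's pinning state."""

    def __init__(self, cpuset):
        self.cpuset = set(cpuset)
        self.pinned_cpus = set()

    def pin_cpus(self, cpus):
        cpus = set(cpus)
        # Only CPUs that exist on the cell and are currently free can be pinned.
        if not cpus <= self.cpuset or cpus & self.pinned_cpus:
            raise CPUPinningInvalid(cpus, self.pinned_cpus)
        self.pinned_cpus |= cpus

    def unpin_cpus(self, cpus):
        cpus = set(cpus)
        # Unpinning a CPU that is not tracked as pinned means the bookkeeping
        # has drifted, so refuse rather than silently ignore it.
        if not cpus <= self.pinned_cpus:
            raise CPUPinningInvalid(cpus, self.pinned_cpus)
        self.pinned_cpus -= cpus


cell = ToyNUMACell(range(8))
cell.pin_cpus([0, 2, 4])   # CPUs pinned by other instances on the host

cell.pin_cpus([6])         # expected on unshelve: the claim pins the CPU
cell.unpin_cpus([6])       # expected on shelve: the usage is released

try:
    cell.unpin_cpus([6])   # a second unpin finds CPU 6 missing from the set
except CPUPinningInvalid as exc:
    message = str(exc)
    print(message)
```

In this model the expected pairing (pin on unshelve, unpin on shelve) goes
through cleanly, and only the stray extra unpin trips the check, which is why
the exception looks to me like a useful signal rather than noise.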

Chris



