[Openstack] Help, erroneous resource tracker preventing instances from starting

Byron McCollum byron.mccollum at rackspace.com
Tue Jan 8 01:02:05 UTC 2013


See if this bug might be related to your problem...

https://bugs.launchpad.net/nova/+bug/1060363

Byron


Begin forwarded message "[Openstack] Base images removed in upgrade essex -> folsom and other stories":

> We also came across an issue where some compute nodes were reporting bogus resource stats, e.g.:
> 
> 2012-11-13 05:04:38 INFO nova.compute.manager [-] Updating host status
> 2012-11-13 05:06:14 AUDIT nova.compute.resource_tracker [-] Free ram (MB): -739665
> 2012-11-13 05:06:14 AUDIT nova.compute.resource_tracker [-] Free disk (GB): 12654
> 2012-11-13 05:06:14 AUDIT nova.compute.resource_tracker [-] Free VCPUS: -188
> 2012-11-13 05:06:14 INFO nova.compute.resource_tracker [-] Compute_service record updated for np-rcc6
> 
> This turned out to be covered by the following bug: the DB filter is applied as a regex match.
> https://bugs.launchpad.net/nova/+bug/1060363
> 
> So a compute node named np-rcc5 would also pull in np-rcc50, np-rcc51, and so on.
> 

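To illustrate the host-name matching described in the forwarded message, here is a minimal Python sketch. It is not nova's actual code: the host list and both function names are hypothetical, and the only assumption carried over from the message is that the host filter behaves like an unanchored pattern match rather than an exact comparison.

```python
import re

# Hypothetical host names following the np-rccN pattern quoted above.
hosts = ["np-rcc5", "np-rcc50", "np-rcc51", "np-rcc6"]


def match_unanchored(target, names):
    """Treat the target host name as an unanchored pattern (the problematic case)."""
    pattern = re.compile(re.escape(target))
    return [name for name in names if pattern.search(name)]


def match_exact(target, names):
    """Compare host names for exact equality (the intended behaviour)."""
    return [name for name in names if name == target]


print(match_unanchored("np-rcc5", hosts))  # ['np-rcc5', 'np-rcc50', 'np-rcc51']
print(match_exact("np-rcc5", hosts))       # ['np-rcc5']
```

If the same kind of matching is at work in the cluster described below, a host called nova-1 would also be charged for instances on any host whose name contains "nova-1" (for instance a hypothetical nova-10), which would fit the symptom of only the lowest-numbered nodes being affected.
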

On Jan 7, 2013, at 9:50 AM, Jonathan Proulx <jon at jonproulx.com> wrote:

> Hi All,
> 
> I have a growing problem in which compute nodes are puzzlingly over-reporting their resource utilization and thus appear to be over-utilized when they are in fact empty. The system is Ubuntu 12.04 using the Cloud Archive Folsom packages (2012.2-0ubuntu5~cloud0). The problem appeared on a single node after the upgrade from Essex some months ago and has now grown to 5 nodes (the lowest-numbered 5 nodes, both by IP and lexically by name).
> 
> For example, on the compute node "nova-1":
> 
> 2013-01-07 10:39:43 INFO nova.compute.manager [-] Updating host status
> 2013-01-07 10:41:02 AUDIT nova.compute.resource_tracker [-] Free ram (MB): -397134
> 2013-01-07 10:41:02 AUDIT nova.compute.resource_tracker [-] Free disk (GB): -3318
> 2013-01-07 10:41:02 AUDIT nova.compute.resource_tracker [-] Free VCPUS: -215
> 2013-01-07 10:41:02 INFO nova.compute.resource_tracker [-] Compute_service record updated for nova-1 
> 
> Oddly, even though no instances are scheduled, the resource utilization does vary. For example, in the last 5 hours:
> 
> root@nova-1:~# grep 'Free VCPUS:' /var/log/nova/nova-compute.log | awk '{print $NF}' | sort -n | uniq -c
>     156 -218
>       3 -216
>       5 -215
>       2 -214
>       2 -212
>       1 -211
>       1 -210
>       5 -209
>       5 -208
> 
> # but no instances are running
> root@nova-1:~# virsh list
>  Id    Name                           State
> ----------------------------------------------------
> 
> root@nova-1:~#
> 
> # nor does OpenStack seem to *think* any instances are running or reserved by any projects
> # as seen by nova-manage service describe_resource nova-1
> 
> HOST                              PROJECT     cpu mem(mb)     hdd
> nova-1          (total)                        24   48295     602
> nova-1          (used_now)                    233  433141    3740
> nova-1          (used_max)                      0       0       0
> # note lack of a list of tenants here
> 
> I can't replicate the issue intentionally, but I also can't clear the apparent resource utilization. I tried direct manipulation of the database, but that gets reset by the compute node reports; I also tried rebooting the nodes. I can always fall back to just reinstalling them, but since this is still a pre-production cluster I'd like to understand what is happening.
> 
> Does anyone have any insight into why nova.compute.resource_tracker is so confused, or how I can force it to understand what resources are in use? Operationally it isn't painful to reinstall, but it does hurt a bit not knowing what's going on here.
> 
> Thanks,
> -Jon




