[Openstack-operators] nova_api resource_providers table issues on ocata

Jay Pipes jaypipes at gmail.com
Tue Oct 16 22:56:04 UTC 2018


On 10/16/2018 10:11 AM, Sylvain Bauza wrote:
> On Tue, Oct 16, 2018 at 3:28 PM Ignazio Cassano 
> <ignaziocassano at gmail.com <mailto:ignaziocassano at gmail.com>> wrote:
> 
>     Hi everybody,
>     when on my Ocata installation based on CentOS 7 I update (only
>     update, not changing OpenStack version) some KVM compute nodes, I
>     discovered the UUIDs in the resource_providers nova_api db table
>     are different from the UUIDs in the compute_nodes nova db table.
>     This causes several errors in the nova-compute service, because it
>     is not able to receive instances anymore.
>     Aligning the UUIDs from compute_nodes solves this problem.
>     Could anyone tell me if it is a bug?
> 
> 
> What do you mean by "updating some compute nodes" ? In Nova, we consider 
> uniqueness of compute nodes by a tuple (host, hypervisor_hostname) where 
> host is your nova-compute service name for this compute host, and 
> hypervisor_hostname is in the case of libvirt the 'hostname' reported by 
> the libvirt API [1]
> 
> If somehow one of the two values changes, then the Nova Resource Tracker 
> will consider this new record as a separate compute node, thereby 
> creating a new compute_nodes table record, and then a new UUID.
> Could you please check your compute_nodes table and see whether some 
> entries were recently created ?

The compute_nodes table has no unique constraint on the 
hypervisor_hostname field unfortunately, even though it should. It's not 
like you can have two compute nodes with the same hostname. But, alas, 
this is one of those vestigial tails in nova due to poor initial table 
design and coupling between the concept of a nova-compute service worker 
and the hypervisor resource node itself.
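
A quick way to see whether that has bitten you is to look for hostnames
that have accumulated more than one record, counting soft-deleted rows
as well (again a sketch assuming the default "nova" schema name):

    -- hostnames with more than one compute_nodes row
    SELECT hypervisor_hostname, COUNT(*) AS num_records
      FROM nova.compute_nodes
     GROUP BY hypervisor_hostname
    HAVING COUNT(*) > 1;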

Ignazio, I was tempted to say you may have run into this:

https://bugs.launchpad.net/nova/+bug/1714248

But then I see you're not using Ironic... I'm not entirely sure how you 
ended up with duplicate hypervisor_hostname records for the same compute 
node, but some of those duplicate records must have had the deleted 
field set to a non-zero value, given the constraint we currently have on 
(host, hypervisor_hostname, deleted).
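
If duplicates do show up, listing all rows for one affected hostname
will show which records were soft-deleted and what UUID each carries,
and a cross-database comparison will show where the placement side
disagrees. Both are sketches: they assume the default "nova" and
"nova_api" schemas live on the same database server, and
"compute-1.example.com" is just a hypothetical hostname:

    -- all records, live and soft-deleted, for one compute node
    SELECT id, created_at, deleted_at, deleted, host,
           hypervisor_hostname, uuid
      FROM nova.compute_nodes
     WHERE hypervisor_hostname = 'compute-1.example.com'
     ORDER BY id;

    -- compute nodes whose UUID no longer matches their resource provider
    SELECT cn.hypervisor_hostname, cn.uuid AS compute_node_uuid,
           rp.uuid AS resource_provider_uuid
      FROM nova.compute_nodes cn
      JOIN nova_api.resource_providers rp
        ON rp.name = cn.hypervisor_hostname
     WHERE cn.deleted = 0
       AND rp.uuid != cn.uuid;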

This means that your deployment script or some external scripts must 
have been deleting compute node records somehow, though I'm not entirely 
sure how...

Best,
-jay





More information about the OpenStack-operators mailing list