[Openstack-operators] nova_api resource_providers table issues on ocata

Sylvain Bauza sbauza at redhat.com
Wed Oct 17 14:01:15 UTC 2018


On Wed, Oct 17, 2018 at 12:56 AM Jay Pipes <jaypipes at gmail.com> wrote:

> On 10/16/2018 10:11 AM, Sylvain Bauza wrote:
> > On Tue, Oct 16, 2018 at 3:28 PM Ignazio Cassano
> > <ignaziocassano at gmail.com <mailto:ignaziocassano at gmail.com>> wrote:
> >
> >     Hi everybody,
> >     when I update (only update, not changing the OpenStack version)
> >     some KVM compute nodes on my Ocata installation based on CentOS 7,
> >     I discovered that the UUIDs in the resource_providers nova_api db
> >     table are different from the UUIDs in the compute_nodes nova db
> >     table.
> >     This causes several errors in the nova-compute service, because it
> >     is no longer able to receive instances.
> >     Aligning the UUIDs from compute_nodes solves this problem.
> >     Could anyone tell me if it is a bug?
> >
> >
> > What do you mean by "updating some compute nodes"? In Nova, we consider
> > compute nodes unique by the tuple (host, hypervisor_hostname), where
> > host is the nova-compute service name for the compute host, and
> > hypervisor_hostname is, in the case of libvirt, the 'hostname' reported
> > by the libvirt API [1]
> >
> > If somehow one of the two values changes, then the Nova resource
> > tracker will consider the new record a separate compute node, thereby
> > creating a new compute_nodes table record, and hence a new UUID.
> > Could you please check your compute_nodes table and see whether some
> > entries were recently created?
>
> The compute_nodes table has no unique constraint on the
> hypervisor_hostname field unfortunately, even though it should. It's not
> like you can have two compute nodes with the same hostname. But, alas,
> this is one of those vestigial tails in nova due to poor initial table
> design and coupling between the concept of a nova-compute service worker
> and the hypervisor resource node itself.
>
>
Sorry if I was unclear, but I meant we have a unique key on (host,
hypervisor_hostname, deleted) (I didn't explain about deleted, but meh).
https://github.com/openstack/nova/blob/01c33c5/nova/db/sqlalchemy/models.py#L116-L118

But yeah, we don't have any unique key for just (hypervisor_hostname,
deleted), sure.
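Since we're talking about that unique key: here is a minimal SQLite sketch (illustrative table and host names only, nowhere near the real nova schema) showing why a key on (host, hypervisor_hostname, deleted) still admits duplicate hypervisor_hostname rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Simplified stand-in for nova's compute_nodes table; the unique key
# mirrors the one in models.py linked above.
cur.execute("""
    CREATE TABLE compute_nodes (
        id INTEGER PRIMARY KEY,
        uuid TEXT,
        host TEXT,
        hypervisor_hostname TEXT,
        deleted INTEGER DEFAULT 0,
        UNIQUE (host, hypervisor_hostname, deleted)
    )
""")

insert = ("INSERT INTO compute_nodes (uuid, host, hypervisor_hostname, deleted) "
          "VALUES (?, ?, ?, ?)")

# A live record for a compute node...
cur.execute(insert, ("uuid-1", "kvm1", "kvm1.example.com", 0))
# ...a soft-deleted row for the same node is allowed, since deleted differs
# (nova sets deleted = id on soft delete, so deleted rows never collide)...
cur.execute(insert, ("uuid-0", "kvm1", "kvm1.example.com", 1))
# ...and the same hypervisor_hostname under another service host is allowed too:
cur.execute(insert, ("uuid-2", "kvm2", "kvm1.example.com", 0))

# Only a second *live* row for the same (host, hypervisor_hostname) is rejected:
rejected = False
try:
    cur.execute(insert, ("uuid-3", "kvm1", "kvm1.example.com", 0))
except sqlite3.IntegrityError:
    rejected = True
```

So three rows sharing one hypervisor_hostname coexist happily, which is why a changed host value or a soft delete quietly produces a new record, and with it a new UUID.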

Ignazio, I was tempted to say you may have run into this:
>
> https://bugs.launchpad.net/nova/+bug/1714248
>
> But then I see you're not using Ironic... I'm not entirely sure how you
> ended up with duplicate hypervisor_hostname records for the same compute
> node, but some of those duplicate records must have had the deleted
> field set to a non-zero value, given the constraint we currently have on
> (host, hypervisor_hostname, deleted).
>
> This means that your deployment script or some external scripts must
> have been deleting compute node records somehow, though I'm not entirely
> sure how...
>
>
Yeah, that's why I asked for the compute_nodes records. Ignazio, could you
please verify this? Do you have multiple records for the same (host,
hypervisor_hostname) tuple?

select * from compute_nodes where host='XXX' and hypervisor_hostname='YYY';
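And for the UUID mismatch itself, a hedged sketch of the cross-check, with SQLite standing in for the two MySQL databases (assumption: for libvirt nodes the resource_providers.name matches the compute node's hypervisor_hostname; 'old-uuid'/'new-uuid' and the host names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Minimal stand-ins for nova.compute_nodes and nova_api.resource_providers.
cur.execute("CREATE TABLE compute_nodes "
            "(uuid TEXT, host TEXT, hypervisor_hostname TEXT, deleted INTEGER)")
cur.execute("CREATE TABLE resource_providers (uuid TEXT, name TEXT)")

# The live compute node got a fresh UUID...
cur.execute("INSERT INTO compute_nodes VALUES "
            "('new-uuid', 'kvm1', 'kvm1.example.com', 0)")
# ...but the provider row still carries the old one:
cur.execute("INSERT INTO resource_providers VALUES "
            "('old-uuid', 'kvm1.example.com')")

# List providers whose UUID no longer matches the live compute node:
cur.execute("""
    SELECT cn.hypervisor_hostname, cn.uuid, rp.uuid
    FROM compute_nodes cn
    JOIN resource_providers rp ON rp.name = cn.hypervisor_hostname
    WHERE cn.deleted = 0 AND cn.uuid != rp.uuid
""")
mismatches = cur.fetchall()
for hostname, cn_uuid, rp_uuid in mismatches:
    print(f"{hostname}: compute_nodes={cn_uuid} resource_providers={rp_uuid}")
```

Any row this prints is a node in the state Ignazio described, where aligning the UUIDs (or cleaning the stale record) is needed.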


-Sylvain

> Best,
> -jay

