[openstack-dev] [TripleO] Removing global bootstrap_nodeid?

Steven Hardy shardy at redhat.com
Tue Sep 25 08:51:23 UTC 2018


Hi all,

After some discussions with bandini at the PTG, I've been taking a
look at this bug and how to solve it:

https://bugs.launchpad.net/tripleo/+bug/1792613
(Also more information in downstream bz1626140)

The problem is that we always run container bootstrap tasks (as well
as a bunch of update/upgrade tasks) on the bootstrap_nodeid, which by
default is always the overcloud-controller-0 node (regardless of which
services are running there).

This breaks a pattern we established a while ago for Composable HA,
where we work out the bootstrap node by
$service_short_bootstrap_hostname, which means we always run on the
first node that has the service enabled (even if it spans more than
one Role).
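
To make the per-service pattern concrete, here is a minimal sketch of the
selection rule. The data layout and the function name are hypothetical, made
up for illustration; this is not the actual tripleo-heat-templates logic.

```python
# Hypothetical role layout: ordered list of roles, each with its hosts
# and the services enabled on that role.
def service_bootstrap_nodes(roles):
    """Map each service to the first host (in role order) that runs it."""
    bootstrap = {}
    for role in roles:
        if not role["hosts"]:
            continue  # a role with no nodes cannot provide a bootstrap node
        for service in role["services"]:
            # setdefault keeps the first match, so a service spanning
            # several roles is bootstrapped on the earliest role's first host
            bootstrap.setdefault(service, role["hosts"][0])
    return bootstrap


roles = [
    {"name": "Controller",
     "hosts": ["overcloud-controller-0", "overcloud-controller-1"],
     "services": ["haproxy", "mysql"]},
    {"name": "Keystone",
     "hosts": ["overcloud-keystone-0"],
     "services": ["keystone"]},
]

for service, node in sorted(service_bootstrap_nodes(roles).items()):
    print(service, node)
```

With this layout keystone gets its own bootstrap node, whereas the global
bootstrap_nodeid would still point at overcloud-controller-0.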

This presents two problems:

1. service_config_settings only puts the per-service hieradata on
nodes where a service is enabled, hence data needed for the
bootstrapping (e.g. keystone users) can be missing if, say,
keystone is running on some role that's not Controller (this, I think,
is the root cause of the bug/bz linked above).

2. Even when we happen to have the data needed to complete the bootstrap
tasks, we end up pulling service containers to nodes where the
service isn't running, potentially wasting both time and space.
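
As a rough illustration of both points, the sketch below lists the services
whose bootstrap tasks would land on a node that doesn't run them. The node
names and the services_by_node mapping are hypothetical example data, not
output from a real deployment.

```python
def misplaced_bootstrap_services(global_bootstrap_node, services_by_node):
    """Return services enabled somewhere but not on the global bootstrap node."""
    on_bootstrap = set(services_by_node.get(global_bootstrap_node, ()))
    enabled = set()
    for services in services_by_node.values():
        enabled.update(services)
    # These services would be bootstrapped on a node that has neither
    # their hieradata nor any reason to pull their containers.
    return sorted(enabled - on_bootstrap)


services_by_node = {
    "overcloud-controller-0": ["haproxy", "mysql"],
    "overcloud-keystone-0": ["keystone"],
}

print(misplaced_bootstrap_services("overcloud-controller-0", services_by_node))
```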

I've been looking at solutions, and it seems like we either have to
generate per-service bootstrap_nodeids (I have a patch to do this,
https://review.openstack.org/605010), or perhaps we could just remove
all the bootstrap node IDs and switch to using hostnames instead?
That seems like it could be simpler, but I wanted to check whether
there's anything I'm missing.

[root@overcloud-controller-0 ~]# ansible -m setup localhost | grep hostname
 [WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'
        "ansible_hostname": "overcloud-controller-0",
        "facter_hostname": "overcloud-controller-0",
[root@overcloud-controller-0 ~]# hiera -c /etc/puppet/hiera.yaml xinetd_short_bootstrap_node_name
overcloud-controller-0
[root@overcloud-controller-0 ~]# hiera -c /etc/puppet/hiera.yaml xinetd_bootstrap_nodeid
ede5f189-7149-4faf-a378-ac965a2a818c
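
The hostname-based check could look roughly like the sketch below. It assumes
the per-service key follows the <service>_short_bootstrap_node_name form seen
in the hiera output above; the is_bootstrap_node helper and the plain dict
standing in for hiera lookups are hypothetical, and this is not what
docker-puppet.py does today.

```python
import socket


def is_bootstrap_node(service, hieradata, hostname=None):
    """Return True if this host is the bootstrap node for the given service."""
    key = "%s_short_bootstrap_node_name" % service
    bootstrap = hieradata.get(key)
    if bootstrap is None:
        # Service not enabled on this node, so there is no per-service
        # data here and nothing to bootstrap.
        return False
    if hostname is None:
        hostname = socket.gethostname()
    # Compare short hostnames only, case-insensitively.
    return hostname.split(".")[0].lower() == bootstrap.lower()


# Stand-in for the hieradata present on the node (hypothetical dict).
hieradata = {"xinetd_short_bootstrap_node_name": "overcloud-controller-0"}

print(is_bootstrap_node("xinetd", hieradata, "overcloud-controller-0"))
print(is_bootstrap_node("keystone", hieradata, "overcloud-controller-0"))
```

Because the lookup fails cleanly when the key is absent, a node that doesn't
run the service simply skips the bootstrap step rather than needing a node ID
to compare against.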

This is the first part of the problem; once we agree on the approach
here, we can convert docker-puppet.py and all the *tasks to use the
per-service IDs/names instead of the global one, so that they work
properly with composable roles/services.

Any thoughts on this appreciated before I go ahead and implement the fix.

Steve


