[openstack-dev] [TripleO] Removing global bootstrap_nodeid?

Jiří Stránský jistr at redhat.com
Tue Sep 25 13:05:51 UTC 2018


Hi Steve,

On 25/09/2018 10:51, Steven Hardy wrote:
> Hi all,
> 
> After some discussions with bandini at the PTG, I've been taking a
> look at this bug and how to solve it:
> 
> https://bugs.launchpad.net/tripleo/+bug/1792613
> (Also more information in downstream bz1626140)
> 
> The problem is that we always run container bootstrap tasks (as well
> as a bunch of update/upgrade tasks) on the bootstrap_nodeid, which by
> default is always the overcloud-controller-0 node (regardless of which
> services are running there).
> 
> This breaks a pattern we established a while ago for Composable HA,
> where we work out the bootstrap node by
> $service_short_bootstrap_hostname, which means we always run on the
> first node that has the service enabled (even if it spans more than
> one Role).
> 
> This presents two problems:
> 
> 1. service_config_settings only puts the per-service hieradata on
> nodes where a service is enabled, hence data needed for the
> bootstrapping (e.g. keystone users etc.) can be missing if, say,
> keystone is running on some role that's not Controller (this, I think,
> is the root cause of the bug/bz linked above)
> 
> 2. Even when we by luck have the data needed to complete the bootstrap
> tasks, we'll end up pulling service containers to nodes where the
> service isn't running, potentially wasting both time and space.
> 
> I've been looking at solutions, and it seems like we either have to
> generate per-service bootstrap_nodeid's (I have a patch to do this
> https://review.openstack.org/605010), or perhaps we could just remove
> all the bootstrap node id's, and switch to using hostnames instead?
> Seems like that could be simpler, but wanted to check if there's
> anything I'm missing?

I think we should recheck the initial assumptions, because based on my
testing:

* the bootstrap_nodeid is in fact a hostname already, despite its 
deceptive name,

* it's not global, it is per-role.

From my env:

[root@overcloud-controller-2 ~]# hiera -c /etc/puppet/hiera.yaml bootstrap_nodeid
overcloud-controller-0

[root@overcloud-novacompute-1 ~]# hiera -c /etc/puppet/hiera.yaml bootstrap_nodeid
overcloud-novacompute-0

This makes me think the problems (1) and (2) as stated above shouldn't 
be happening. The containers or tasks present in service definition 
should be executed on all nodes where a service is present, and if they 
additionally filter for bootstrap_nodeid, it would only pick one node 
per role. So, the service *should* be deployed on the selected bootstrap 
node, which means the service-specific hiera should be present there and 
needless container downloading should not be happening, AFAICT.

However, thinking about it this way, we probably have a different 
problem still:

(3) The actions which use the bootstrap_nodeid check are not guaranteed
to execute once per service. If the service is present on multiple
roles, the check succeeds once per such role.

Using a per-service bootstrap host instead of a per-role bootstrap host
sounds like the right way to go then.
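
To make the difference concrete, here is a minimal Python sketch of the
two selection schemes. Everything in it is an assumption made up for
illustration: the role layout, the node names, the function names, and
the choice of "first node in sorted order" as the per-service pick. It
is not how tripleo-heat-templates actually computes these values.

```python
# Hypothetical layout: keystone is enabled on two roles.
# All names here are illustrative, not taken from a real deployment.
ROLE_NODES = {
    "Controller": ["overcloud-controller-0", "overcloud-controller-1"],
    "Keystone": ["overcloud-keystone-0", "overcloud-keystone-1"],
}
ROLE_SERVICES = {
    "Controller": {"keystone", "mysql"},
    "Keystone": {"keystone"},
}


def per_role_bootstrap_nodes(service):
    """Nodes on which a per-role bootstrap check passes for `service`.

    Each role has its own bootstrap node (assumed: the first node of the
    role), so a service spanning N roles passes the check on N nodes.
    """
    return [
        nodes[0]
        for role, nodes in ROLE_NODES.items()
        if service in ROLE_SERVICES[role] and nodes
    ]


def per_service_bootstrap_node(service):
    """Single bootstrap node for `service` across all roles.

    Assumption: pick the first node, in sorted order, among all nodes
    that have the service enabled.
    """
    candidates = sorted(
        node
        for role, nodes in ROLE_NODES.items()
        if service in ROLE_SERVICES[role]
        for node in nodes
    )
    return candidates[0] if candidates else None


if __name__ == "__main__":
    print(per_role_bootstrap_nodes("keystone"))
    print(per_service_bootstrap_node("keystone"))
```

With this layout the per-role check passes on two nodes for keystone
(one per role), while the per-service selection yields exactly one.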

However, none of the above provides a solid explanation of what's really 
happening in the LP/BZ mentioned above. Hopefully this info is at least 
a piece of the puzzle.

Jirka


