[TripleO] Scaling node counts with only Ansible (N=1)

Harald Jensås hjensas at redhat.com
Fri Jul 12 19:59:34 UTC 2019

On Wed, 2019-07-10 at 16:17 -0400, James Slagle wrote:
> There's been a fair amount of recent work around simplifying our Heat
> templates and migrating the software configuration part of our
> deployment entirely to Ansible.
> As part of this effort, it became apparent that we could render much
> of the data that we need out of Heat in a way that is generic per
> node, and then have Ansible render the node specific data during
> config-download runtime.
> To illustrate the point, consider when we specify ComputeCount:10 in
> our templates, that much of the work that Heat is doing across those
> 10 sets of resources for each Compute node is duplication. However,
> it's been necessary so that Heat can render data structures such as
> list of IP's, lists of hostnames, contents of /etc/hosts files, etc
> etc etc. If all that was driven by Ansible using host facts, then
> Heat doesn't need to do those 10 sets of resources to begin with.
> The goal is to get to a point where we can deploy the Heat stack with
> a count of 1 for each role, and then deploy any number of nodes per
> role using Ansible. To that end, I've been referring to this effort
> as N=1.
> The value in this work is that it directly addresses our scaling
> issues with Heat (by just deploying a much smaller stack). Obviously
> we'd still be relying heavily on Ansible to scale to the required
> levels, but I feel that is a much better understood challenge at this
> point in the evolution of configuration tools.
> With the patches that we've been working on recently, I've got a POC
> running where I can deploy additional compute nodes with just Ansible.
> This is done by just adding the additional nodes to the Ansible
> inventory with a small set of facts to include IP addresses on each
> enabled network and a hostname.
> These patches are at
> https://review.opendev.org/#/q/topic:bp/reduce-deployment-resources
> and reviews/feedback are welcome.
> Other points:
> - Baremetal provisioning and port creation are presently handled by
> Heat. With the ongoing efforts to migrate baremetal provisioning out
> of Heat (nova-less deploy), I think these efforts are very
> complementary. Eventually, we get to a point where Heat is not
> actually creating any other OpenStack API resources. For now, the
> patches only work when using pre-provisioned nodes.

I've said this before, but I think we should turn the nova-less
approach around. Today, with nova-less, we create a bunch of servers and
write out the parameters file to use the deployed-server approach.
Effectively, we still need the resource group in Heat to create a server
resource for every server. Creating the fake server resource is fast,
because Heat doesn't call Nova or Ironic to create any resources. But the
stack is just as big, with a stack for every node, i.e. not N=1.

What you are doing here is essentially saying that we don't create a
resource group that then creates N role stacks, one for each
overcloud node. You are creating a single generic "server" definition
per role. So we drop the resource group and create
OS::TripleO::{{Role}}.Server once. To me it's backwards to push
a large struct with properties for N=many nodes into the creation of
that stack.

Currently the puppet/role-role.yaml creates all the network ports etc.
As you only want to create it once, it could instead simply output the
UUIDs of the networks and subnets. These are identical for all servers
in the role. So we end up with a small Heat stack.

Once the stack is created, we could use that generic "server" role data
to feed into something (Ansible? Python? Mistral?) that calls
metalsmith to build the servers, then creates ports for each server in
Neutron, one port for each network+subnet defined in the role. Then we
feed that output into the JSON (hieradata) that is pushed to each node
and used during service configuration: all the things we need to
configure network interfaces, /etc/hosts and so on. We need a way to
keep track of which ports belong to which node, but I guess something
simple like using the node's Ironic UUID in either the name,
description or tag field of the Neutron port will work. There is also
the extra field in Ironic, which is of JSON type, so we could place a map
of network->port_uuid in there as well.
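
A minimal sketch of the tagging idea above, in plain Python with no
Neutron or Ironic calls. The port records are simplified, hypothetical
dicts (the "network" key and the "ironic-node:" tag prefix are
assumptions made for illustration, not an existing convention). It only
shows how a tag carrying the node UUID would let us rebuild the
network->port_uuid map per node:

```python
from collections import defaultdict


def ports_by_node(ports, tag_prefix="ironic-node:"):
    """Group port records by the node UUID carried in their tags.

    Returns {node_uuid: {network_name: port_id}}.
    Ports without a matching tag are ignored.
    """
    mapping = defaultdict(dict)
    for port in ports:
        for tag in port.get("tags", []):
            if tag.startswith(tag_prefix):
                node_uuid = tag[len(tag_prefix):]
                mapping[node_uuid][port["network"]] = port["id"]
    return dict(mapping)


# Hypothetical, simplified port records (not real Neutron API output).
ports = [
    {"id": "port-a", "network": "internal_api", "tags": ["ironic-node:node-1"]},
    {"id": "port-b", "network": "storage", "tags": ["ironic-node:node-1"]},
    {"id": "port-c", "network": "internal_api", "tags": ["ironic-node:node-2"]},
    {"id": "port-d", "network": "storage", "tags": []},  # untagged, skipped
]

node_ports = ports_by_node(ports)
```

The per-node dict in node_ports is the same shape as the map that could
be stored in the node's extra field instead of, or in addition to, the
tags.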

Another idea I've been pondering is whether we could put credentials on
the overcloud nodes so that the node itself could make the call to
Neutron on the undercloud to create its ports. I.e. we just push the UUID
of the correct network and subnet where the resource should be created,
and let the overcloud node do the create. The problem with this is that
we wouldn't have a way to build /etc/hosts and probably other
things that include IPs etc. for all the nodes. Maybe if all the nodes
were part of an etcd cluster, and each pushed its data there?
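
To make the /etc/hosts concern concrete, here is a rough sketch that
assumes every node has already published its hostname and per-network
IPs to some shared store. The store is stood in for by a plain dict;
the key layout, the hostname.network.domain naming and the sample
addresses are all hypothetical. Given that data, any node could render
the same hosts entries locally:

```python
def render_hosts(nodes, domain="localdomain"):
    """Render /etc/hosts-style lines from {hostname: {network: ip}}.

    Output is sorted so that every node renders identical content.
    """
    lines = []
    for hostname in sorted(nodes):
        for network, ip in sorted(nodes[hostname].items()):
            fqdn = f"{hostname}.{network}.{domain}"
            lines.append(f"{ip} {fqdn} {hostname}.{network}")
    return "\n".join(lines) + "\n"


# Stand-in for data each node would have pushed to the shared store.
shared = {
    "compute-0": {"internalapi": "172.16.2.10", "storage": "172.16.1.10"},
    "compute-1": {"internalapi": "172.16.2.11"},
}

hosts_text = render_hosts(shared)
```

The open question stays the same: the rendering is trivial, the hard
part is getting all nodes to agree on when the shared data is complete.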

I think the creation of the actual networks and subnets can be left in
Heat. It's typically 5-6 networks and 5-6 subnets, so it's not a lot of
resources. Even in a large DCN deployment with 50-100 subnets per
network, or even 50-100 networks, I don't think this is a problem.

> - We need to consider how we'd manage the Ansible inventory going
> forward if we open up an interface for operators to manipulate it
> directly. That's something we'd want to manage and preserve (version
> control) as it's critical data for the deployment.
> Given the progress that we've made with the POC, my sense is that
> we'll keep pushing in this overall direction. I'd like to get some
> feedback on the approach. We have an etherpad we are using to track
> some of the work at a high level:
> https://etherpad.openstack.org/p/tripleo-reduce-deployment-resources
> I'll be adding some notes on how I setup the POC to that etherpad if
> others would like to try it out.
