On Wed, Jul 10, 2019 at 4:24 PM James Slagle <james.slagle@gmail.com> wrote:
There's been a fair amount of recent work around simplifying our Heat templates and migrating the software configuration part of our deployment entirely to Ansible.
As part of this effort, it became apparent that we could render much of the data that we need out of Heat in a way that is generic per node, and then have Ansible render the node-specific data during config-download runtime.
To illustrate the point, consider what happens when we specify ComputeCount:10 in our templates: much of the work that Heat does across those 10 sets of resources, one per Compute node, is duplication. However, it's been necessary so that Heat can render data structures such as lists of IPs, lists of hostnames, contents of /etc/hosts files, etc. If all of that were driven by Ansible using host facts, then Heat wouldn't need to create those 10 sets of resources to begin with.
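As a rough illustration of that last point (a sketch only, not code from the patches): given a per-host mapping of network name to IP address, which is the kind of data host facts could supply, the /etc/hosts contents can be generated in one place instead of through per-node Heat resources. The function name, the layout of the facts, the `localdomain` suffix, and the exact line format are all assumptions made up for this example.

```python
def render_etc_hosts(host_facts, domain="localdomain"):
    """Build /etc/hosts-style lines from {hostname: {network: ip}}.

    The input layout and the "<host>.<network>.<domain>" naming are
    hypothetical; they only stand in for facts Ansible would hold per host.
    """
    lines = []
    for hostname in sorted(host_facts):
        for network, ip in sorted(host_facts[hostname].items()):
            fqdn = f"{hostname}.{network}.{domain}"
            short = f"{hostname}.{network}"
            lines.append(f"{ip} {fqdn} {short}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example facts for two compute nodes (made-up addresses).
    facts = {
        "compute-0": {"ctlplane": "192.168.24.10", "internal_api": "172.16.2.10"},
        "compute-1": {"ctlplane": "192.168.24.11", "internal_api": "172.16.2.11"},
    }
    print(render_etc_hosts(facts), end="")
```

The work here grows with the number of hosts in the mapping, not with the number of resources in the Heat stack.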
The goal is to get to a point where we can deploy the Heat stack with a count of 1 for each role, and then deploy any number of nodes per role using Ansible. To that end, I've been referring to this effort as N=1.
The value in this work is that it directly addresses our scaling issues with Heat (by just deploying a much smaller stack). Obviously we'd still be relying heavily on Ansible to scale to the required levels, but I feel that is a much better understood challenge at this point in the evolution of configuration tools.
With the patches that we've been working on recently, I've got a POC running where I can deploy additional compute nodes with just Ansible. This is done by adding the additional nodes to the Ansible inventory with a small set of facts: a hostname and an IP address on each enabled network.
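To make the inventory step concrete, here is a minimal sketch of what adding nodes could look like, assuming the inventory is held as plain data shaped like an Ansible YAML inventory (group, then `hosts`, then per-host variables). It is not taken from the POC: the `add_nodes` helper and the `<network>_ip` variable names are hypothetical, and `ansible_host` is the only standard Ansible variable used.

```python
import copy


def add_nodes(inventory, role, nodes):
    """Return a copy of `inventory` with `nodes` added to the `role` group.

    `nodes` maps hostname -> {network_name: ip}. Each host gets one
    "<network>_ip" variable per enabled network (hypothetical naming) and
    uses its ctlplane address as ansible_host.
    """
    updated = copy.deepcopy(inventory)
    hosts = updated.setdefault(role, {}).setdefault("hosts", {})
    for hostname, ips in nodes.items():
        if hostname in hosts:
            raise ValueError(f"{hostname} is already in the inventory")
        host_vars = {"ansible_host": ips["ctlplane"]}
        for network, ip in ips.items():
            host_vars[f"{network}_ip"] = ip
        hosts[hostname] = host_vars
    return updated


if __name__ == "__main__":
    # Inventory as it might look after a count-of-1 deployment (made-up data).
    inventory = {
        "Compute": {"hosts": {"compute-0": {"ansible_host": "192.168.24.10"}}}
    }
    scaled = add_nodes(
        inventory,
        "Compute",
        {"compute-1": {"ctlplane": "192.168.24.11", "internal_api": "172.16.2.11"}},
    )
    print(sorted(scaled["Compute"]["hosts"]))
```

Returning a new structure rather than mutating the input keeps the previous inventory available for comparison, which matters if the inventory becomes data we need to preserve.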
These patches are at https://review.opendev.org/#/q/topic:bp/reduce-deployment-resources and reviews/feedback are welcome.
This is a fabulous proposal in my opinion. I've added (and will continue to add) TODO ideas in the etherpad. Anyone willing to help, please ping us. Another point, somewhat related: I took the opportunity of this work to reduce the complexity around the number of hieradata files. I would like to investigate whether we can generate one data file that would be loaded by both Puppet and Ansible for configuration management. I'll start a separate thread on that effort very soon.
Other points:
- Baremetal provisioning and port creation are presently handled by Heat. With the ongoing efforts to migrate baremetal provisioning out of Heat (nova-less deploy), I think these efforts are very complementary. Eventually, we get to a point where Heat is not actually creating any other OpenStack API resources. For now, the patches only work when using pre-provisioned nodes.
- We need to consider how we'd manage the Ansible inventory going forward if we open up an interface for operators to manipulate it directly. That's something we'd want to manage and preserve (version control) as it's critical data for the deployment.
Given the progress that we've made with the POC, my sense is that we'll keep pushing in this overall direction. I'd like to get some feedback on the approach. We have an etherpad we are using to track some of the work at a high level:
https://etherpad.openstack.org/p/tripleo-reduce-deployment-resources
I'll be adding some notes on how I set up the POC to that etherpad if others would like to try it out.
-- James Slagle
-- Emilien Macchi