<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Jul 11, 2017 at 3:37 AM, Lars Kellogg-Stedman <span dir="ltr"><<a href="mailto:lars@redhat.com" target="_blank">lars@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class="gmail-">On Fri, Jul 7, 2017 at 1:50 PM, James Slagle <span dir="ltr"><<a href="mailto:james.slagle@gmail.com" target="_blank">james.slagle@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">There are also some ideas forming around pulling the Ansible playbooks<br></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

and vars out of Heat so that they can be rerun (or run initially)<br>

independently from the Heat SoftwareDeployment delivery mechanism:<br></blockquote><div><br></div></span><div>I think the closer we can come to "the operator runs ansible-playbook to configure the overcloud" the better, but not because I think Ansible is inherently a great tool: rather, I think the many layers of indirection in our existing model make error reporting and diagnosis much more complicated than it needs to be.  Combined with Puppet's "fail as late as possible" model, this means that (a) operators waste time waiting for a deployment that is ultimately going to fail but hasn't yet, and (b) when it does fail, they need relatively intimate knowledge of our deployment tools to backtrack through logs and find the root cause of the failure.</div><div><br></div><div>If we can offer a deployment mode that reduces the number of layers between the operator and the actions being performed on the hosts I think we would win on both fronts: faster failures and reporting errors as close as possible to the actual problem will result in less frustration across the board.</div><div><br></div><div>I do like Steve's suggestion of a split model where Heat is responsible for instantiating OpenStack resources while Ansible is used to perform host configuration tasks.  Despite all the work done on Ansible's OpenStack modules, they feel inflexible and frustrating to work with when compared to Heat's state-aware, dependency-ordered deployments.  A solution that allows Heat to output configuration that can subsequently be consumed by Ansible -- either running manually or perhaps via Mistral for API-driven-deployments -- seems like an excellent goal.  
Using Heat as a "front-end" to the process means that we get to keep the parameter validation and documentation that is missing in Ansible, while still following the Unix philosophy of giving you enough rope to hang yourself if you really want it.</div><span class="gmail-HOEnZb"><font color="#888888"><div></div></font></span></div></div></div></blockquote></div><br></div><div class="gmail_extra">I think this nicely sums up what we should be aiming for, but I'd like to elaborate on "either running manually or perhaps via Mistral for API-driven-deployments".</div><div class="gmail_extra"><br></div><div class="gmail_extra">I think it's important that we fully support both Mistral-driven and manually run playbooks. If there were no option to run ansible-playbook directly, operators would miss one of the main benefits of using Ansible in the first place (which is leveraging their knowledge of inventory, playbooks and roles to deploy things).</div><div class="gmail_extra"><br></div><div class="gmail_extra">I'm thinking specifically about upgrade scenarios where a step fails. Currently the only option is a manual diagnosis of the problem, manual modification of state, then re-running the entire stack update to see if it can get past the failing step.</div><div class="gmail_extra"><br></div><div class="gmail_extra">What would be nice is if, when a Heat->Mistral->Ansible upgrade step fails, the operator were given an ansible-playbook command that skips directly to the failing step. This would dramatically shorten the debug cycle and also make it possible for the operator to automate any required fixes across every host in a role. This would likely mean rendering out Ansible config files, playbooks (and roles?) to the operator's working directory. What happens to these rendered files after deployment is an open question. Delete them? Encourage the operator to track them in source control?</div></div>
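<div class="gmail_extra"><br></div><div class="gmail_extra">To make the "skip directly to the failing step" idea concrete, here is a rough sketch of how such a command could be assembled for the operator. It assumes the playbooks and inventory have already been rendered to the working directory and that the failing task's name is known; the file names, task name and host names below are made up for illustration, and none of this is existing TripleO code. It relies only on the standard ansible-playbook options -i, --start-at-task and --limit.</div>

```python
import shlex


def build_resume_command(playbook, inventory, failed_task, failed_hosts=None):
    """Build an ansible-playbook argv that resumes at the failed task.

    All arguments are hypothetical inputs: paths to rendered files and the
    name of the task that failed, as reported by whatever drove the run.
    """
    cmd = [
        "ansible-playbook",
        "-i", inventory,
        playbook,
        # --start-at-task skips every task before the one with this name.
        "--start-at-task", failed_task,
    ]
    if failed_hosts:
        # --limit restricts the run to the hosts that actually failed.
        cmd += ["--limit", ",".join(failed_hosts)]
    return cmd


def format_for_operator(cmd):
    """Quote the argv so it can be pasted into a shell as-is."""
    return " ".join(shlex.quote(part) for part in cmd)


if __name__ == "__main__":
    # Illustrative values only; real names would come from the rendered files.
    cmd = build_resume_command(
        playbook="upgrade_steps_playbook.yaml",
        inventory="inventory.yaml",
        failed_task="Stop nova services",
        failed_hosts=["controller-0"],
    )
    print(format_for_operator(cmd))
```

<div class="gmail_extra">Returning the command as a list and quoting it only for display keeps it safe to pass to a process runner while still giving the operator something to copy and paste.</div>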