<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jul 7, 2017 at 6:50 PM, James Slagle <span dir="ltr"><<a href="mailto:james.slagle@gmail.com" target="_blank">james.slagle@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">I proposed a session for the PTG<br>
(<a href="https://etherpad.openstack.org/p/tripleo-ptg-queens" rel="noreferrer" target="_blank">https://etherpad.openstack.<wbr>org/p/tripleo-ptg-queens</a>) about forming a<br>
common plan and vision around Ansible in TripleO.<br>
<br>
I think it's important however that we kick this discussion off more<br>
broadly before the PTG, so that we can hopefully have some agreement<br>
for deeper discussions and prototyping when we actually meet in<br>
person.<br>
<br>
Right now, we have multiple uses of Ansible in TripleO:<br>
<br>
(0) tripleo-quickstart, which follows the common and well-accepted<br>
approach of bundling a set of Ansible playbooks/roles.<br>
<br>
(1) Mistral calling Ansible. This is the approach used by<br>
tripleo-validations, where Mistral directly executes Ansible playbooks<br>
using a dynamic inventory. The inventory is constructed from the<br>
server-related stack outputs of the overcloud stack.<br>
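To make the dynamic inventory idea concrete, here is a minimal hedged sketch (the role names, output shape, and helper are illustrative assumptions, not the actual tripleo-validations script): an Ansible dynamic inventory is just an executable that prints JSON mapping groups to hosts, which could be built from stack outputs roughly like this:<br>

```python
#!/usr/bin/env python
# Hypothetical sketch of an Ansible dynamic inventory built from Heat
# stack outputs. All names and the output shape are assumptions for
# illustration only; the real script queries the Heat API.
import json

def build_inventory(stack_outputs):
    """Map role -> IP-list stack outputs into Ansible inventory groups."""
    inventory = {"_meta": {"hostvars": {}}}
    for role, ips in stack_outputs.items():
        inventory[role.lower()] = {"hosts": ips}
        for ip in ips:
            inventory["_meta"]["hostvars"][ip] = {"ansible_host": ip}
    return inventory

if __name__ == "__main__":
    # In a real script these outputs would come from the overcloud stack.
    outputs = {"Controller": ["192.0.2.10"],
               "Compute": ["192.0.2.20", "192.0.2.21"]}
    print(json.dumps(build_inventory(outputs), indent=2))
```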
<br>
(2) Ansible running playbooks against localhost triggered by the<br>
heat-config Ansible hook. This approach is used by<br>
tripleo-heat-templates for upgrade tasks and various tasks for<br>
deploying containers.<br>
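For context on (2), the essence of the hook model can be sketched as follows (a hedged illustration; the field names and invocation are assumptions, not the real heat-config hook's interface): the delivered config carries the playbook text, which gets written to disk and applied to localhost only:<br>

```python
# Hypothetical sketch of the run-against-localhost model used by the
# heat-config Ansible hook. Field names and the exact ansible-playbook
# invocation are assumptions for illustration.
import json
import subprocess
import tempfile

def extract_playbook(config_json):
    """Pull the playbook text out of the delivered config blob."""
    return json.loads(config_json)["config"]

def run_local_playbook(config_json):
    """Write the delivered playbook and apply it to localhost only."""
    with tempfile.NamedTemporaryFile("w", suffix=".yaml",
                                     delete=False) as f:
        f.write(extract_playbook(config_json))
        path = f.name
    # "localhost," (note the trailing comma) is a literal one-host
    # inventory; "-c local" avoids SSH, matching the localhost model.
    return subprocess.call(
        ["ansible-playbook", "-i", "localhost,", "-c", "local", path])
```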
<br>
(3) Mistral calling Heat calling Mistral calling Ansible. In this<br>
approach, we have Mistral resources in tripleo-heat-templates that are<br>
created as part of the overcloud stack and in turn, the created<br>
Mistral action executions run Ansible. This has been prototyped<br>
using ceph-ansible to install Ceph as part of the overcloud<br>
deployment, and some of the work has already landed. There are also<br>
proposed WIP patches using this approach to install Kubernetes.<br>
<br>
There are also some ideas forming around pulling the Ansible playbooks<br>
and vars out of Heat so that they can be rerun (or run initially)<br>
independently from the Heat SoftwareDeployment delivery mechanism:<br>
<br>
(4) <a href="https://review.openstack.org/#/c/454816/" rel="noreferrer" target="_blank">https://review.openstack.org/#<wbr>/c/454816/</a><br>
<br>
(5) Another idea I'd like to prototype is a local tool that runs on<br>
the undercloud and pulls all of the SoftwareDeployment data out of<br>
Heat as the stack is being created and generates corresponding Ansible<br>
playbooks to apply those deployments. Once a given playbook is<br>
generated by the tool, the tool would signal back to Heat that the<br>
deployment is complete. Heat then creates the whole stack without<br>
actually applying a single deployment to an overcloud node. At that<br>
point, Ansible (or Mistral->Ansible for an API) would be used to do<br>
the actual deployment of the overcloud, with the undercloud as the<br>
Ansible runner.<br>
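To give idea (5) some shape, here is a minimal hypothetical sketch of turning SoftwareDeployment-style data pulled from Heat into play structures that can be rerun independently (every field name here is an assumption; the real deployment data model is richer):<br>

```python
# Hypothetical sketch of idea (5): translate SoftwareDeployment-style
# data pulled from Heat into Ansible play structures. All field names
# are assumptions for illustration.
def deployment_to_play(deployment):
    """Build one play (as a dict) applying a deployment to its server."""
    return {
        "hosts": deployment["server_name"],
        "vars": deployment.get("input_values", {}),
        "tasks": [{
            "name": "apply {}".format(deployment["name"]),
            # A real tool would dispatch on the deployment's config
            # group (script, ansible, puppet, ...); shell stands in here.
            "shell": deployment["config"],
        }],
    }

def generate_playbook(deployments):
    """One play per deployment, preserving Heat's ordering."""
    return [deployment_to_play(d) for d in deployments]
```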
<br>
All of this work has merit as we investigate longer term plans, and<br>
it's all at different stages: some is for dev/CI (0), some is already<br>
used in production (1 and 2), some is just at the experimental stage<br>
(3 and 4), and some exists only as an idea (5).<br>
<br>
My intent with this mail is to start a discussion around what we've<br>
learned from these approaches and to begin forming a consolidated plan<br>
around Ansible. I'm not saying that whatever we come up with should<br>
only use Ansible a certain way, just that we ought to look at how<br>
users/operators interact with Ansible and TripleO today and try to<br>
come up with the best solution(s) going forward.<br>
<br>
I think that (1) has been pretty successful, and my idea with (5)<br>
would use a similar approach once the playbooks were generated.<br>
Further, my idea with (5) would give us a fully backwards compatible<br>
solution with our existing template interfaces from<br>
tripleo-heat-templates. Longer term (or even in parallel for some<br>
time), the generated playbooks could stop being generated (and just<br>
exist in git), and we could consider moving away from Heat more<br>
permanently.<br>
<br>
I recognize that saying "moving away from Heat" may be quite<br>
controversial. While it's not 100% the same discussion as what we are<br>
doing with Ansible, I think it is a big part of it, including whether<br>
we want to continue with Heat as the primary orchestration tool in<br>
TripleO.<br>
<br>
I've been hearing a lot of feedback from various operators about how<br>
difficult the baremetal deployment is with Heat. While feedback about<br>
Ironic is generally positive, a lot of the negative feedback is around<br>
the Heat->Nova->Ironic interaction. And, if we also move more towards<br>
Ansible for the service deployment, I wonder if there is still a long<br>
term place for Heat at all.<br>
<br>
Personally, I'm pretty apprehensive about the approach taken in (3). I<br>
feel that it adds a lot of complexity that could be avoided if we<br>
took a step back and thought more about a longer term approach. I<br>
recognize that it's mostly an experiment/POC at this stage, and I'm<br>
not trying to directly knock down the approach. It's just that when I<br>
start to see more patches (the Kubernetes installation) using the same<br>
approach, I figure it's worth discussing broadly rather than trying to<br>
have the discussion by -1'ing patch reviews.<br>
<br>
I'm interested in all feedback of course. And I plan to take a shot at<br>
working on the prototype I mentioned in (5) if anyone would like to<br>
collaborate around that.<br>
<br>
I think if we can form some broad agreement before the PTG, we have a<br>
chance at making some meaningful progress during Queens.<br>
<span class="gmail-HOEnZb"><font color="#888888"><br>
<br>
--<br>
-- James Slagle<br>
--<br></font></span></blockquote><div><br></div><div>I can't offer much in-depth feedback on the pros and cons of each scenario. My main point would be to try to simplify as much as we can, rather than adding yet more tooling to the stack. At the moment ooo is spread across multiple repos, and events are handed around multiple tool sets and queues. This adds to a very steep learning curve for the folks who have to operate these systems, as there are multiple moving parts to contend with. Right now things seem 'duct-taped' together, so we should avoid adding more complexity and refactor down to a simpler architecture instead.</div><div><br></div><div>With that in mind [1] sounds viable to me, but with the caveat that others might have a better view of how much of a fit that is for what we need. </div><div><br></div><div> </div><div><br></div></div>
</div></div>