<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jul 12, 2017 at 11:47 AM, James Slagle <span dir="ltr"><<a href="mailto:james.slagle@gmail.com" target="_blank">james.slagle@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail-HOEnZb"><div class="gmail-h5">On Tue, Jul 11, 2017 at 6:53 PM, Steve Baker <<a href="mailto:sbaker@redhat.com">sbaker@redhat.com</a>> wrote:<br>

><br>

><br>

> On Tue, Jul 11, 2017 at 6:51 AM, James Slagle <<a href="mailto:james.slagle@gmail.com">james.slagle@gmail.com</a>><br>

> wrote:<br>

>><br>

>> On Mon, Jul 10, 2017 at 11:37 AM, Lars Kellogg-Stedman <<a href="mailto:lars@redhat.com">lars@redhat.com</a>><br>

>> wrote:<br>

>> > On Fri, Jul 7, 2017 at 1:50 PM, James Slagle <<a href="mailto:james.slagle@gmail.com">james.slagle@gmail.com</a>><br>

>> > wrote:<br>

>> >><br>

>> >> There are also some ideas forming around pulling the Ansible playbooks<br>
>> >> and vars out of Heat so that they can be rerun (or run initially)<br>
>> >> independently from the Heat SoftwareDeployment delivery mechanism:<br>

>> ><br>

>> ><br>

>> > I think the closer we can come to "the operator runs ansible-playbook to<br>
>> > configure the overcloud" the better, but not because I think Ansible is<br>
>> > inherently a great tool: rather, I think the many layers of indirection in<br>
>> > our existing model make error reporting and diagnosis much more complicated<br>
>> > than they need to be.  Combined with Puppet's "fail as late as possible"<br>
>> > model, this means that (a) operators waste time waiting for a deployment<br>
>> > that is ultimately going to fail but hasn't yet, and (b) when it does fail,<br>
>> > they need relatively intimate knowledge of our deployment tools to backtrack<br>
>> > through logs and find the root cause of the failure.<br>

>> ><br>

>> > If we can offer a deployment mode that reduces the number of layers between<br>
>> > the operator and the actions being performed on the hosts, I think we would<br>
>> > win on both fronts: faster failures and error reports as close as possible<br>
>> > to the actual problem will result in less frustration across the board.<br>

>> ><br>

>> > I do like Steve's suggestion of a split model where Heat is responsible for<br>
>> > instantiating OpenStack resources while Ansible is used to perform host<br>
>> > configuration tasks.  Despite all the work done on Ansible's OpenStack<br>
>> > modules, they feel inflexible and frustrating to work with when compared to<br>
>> > Heat's state-aware, dependency-ordered deployments.  A solution that allows<br>
>> > Heat to output configuration that can subsequently be consumed by Ansible<br>
>> > (either run manually or perhaps via Mistral for API-driven deployments)<br>
>> > seems like an excellent goal.  Using Heat as a "front-end" to the process<br>
>> > means that we get to keep the parameter validation and documentation that<br>
>> > is missing in Ansible, while still following the Unix philosophy of giving<br>
>> > you enough rope to hang yourself if you really want it.<br>
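The split model described above, with Heat producing data that Ansible then consumes, could be glued together with an Ansible dynamic-inventory script. This is a minimal sketch, assuming a hypothetical Heat output that maps each role name to its hosts (name and IP) plus some per-role vars; it is not an existing TripleO output, and all names here are illustrative only.

```python
import argparse
import json

# Hypothetical Heat stack output (NOT an existing TripleO output):
# role name -> {"hosts": [{"name": ..., "ip": ...}], "vars": {...}}
SAMPLE_HEAT_OUTPUT = {
    "Controller": {
        "hosts": [{"name": "overcloud-controller-0", "ip": "192.168.24.10"}],
        "vars": {"bootstrap_node": "overcloud-controller-0"},
    },
    "Compute": {
        "hosts": [
            {"name": "overcloud-novacompute-0", "ip": "192.168.24.20"},
            {"name": "overcloud-novacompute-1", "ip": "192.168.24.21"},
        ],
        "vars": {},
    },
}


def heat_output_to_inventory(role_data):
    """Convert per-role host data into Ansible dynamic-inventory JSON."""
    inventory = {"_meta": {"hostvars": {}}}
    for role, data in role_data.items():
        hosts = data.get("hosts", [])
        # Each role becomes an Ansible group with its hosts and group vars.
        inventory[role] = {
            "hosts": [h["name"] for h in hosts],
            "vars": dict(data.get("vars", {})),
        }
        for h in hosts:
            # ansible_host tells Ansible which address to connect to.
            inventory["_meta"]["hostvars"][h["name"]] = {"ansible_host": h["ip"]}
    return inventory


def main(argv=None):
    parser = argparse.ArgumentParser(description="Heat-to-Ansible inventory sketch")
    parser.add_argument("--list", action="store_true")
    parser.add_argument("--host")
    args = parser.parse_args(argv)
    if args.host:
        # Host vars are already returned under _meta, so --host adds nothing.
        print(json.dumps({}))
    else:
        print(json.dumps(heat_output_to_inventory(SAMPLE_HEAT_OUTPUT), indent=2))


if __name__ == "__main__":
    main()
```

Saved as an executable script, this could be passed to ansible-playbook with -i. In a real deployment, the role data would come from the stack outputs (for example via openstack stack output show) rather than from a hard-coded sample.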

>><br>

>> This is excellent input, thanks for providing it.<br>

>><br>

>> I think it suggests that we may want to pursue (again) adding native<br>
>> Ironic resources to Heat. If those were written in a way that also<br>
>> addressed some of the feedback about TripleO and the baremetal<br>
>> deployment side, then we could continue to get the advantages from Heat<br>
>> that you mention.<br>

>><br>

>> My personal opinion to date is that Ansible's os_ironic* modules are<br>
>> superior in some ways to the Heat->Nova->Ironic model. However, a direct<br>
>> Heat->Ironic model may work in a way that has the advantages of both.<br>

><br>

><br>

> I too would dearly like to get nova out of the picture. Our placement needs<br>

> mean the scheduler is something we need to work around, and it discards<br>

> basically all context for the operator when ironic can't deploy for some<br>

> reason.<br>

><br>

> Whether we use a mistral workflow[1], a heat resource, or ansible os_ironic,<br>

> there will still need to be some python logic to build the config drive ISO<br>

> that injects the ssh keys and os-collect-config bootstrap.<br>
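As a rough illustration of the Python logic mentioned above: most of what a config-drive builder does is lay out a directory in the OpenStack metadata format. The SSH keys go in openstack/latest/meta_data.json under public_keys, which cloud-init installs as authorized keys, and the bootstrap script goes in openstack/latest/user_data. This is a minimal sketch. The function name and the placeholder bootstrap script are hypothetical, and the real os-collect-config bootstrap content is not shown.

```python
import json
import os
import tempfile

# Hypothetical placeholder: the real os-collect-config bootstrap is not shown.
BOOTSTRAP_USER_DATA = "#!/bin/bash\n# os-collect-config bootstrap would go here\n"


def write_configdrive_dir(root, node_uuid, hostname, ssh_keys, user_data):
    """Lay out a config-drive directory in the OpenStack metadata format."""
    latest = os.path.join(root, "openstack", "latest")
    os.makedirs(latest, exist_ok=True)
    meta = {
        "uuid": node_uuid,
        "hostname": hostname,
        "name": hostname,
        # key name -> public key; cloud-init installs these for SSH access
        "public_keys": {f"key{i}": key for i, key in enumerate(ssh_keys)},
    }
    with open(os.path.join(latest, "meta_data.json"), "w") as f:
        json.dump(meta, f)
    with open(os.path.join(latest, "user_data"), "w") as f:
        f.write(user_data)
    return root


if __name__ == "__main__":
    # Build a throwaway tree and list the files it contains.
    with tempfile.TemporaryDirectory() as d:
        write_configdrive_dir(d, "hypothetical-uuid", "overcloud-controller-0",
                              ["ssh-rsa AAAA... stack@undercloud"],
                              BOOTSTRAP_USER_DATA)
        for dirpath, _, files in os.walk(d):
            for name in files:
                print(os.path.relpath(os.path.join(dirpath, name), d))
```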

><br>

> Unfortunately ironic iPXE boot from iSCSI[2] doesn't support config-drive<br>
> (still?), so the only option to inject ssh keys is the nova ec2-metadata<br>
> service (or equivalent). I suspect that if we can't make every ironic<br>
> deployment method support config-drive, then we're stuck with nova.<br>

><br>

> I don't have a strong preference for a heat resource vs mistral vs ansible<br>

> os_ironic, but given there is some python logic required anyway, I would<br>

> lean towards a heat resource. If the resource is general enough we could<br>

> propose it to heat upstream, otherwise we could carry it in tripleo-common.<br>

><br>

> Alternatively, we can implement a config-drive builder in tripleo-common and<br>

> invoke that from mistral or ansible.<br>

<br>

</div></div>The Ironic CLI's node-set-provision-state command has a --config-drive<br>
option: you just point it at a directory and it will automatically<br>
bundle that directory into the config drive ISO format.<br>

<br>

Ansible's os_ironic_node[1] also supports that via the config_drive<br>
parameter. Combining that with a couple of template tasks to create<br>
the meta_data.json and user_data files makes for a very easy-to-use<br>
interface.<br>
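For comparison, here is a minimal sketch of roughly what that bundling involves if done by hand. First build an ISO 9660 image with the volume label config-2, the label cloud-init searches for. Then gzip and base64-encode it, which is the form Ironic's provision-state API accepts for an inline configdrive. The genisoimage/mkisofs invocation below is an assumption and is not a copy of the Ironic client's own code.

```python
import base64
import gzip
import os
import shutil
import subprocess
import tempfile


def make_iso(configdrive_dir):
    """Build an ISO 9660 image labelled config-2 from a directory.

    Assumes genisoimage (or mkisofs, which takes the same options) is installed.
    """
    tool = shutil.which("genisoimage") or shutil.which("mkisofs")
    if tool is None:
        raise RuntimeError("genisoimage or mkisofs is required")
    with tempfile.TemporaryDirectory() as tmp:
        iso_path = os.path.join(tmp, "configdrive.iso")
        # -V sets the volume label; -J/-r add Joliet and Rock Ridge names.
        subprocess.run(
            [tool, "-o", iso_path, "-V", "config-2", "-J", "-r", "-quiet",
             configdrive_dir],
            check=True,
        )
        with open(iso_path, "rb") as f:
            return f.read()


def encode_configdrive(iso_bytes):
    """gzip, then base64-encode an ISO image, returning ASCII text."""
    return base64.b64encode(gzip.compress(iso_bytes)).decode("ascii")
```

Passing a directory to the CLI or to os_ironic_node hides both steps. The sketch only shows why some ISO tooling has to be present wherever the bundling happens.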

<br>

<br>

[1] <a href="http://docs.ansible.com/ansible/os_ironic_node_module.html" rel="noreferrer" target="_blank">http://docs.ansible.com/<wbr>ansible/os_ironic_node_module.<wbr>html</a><br>

<div class="gmail-HOEnZb"><div class="gmail-h5"></div></div></blockquote></div><br></div><div class="gmail_extra">Oh, that makes it easier. That just leaves the issue of 4 of the 5 scenarios in [2] not supporting config drive. The options I see here are:</div><div class="gmail_extra">a. nova forever</div><div class="gmail_extra">b. not support any boot-from-volume scenarios in TripleO that don't work with config-drive</div><div class="gmail_extra">c. write our own small metadata service (it's basically serving machine-specific static HTTP content, so it might be doable with some Apache fu)</div><div class="gmail_extra"><br></div><div class="gmail_extra">If b. is acceptable, then maybe I can un-abandon [3]?</div><div class="gmail_extra"><br></div><div class="gmail_extra">[2] <a href="http://specs.openstack.org/openstack/ironic-specs/specs/approved/boot-from-volume-reference-drivers.html">http://specs.openstack.org/openstack/ironic-specs/specs/approved/boot-from-volume-reference-drivers.html</a></div><div class="gmail_extra">[3] <a href="https://review.openstack.org/#/c/400407/">https://review.openstack.org/#/c/400407/</a></div></div>
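Option c. might not even need Apache. This is a minimal sketch using only the Python standard library. It assumes per-node content has been pre-rendered into a hypothetical layout, METADATA_ROOT/CLIENT_IP/openstack/latest/..., and that each node is identified by the source IP of its request. It serves static files only, rejects paths that escape the node's directory, and has no authentication. It is not the nova metadata API.

```python
import os
import posixpath
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical layout: METADATA_ROOT/CLIENT_IP/openstack/latest/meta_data.json
METADATA_ROOT = "/var/lib/metadata"


def resolve(root, client_ip, url_path):
    """Map (client IP, URL path) to a file under root/client_ip, or None."""
    node_dir = os.path.realpath(os.path.join(root, client_ip))
    # Drop any query string and collapse "..", so the path stays relative.
    rel = posixpath.normpath("/" + url_path.split("?", 1)[0]).lstrip("/")
    candidate = os.path.realpath(os.path.join(node_dir, rel))
    # Refuse anything that resolves outside this node's own directory.
    if not candidate.startswith(node_dir + os.sep):
        return None
    return candidate if os.path.isfile(candidate) else None


class MetadataHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        path = resolve(METADATA_ROOT, self.client_address[0], self.path)
        if path is None:
            self.send_error(404)
            return
        with open(path, "rb") as f:
            body = f.read()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Run the metadata server; blocks forever."""
    HTTPServer(("0.0.0.0", port), MetadataHandler).serve_forever()
```

serve() would still need to be reachable at whatever address the nodes' cloud-init datasource queries (169.254.169.254 for the OpenStack datasource), and trusting the source IP is only reasonable on an isolated provisioning network.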