<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jul 12, 2017 at 11:47 AM, James Slagle <span dir="ltr"><<a href="mailto:james.slagle@gmail.com" target="_blank">james.slagle@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail-HOEnZb"><div class="gmail-h5">On Tue, Jul 11, 2017 at 6:53 PM, Steve Baker <<a href="mailto:sbaker@redhat.com">sbaker@redhat.com</a>> wrote:<br>
><br>
><br>
> On Tue, Jul 11, 2017 at 6:51 AM, James Slagle <<a href="mailto:james.slagle@gmail.com">james.slagle@gmail.com</a>><br>
> wrote:<br>
>><br>
>> On Mon, Jul 10, 2017 at 11:37 AM, Lars Kellogg-Stedman <<a href="mailto:lars@redhat.com">lars@redhat.com</a>><br>
>> wrote:<br>
>> > On Fri, Jul 7, 2017 at 1:50 PM, James Slagle <<a href="mailto:james.slagle@gmail.com">james.slagle@gmail.com</a>><br>
>> > wrote:<br>
>> >><br>
>> >> There are also some ideas forming around pulling the Ansible playbooks<br>
>> >><br>
>> >> and vars out of Heat so that they can be rerun (or run initially)<br>
>> >> independently from the Heat SoftwareDeployment delivery mechanism:<br>
>> ><br>
>> ><br>
>> > I think the closer we can come to "the operator runs ansible-playbook to<br>
>> > configure the overcloud" the better, but not because I think Ansible is<br>
>> > inherently a great tool: rather, I think the many layers of indirection<br>
>> > in<br>
>> > our existing model make error reporting and diagnosis much more<br>
>> > complicated<br>
>> > than it needs to be. Combined with Puppet's "fail as late as possible"<br>
>> > model, this means that (a) operators waste time waiting for a deployment<br>
>> > that is ultimately going to fail but hasn't yet, and (b) when it does<br>
>> > fail,<br>
>> > they need relatively intimate knowledge of our deployment tools to<br>
>> > backtrack<br>
>> > through logs and find the root cause of the failure.<br>
>> ><br>
>> > If we can offer a deployment mode that reduces the number of layers<br>
>> > between<br>
>> > the operator and the actions being performed on the hosts I think we<br>
>> > would<br>
>> > win on both fronts: faster failures and reporting errors as close as<br>
>> > possible to the actual problem will result in less frustration across<br>
>> > the<br>
>> > board.<br>
>> ><br>
>> > I do like Steve's suggestion of a split model where Heat is responsible<br>
>> > for<br>
>> > instantiating OpenStack resources while Ansible is used to perform host<br>
>> > configuration tasks. Despite all the work done on Ansible's OpenStack<br>
>> > modules, they feel inflexible and frustrating to work with when compared<br>
>> > to<br>
>> > Heat's state-aware, dependency ordered deployments. A solution that<br>
>> > allows<br>
>> > Heat to output configuration that can subsequently be consumed by<br>
>> > Ansible --<br>
>> > either running manually or perhaps via Mistral for<br>
>> > API-driven-deployments --<br>
>> > seems like an excellent goal. Using Heat as a "front-end" to the<br>
>> > process<br>
>> > means that we get to keep the parameter validation and documentation<br>
>> > that is<br>
>> > missing in Ansible, while still following the Unix philosophy of giving<br>
>> > you<br>
>> > enough rope to hang yourself if you really want it.<br>
>><br>
>> This is excellent input, thanks for providing it.<br>
>><br>
>> I think it lends itself towards suggesting that we may want to pursue<br>
>> (again) adding native Ironic resources to Heat. If those were written<br>
>> in a way that also addressed some of the feedback about TripleO and<br>
>> the baremetal deployment side, then we could continue to get the<br>
>> advantages from Heat that you mention.<br>
>><br>
>> My personal opinion to date is that Ansible's os_ironic* modules are<br>
>> superior in some ways to the Heat->Nova->Ironic model. However, just a<br>
>> Heat->Ironic model may work in a way that has the advantages of both.<br>
><br>
><br>
> I too would dearly like to get nova out of the picture. Our placement needs<br>
> mean the scheduler is something we need to work around, and it discards<br>
> basically all context for the operator when ironic can't deploy for some<br>
> reason.<br>
><br>
> Whether we use a mistral workflow[1], a heat resource, or ansible os_ironic,<br>
> there will still need to be some python logic to build the config drive ISO<br>
> that injects the ssh keys and os-collect-config bootstrap.<br>
><br>
> Unfortunately ironic iPXE boot from iSCSI[2] doesn't support config-drive<br>
> (still?) so the only option to inject ssh keys is the nova ec2-metadata<br>
> service (or equivalent). I suspect if we can't make every ironic deployment<br>
> method support config-drive then we're stuck with nova.<br>
><br>
> I don't have a strong preference for a heat resource vs mistral vs ansible<br>
> os_ironic, but given there is some python logic required anyway, I would<br>
> lean towards a heat resource. If the resource is general enough we could<br>
> propose it to heat upstream, otherwise we could carry it in tripleo-common.<br>
><br>
> Alternatively, we can implement a config-drive builder in tripleo-common and<br>
> invoke that from mistral or ansible.<br>
<br>
</div></div>Ironic's cli node-set-provision-state command has a --config-drive<br>
option where you just point it at a directory and it will automatically<br>
bundle that dir into the config drive ISO format.<br>
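To make the directory layout concrete, here is a minimal sketch of what such a config-drive source directory could look like before it is bundled. The helper name and exact fields are illustrative assumptions, not existing TripleO code; the openstack/latest/{meta_data.json,user_data} layout follows the standard config-drive convention:<br>
<br>
```python
# Hypothetical sketch: populate the directory that --config-drive (or
# os_ironic_node's config_drive parameter) would bundle into an ISO.
# build_config_drive_dir is a made-up name for illustration.
import json
import os


def build_config_drive_dir(base_dir, hostname, ssh_pubkey, user_data=""):
    """Create base_dir/openstack/latest/{meta_data.json,user_data}."""
    latest = os.path.join(base_dir, "openstack", "latest")
    os.makedirs(latest, exist_ok=True)
    meta_data = {
        "hostname": hostname,
        # key picked up by cloud-init on first boot
        "public_keys": {"default": ssh_pubkey},
    }
    with open(os.path.join(latest, "meta_data.json"), "w") as f:
        json.dump(meta_data, f)
    with open(os.path.join(latest, "user_data"), "w") as f:
        # e.g. the os-collect-config bootstrap script
        f.write(user_data)
    return latest
```
<br>
The resulting base_dir is what you would then hand to node-set-provision-state --config-drive.<br>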
<br>
Ansible's os_ironic_node[1] also supports that via the config_drive<br>
parameter. Combining that with a couple of template tasks to create<br>
meta_data.json and user_data files makes for a very easy to use<br>
interface.<br>
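As an untested fragment (variable names like configdrive_dir and node_uuid are placeholders, and auth/instance_info details are omitted), the combination described above might look roughly like:<br>
<br>
```yaml
# Hypothetical sketch, not a working playbook: template the config-drive
# contents, then hand the directory to os_ironic_node's config_drive parameter.
- template:
    src: meta_data.json.j2
    dest: "{{ configdrive_dir }}/openstack/latest/meta_data.json"
- template:
    src: user_data.j2
    dest: "{{ configdrive_dir }}/openstack/latest/user_data"
- os_ironic_node:
    uuid: "{{ node_uuid }}"
    state: present
    config_drive: "{{ configdrive_dir }}"
```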
<br>
<br>
[1] <a href="http://docs.ansible.com/ansible/os_ironic_node_module.html" rel="noreferrer" target="_blank">http://docs.ansible.com/<wbr>ansible/os_ironic_node_module.<wbr>html</a><br>
<div class="gmail-HOEnZb"><div class="gmail-h5"></div></div></blockquote></div><br></div><div class="gmail_extra">Oh, that makes it easier. That just leaves the issue of 4 of the 5 scenarios in [2] not supporting config drive. The options I see here are:</div><div class="gmail_extra">a. nova forever</div><div class="gmail_extra">b. not support any boot-from-volume scenarios in TripleO that don't work with config-drive</div><div class="gmail_extra">c. write our own small metadata service (it's basically serving machine-specific static http content, so can maybe be done with some apache fu)</div><div class="gmail_extra"><br></div><div class="gmail_extra">If b. is acceptable then maybe I can un-abandon [3]?</div><div class="gmail_extra"><br></div><div class="gmail_extra">[2] <a href="http://specs.openstack.org/openstack/ironic-specs/specs/approved/boot-from-volume-reference-drivers.html">http://specs.openstack.org/openstack/ironic-specs/specs/approved/boot-from-volume-reference-drivers.html</a></div><div class="gmail_extra">[3] <a href="https://review.openstack.org/#/c/400407/">https://review.openstack.org/#/c/400407/</a></div></div>
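For what option (c) might amount to, here is a minimal sketch assuming one static content subdirectory per node, keyed by the requesting node's IP address. Everything here (class name, port, directory layout) is made up for illustration and has no path-traversal protection; it is not an existing TripleO component:<br>
<br>
```python
# Hypothetical sketch of a tiny per-node metadata service: each node's static
# content lives under <base_dir>/<node-ip>/, so a request for
# /openstack/latest/meta_data.json from 192.0.2.10 is served from
# <base_dir>/192.0.2.10/openstack/latest/meta_data.json.
import http.server
import os


class NodeMetadataHandler(http.server.SimpleHTTPRequestHandler):
    base_dir = "/var/lib/metadata"  # assumed layout: one subdir per node IP

    def translate_path(self, path):
        # Map the URL path into the requesting node's own subdirectory.
        # No traversal protection: sketch only.
        client_ip = self.client_address[0]
        return os.path.join(self.base_dir, client_ip, path.lstrip("/"))


# To actually run it:
#   http.server.HTTPServer(("0.0.0.0", 8080), NodeMetadataHandler).serve_forever()
```
<br>
That said, the same effect is probably achievable with plain apache rewrite rules mapping the client IP to a document subtree, which may be less code to carry.<br>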