[openstack-dev] [TripleO] Forming our plans around Ansible

Dmitry Tantsur dtantsur at redhat.com
Mon Jul 17 13:06:42 UTC 2017


On 07/12/2017 04:18 AM, Steve Baker wrote:
> 
> 
> On Wed, Jul 12, 2017 at 11:47 AM, James Slagle <james.slagle at gmail.com> wrote:
> 
>     On Tue, Jul 11, 2017 at 6:53 PM, Steve Baker <sbaker at redhat.com> wrote:
>      >
>      >
>      > On Tue, Jul 11, 2017 at 6:51 AM, James Slagle <james.slagle at gmail.com>
>      > wrote:
>      >>
>      >> On Mon, Jul 10, 2017 at 11:37 AM, Lars Kellogg-Stedman <lars at redhat.com>
>      >> wrote:
>      >> > On Fri, Jul 7, 2017 at 1:50 PM, James Slagle <james.slagle at gmail.com>
>      >> > wrote:
>      >> >>
>      >> >> There are also some ideas forming around pulling the Ansible playbooks
>      >> >> and vars out of Heat so that they can be rerun (or run initially)
>      >> >> independently from the Heat SoftwareDeployment delivery mechanism:
>      >> >
>      >> >
>      >> > I think the closer we can come to "the operator runs
>      >> > ansible-playbook to configure the overcloud" the better, but not
>      >> > because I think Ansible is inherently a great tool: rather, I think
>      >> > the many layers of indirection in our existing model make error
>      >> > reporting and diagnosis much more complicated than it needs to be.
>      >> > Combined with Puppet's "fail as late as possible" model, this means
>      >> > that (a) operators waste time waiting for a deployment that is
>      >> > ultimately going to fail but hasn't yet, and (b) when it does fail,
>      >> > they need relatively intimate knowledge of our deployment tools to
>      >> > backtrack through logs and find the root cause of the failure.
>      >> >
>      >> > If we can offer a deployment mode that reduces the number of layers
>      >> > between the operator and the actions being performed on the hosts,
>      >> > I think we would win on both fronts: faster failures and reporting
>      >> > errors as close as possible to the actual problem will result in
>      >> > less frustration across the board.
>      >> >
>      >> > I do like Steve's suggestion of a split model where Heat is
>      >> > responsible for instantiating OpenStack resources while Ansible is
>      >> > used to perform host configuration tasks. Despite all the work done
>      >> > on Ansible's OpenStack modules, they feel inflexible and frustrating
>      >> > to work with when compared to Heat's state-aware, dependency-ordered
>      >> > deployments. A solution that allows Heat to output configuration
>      >> > that can subsequently be consumed by Ansible -- either run manually
>      >> > or perhaps via Mistral for API-driven deployments -- seems like an
>      >> > excellent goal. Using Heat as a "front-end" to the process means
>      >> > that we get to keep the parameter validation and documentation that
>      >> > is missing in Ansible, while still following the Unix philosophy of
>      >> > giving you enough rope to hang yourself if you really want it.
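
To make the "Heat outputs consumed by Ansible" part concrete, here is a
minimal sketch, assuming python-heatclient and a stack named "overcloud";
the auth details and file name are illustrative, not an agreed interface.
It reads the stack's outputs and dumps them to a JSON vars file:

    import json

    from heatclient import client as heat_client
    from keystoneauth1 import session
    from keystoneauth1.identity import v3

    # Authenticate against the undercloud keystone (values illustrative).
    auth = v3.Password(auth_url='http://undercloud:5000/v3',
                       username='admin', password='secret',
                       project_name='admin',
                       user_domain_name='Default',
                       project_domain_name='Default')
    heat = heat_client.Client('1', session=session.Session(auth=auth))

    # stack.outputs is a list of {'output_key': ..., 'output_value': ...}
    stack = heat.stacks.get('overcloud')
    outputs = {o['output_key']: o['output_value'] for o in stack.outputs}

    with open('overcloud_vars.json', 'w') as f:
        json.dump(outputs, f, indent=2)

Running "ansible-playbook site.yml -e @overcloud_vars.json" would then
expose every stack output as a play variable, which is roughly the
Heat-as-front-end split described above.
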
>      >>
>      >> This is excellent input, thanks for providing it.
>      >>
>      >> I think it lends itself towards suggesting that we may like to pursue
>      >> (again) adding native Ironic resources to Heat. If those were written
>      >> in a way that also addressed some of the feedback about TripleO and
>      >> the baremetal deployment side, then we could continue to get the
>      >> advantages from Heat that you mention.
>      >>
>      >> My personal opinion to date is that Ansible's os_ironic* modules are
>      >> superior in some ways to the Heat->Nova->Ironic model. However, just a
>      >> Heat->Ironic model may work in a way that has the advantages of both.
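
For reference, the Heat->Ironic idea amounts to a fairly small resource
plugin. The skeleton below is purely hypothetical -- the
OS::Ironic::Deployment name, its properties, and its behaviour are
assumptions, not an agreed design -- but it shows where the direct ironic
calls and the error reporting would live:

    from heat.engine import properties
    from heat.engine import resource


    class IronicDeployment(resource.Resource):
        """Hypothetical OS::Ironic::Deployment resource (sketch only)."""

        PROPERTIES = (NODE, CONFIG_DRIVE) = ('node', 'config_drive')

        properties_schema = {
            NODE: properties.Schema(
                properties.Schema.STRING,
                'UUID of the ironic node to deploy.',
                required=True),
            CONFIG_DRIVE: properties.Schema(
                properties.Schema.MAP,
                'meta_data/user_data content for the config drive.'),
        }

        def handle_create(self):
            # Would call ironic set_provision_state(node, 'active',
            # configdrive=...) here -- no nova scheduler in between.
            return self.properties[self.NODE]

        def check_create_complete(self, node_uuid):
            # Would poll until provision_state == 'active' and surface
            # the node's last_error instead of discarding the context.
            return True


    def resource_mapping():
        # How heat discovers plugins; maps the hypothetical type name.
        return {'OS::Ironic::Deployment': IronicDeployment}
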
>      >
>      >
>      > I too would dearly like to get nova out of the picture. Our placement needs
>      > mean the scheduler is something we need to work around, and it discards
>      > basically all context for the operator when ironic can't deploy for some
>      > reason.
>      >
>      > Whether we use a mistral workflow[1], a heat resource, or ansible os_ironic,
>      > there will still need to be some python logic to build the config drive ISO
>      > that injects the ssh keys and os-collect-config bootstrap.
>      >
>      > Unfortunately ironic iPXE boot from iSCSI[2] doesn't support config-drive
>      > (still?), so the only option to inject ssh keys is the nova ec2-metadata
>      > service (or equivalent). I suspect that if we can't make every ironic
>      > deployment method support config-drive then we're stuck with nova.
>      >
>      > I don't have a strong preference for a heat resource vs mistral vs ansible
>      > os_ironic, but given there is some python logic required anyway, I would
>      > lean towards a heat resource. If the resource is general enough we could
>      > propose it to heat upstream, otherwise we could carry it in tripleo-common.
>      >
>      > Alternatively, we can implement a config-drive builder in tripleo-common and
>      > invoke that from mistral or ansible.
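
A minimal sketch of such a builder, assuming genisoimage is installed and
targeting the format ironic accepts for configdrives (a gzipped,
base64-encoded ISO 9660 image with volume label "config-2"); the function
name is illustrative:

    import base64
    import gzip
    import subprocess
    import tempfile


    def build_configdrive(source_dir):
        """Pack a dir laid out as openstack/latest/{meta_data.json,user_data}
        into the gzip+base64 ISO format ironic expects for configdrives."""
        with tempfile.NamedTemporaryFile(suffix='.iso') as iso:
            subprocess.check_call(
                ['genisoimage', '-o', iso.name, '-ldots', '-allow-lowercase',
                 '-allow-multidot', '-l', '-r', '-V', 'config-2', source_dir])
            with open(iso.name, 'rb') as f:
                compressed = gzip.compress(f.read())
        return base64.b64encode(compressed)

This is roughly what python-ironicclient does internally when handed a
directory, so a tripleo-common version would mostly be packaging and
error handling.
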
> 
>     Ironic's CLI node-set-provision-state command has a --config-drive
>     option where you just point it at a directory and it will automatically
>     bundle that dir into the config drive ISO format.
> 
>     Ansible's os_ironic_node[1] also supports that via the config_drive
>     parameter. Combining that with a couple of template tasks to create
>     meta_data.json and user_data files makes for a very easy-to-use
>     interface.
> 
> 
>     [1] http://docs.ansible.com/ansible/os_ironic_node_module.html
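
In python that whole flow is a single call, with the same automatic
bundling of the directory; the endpoint, token, and UUID below are
placeholders:

    from ironicclient import client

    ironic = client.get_client(1, os_auth_token='...',
                               ironic_url='http://undercloud:6385')

    # A directory path is packed into the config drive ISO automatically.
    ironic.node.set_provision_state(
        'cccccccc-aaaa-bbbb-dddd-eeeeeeeeeeee', 'active',
        configdrive='/path/to/configdrive-dir')
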
> 
> 
> Oh, that makes it easier. That just leaves the issue of 4 of the 5 scenarios in
> [2] not supporting config drive. The options I see here are:
> a. nova forever
> b. not support any boot-from-volume scenarios in TripleO that don't work with
> config-drive
> c. write our own small metadata service (it's basically serving machine-specific
> static HTTP content, so it can maybe be done with some Apache fu; see the
> sketch below)
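
On option c: the machine-specific content cloud-init and os-collect-config
poll lives at fixed EC2-style paths, so the service really is just static
files. A stdlib-only sketch, with the directory layout, port, and IP-based
lookup all assumptions:

    # Serve a per-node static tree at the EC2-style metadata paths, e.g.
    #   metadata/192.0.2.10/2009-04-04/meta-data/public-keys/0/openssh-key
    #   metadata/192.0.2.10/2009-04-04/user-data
    # Run from inside the metadata/ directory.
    import http.server


    class MetadataHandler(http.server.SimpleHTTPRequestHandler):
        def translate_path(self, path):
            # Key each request off the caller's IP so every node only
            # ever sees its own meta-data and user-data.
            return super().translate_path(
                '/' + self.client_address[0] + path)


    http.server.HTTPServer(('', 80), MetadataHandler).serve_forever()
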

Note that the Ironic team is currently not planning to continue the work on 
boot-from-volume in the *undercloud*, so this issue is not relevant for now.

> 
> If b. is acceptable then maybe I can un-abandon [3]?
> 
> [2] 
> http://specs.openstack.org/openstack/ironic-specs/specs/approved/boot-from-volume-reference-drivers.html
> [3] https://review.openstack.org/#/c/400407/
> 
> 