Open Stack

Mon Jul 10 13:19:53 UTC 2017

On Fri, Jul 7, 2017 at 6:50 PM, James Slagle <james.slagle at gmail.com> wrote:
> I proposed a session for the PTG
> (https://etherpad.openstack.org/p/tripleo-ptg-queens) about forming a
> common plan and vision around Ansible in TripleO.
>
> I think it's important however that we kick this discussion off more
> broadly before the PTG, so that we can hopefully have some agreement
> for deeper discussions and prototyping when we actually meet in
> person.

Thanks for starting this James, it's a topic that I've also been
giving quite a lot of thought to lately (and as you've seen, have
pushed some related patches) so it's good to get some broader
discussions going.

> Right now, we have multiple uses of Ansible in TripleO:
>
> (0) tripleo-quickstart which follows the common and well accepted
> approach to bundling a set of Ansible playbooks/roles.

FWIW I agree with Giulio that quickstart is a separate case, and while
I also agree with David that there's plenty of scope for improvement
of the oooq user experience, I'm going to focus on the TripleO
deployment aspects below.

> (1) Mistral calling Ansible. This is the approach used by
> tripleo-validations where Mistral directly executes ansible playbooks
> using a dynamic inventory. The inventory is constructed from the
> server related stack outputs of the overcloud stack.
>
> (2) Ansible running playbooks against localhost triggered by the
> heat-config Ansible hook. This approach is used by
> tripleo-heat-templates for upgrade tasks and various tasks for
> deploying containers.
>
> (3) Mistral calling Heat calling Mistral calling Ansible. In this
> approach, we have Mistral resources in tripleo-heat-templates that are
> created as part of the overcloud stack and in turn, the created
> Mistral action executions run ansible. This has been prototyped with
> using ceph-ansible to install Ceph as part of the overcloud
> deployment, and some of the work has already landed. There are also
> proposed WIP patches using this approach to install Kubernetes.
>
> There are also some ideas forming around pulling the Ansible playbooks
> and vars out of Heat so that they can be rerun (or run initially)
> independently from the Heat SoftwareDeployment delivery mechanism:
>
> (4) https://review.openstack.org/#/c/454816/
>
> (5) Another idea I'd like to prototype is a local tool that runs on
> the undercloud and pulls all of the SoftwareDeployment data out of
> Heat as the stack is being created and generates corresponding Ansible
> playbooks to apply those deployments. Once a given playbook is
> generated by the tool, the tool would signal back to Heat that the
> deployment is complete. Heat then creates the whole stack without
> actually applying a single deployment to an overcloud node. At that
> point, Ansible (or Mistral->Ansible for an API) would be used to do
> the actual deployment of the Overcloud with the Undercloud as the
> ansible runner.

Yeah, so my idea with (4), and subsequent patches such as [1], is to
gradually move the deploy steps performed to configure services (on
baremetal and in containers) to a single ansible playbook.

There's currently still heat orchestration around the host preparation
(although this is performed via ansible) and iteration over each step
(where we re-apply the same deploy-steps playbook with an incrementing
step variable, but this could be replaced by e.g. an ansible or mistral
loop), but my idea was to enable end-to-end configuration of nodes via
ansible-playbook, without the need for any special tools (e.g. we
refactor t-h-t enough that we don't need any special tools, and we
make deploy-steps-playbook.yaml the only method of deployment for both
the baremetal and container cases).

[1] https://review.openstack.org/#/c/462211/
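
To make the step iteration concrete, here's a minimal sketch of what
replacing the heat-driven loop with a plain loop could look like. It
only builds the ansible-playbook command lines, one per step, passing
the step as an extra var. The inventory file name, the step count of 5
and the variable name "step" are assumptions for illustration, not
something the patches above define.

```python
from typing import List


def build_step_commands(inventory: str, playbook: str,
                        max_step: int) -> List[List[str]]:
    """Return one ansible-playbook command per deploy step, in order."""
    commands = []
    for step in range(1, max_step + 1):
        commands.append([
            "ansible-playbook",
            "-i", inventory,        # inventory file or dynamic inventory script
            playbook,
            "-e", f"step={step}",   # assumed name of the step variable
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical file names and step count, for illustration only.
    for cmd in build_step_commands("inventory.yaml",
                                   "deploy-steps-playbook.yaml", 5):
        print(" ".join(cmd))
```

Each command could then be run in turn (e.g. via subprocess.run with
check=True) so that a failure at one step stops the later ones.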

> All of this work has merit as we investigate longer term plans, and
> it's all at different stages with some being for dev/CI (0), some
> being used already in production (1 and 2), some just at the
> experimental stage (3 and 4), and some does not exist other than an
> idea (5).

I'd like to get the remaining work for (4) done so it's a supportable
option for minor updates. There's still a bit more t-h-t refactoring
required to enable it, but I think we're already pretty close to being
able to run end-to-end ansible for most of the PostDeploy steps without
any special tooling.

Note this related patch from Matthieu:

https://review.openstack.org/#/c/444224/

I think we'll need to go further here, but it's a starting point which
shows how we could expose ansible tasks from the heat stack outputs as
a first step to enabling standalone configuration via ansible (or
mistral->ansible).
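
As a rough sketch of that idea (not what the patch itself does), the
following takes per-role task lists, as they might be exposed by a
stack output, and wraps them into a list of plays that
ansible-playbook could consume once serialized. The shape of the
input, the role names and the example task are all assumptions.

```python
import json
from typing import Any, Dict, List


def tasks_to_playbook(
        role_tasks: Dict[str, List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Wrap per-role task lists into one play per role.

    Roles with no tasks are skipped so no empty plays are emitted.
    """
    plays = []
    for role in sorted(role_tasks):
        tasks = role_tasks[role]
        if not tasks:
            continue
        plays.append({
            "name": f"Configure {role}",
            "hosts": role,   # assumes an inventory group named after the role
            "tasks": tasks,
        })
    return plays


if __name__ == "__main__":
    # Hypothetical stack output data, for illustration only.
    example = {
        "Controller": [
            {"name": "Ensure ntp is installed",
             "package": {"name": "ntp", "state": "present"}},
        ],
        "Compute": [],
    }
    print(json.dumps(tasks_to_playbook(example), indent=2))
```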

> My intent with this mail is to start a discussion around what we've
> learned from these approaches and start discussing a consolidated plan
> around Ansible. And I'm not saying that whatever we come up with
> should only use Ansible a certain way. Just that we ought to look at
> how users/operators interact with Ansible and TripleO today and try
> and come up with the best solution(s) going forward.
>
> I think that (1) has been pretty successful, and my idea with (5)
> would use a similar approach once the playbooks were generated.
> Further, my idea with (5) would give us a fully backwards compatible
> solution with our existing template interfaces from
> tripleo-heat-templates. Longer term (or even in parallel for some
> time), the generated playbooks could stop being generated (and just
> exist in git), and we could consider moving away from Heat more
> permanently

Yeah, I think working towards aligning more TripleO configuration with
the approach taken by tripleo-validations is fine, and we can e.g. add
more heat generated data about the nodes to the dynamic ansible
inventory:

https://github.com/openstack/tripleo-validations/blob/master/tripleo_validations/inventory.py

We've been gradually adding data there, which I hope will enable a
cleaner "split stack", where the nodes are deployed via heat, then
ansible can do the configuration based on data exposed via stack
outputs (which again is a pattern that I think has been proven to work
quite well for tripleo-validations, and is also something I've been
using locally for dev testing quite successfully).
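
For anyone not familiar with the pattern, here's a minimal sketch of
turning stack output data into the JSON structure that Ansible expects
from a dynamic inventory script (groups with "hosts", plus
"_meta"/"hostvars"). It is not the tripleo-validations implementation;
the input shape (role name -> hostname -> IP), the hostnames and the
"overcloud" parent group are assumptions for illustration.

```python
import json
from typing import Any, Dict


def stack_outputs_to_inventory(
        role_hosts: Dict[str, Dict[str, str]]) -> Dict[str, Any]:
    """Build an Ansible dynamic-inventory dict from role -> {host: ip}."""
    inventory: Dict[str, Any] = {"_meta": {"hostvars": {}}}
    children = []
    for role in sorted(role_hosts):
        hosts = role_hosts[role]
        inventory[role] = {"hosts": sorted(hosts)}
        children.append(role)
        for name, ip in hosts.items():
            # ansible_host tells Ansible which address to connect to.
            inventory["_meta"]["hostvars"][name] = {"ansible_host": ip}
    # Assumed parent group so plays can target every overcloud node.
    inventory["overcloud"] = {"children": children}
    return inventory


if __name__ == "__main__":
    # Hypothetical stack output data, for illustration only.
    outputs = {
        "Controller": {"overcloud-controller-0": "192.0.2.10"},
        "Compute": {"overcloud-compute-0": "192.0.2.20"},
    }
    print(json.dumps(stack_outputs_to_inventory(outputs), indent=2))
```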

> I recognize that saying "moving away from Heat" may be quite
> controversial. While it's not 100% the same discussion as what we are
> doing with Ansible, I think it is a big part of the discussion and if
> we want to continue with Heat as the primary orchestration tool in
> TripleO.

Yeah, I think the first step is to focus on a clean "split stack"
model where the nodes/networks etc are still deployed via heat, then
ansible handles the configuration of the nodes.

In the long term I could see benefits in a "tripleo lite" model,
where, say, we only used mistral+Ironic+ansible, but IMO we're not at
the point yet where that's achievable, primarily because there's
coupling between the heat parameter interfaces and multiple
integrations we can't break (e.g. users with environment files,
tripleo-ui, vendor integrations, etc).

It's a good discussion to kick off regardless though, so personally
I'd like to focus on these as the first "baby steps":

1. How to perform end-to-end configuration via ansible (outside of
heat, but probably still using data and possibly playbooks generated
by heat)

2. How to deploy nodes directly via Ironic, with a mistral workflow
(e.g. no Nova and potentially no Neutron?). I started that in
https://review.openstack.org/#/c/313048/ but could use some help
completing it.

> I've been hearing a lot of feedback from various operators about how
> difficult the baremetal deployment is with Heat. While feedback about
> Ironic is generally positive, a lot of the negative feedback is around
> the Heat->Nova->Ironic interaction. And, if we also move more towards
> Ansible for the service deployment, I wonder if there is still a long
> term place for Heat at all.

So while there are plenty of valid complaints, one observation is that
Heat always gets blamed because it's the operator visible interface,
but quite often the problems are e.g. Nova or some other non-heat
issue. For example, "No valid host found" is often perceived as a heat
problem by new users when in reality it's not.

That said, there are valid complaints around the SoftwareDeployment
approach and operator familiarity vs some more traditional tool such
as ansible.

> Personally, I'm pretty apprehensive about the approach taken in (3). I
> feel that it is a lot of complexity that could be done simpler if we
> took a step back and thought more about a longer term approach. I
> recognize that it's mostly an experiment/POC at this stage, and I'm
> not trying to directly knock down the approach. It's just that when I
> start to see more patches (Kubernetes installation) using the same
> approach, I figure it's worth discussing more broadly vs trying to
> have a discussion by -1'ing patch reviews, etc.

I agree, I think the approach in (3) is a stopgap until we can define
a cleaner approach with fewer layers.

IMO the first step towards that is likely to be a "split stack" which
outputs heat data, then deployment configuration is performed via
mistral->ansible just like we already do in (1).

> I'm interested in all feedback of course. And I plan to take a shot at
> working on the prototype I mentioned in (5) if anyone would like to
> collaborate around that.

I'm very happy to collaborate, and this is quite closely related to
the investigations I've been doing around enabling minor updates for
containers.

Let's sync up about it, but as I mentioned above I'm not yet fully sold
on a new translation tool, vs just more t-h-t refactoring to enable
output of data directly consumable via ansible-playbook (which can
then be run by operators, or heat, or mistral, or whatever).

> I think if we can form some broad agreement before the PTG, we have a
> chance at making some meaningful progress during Queens.

Agreed, although we probably do need to make some more progress on
some aspects of this for container minor updates that we'll need for
Pike.

Thanks,

Steve


[openstack-dev] [TripleO] Forming our plans around Ansible
