Open Stack

Tue Apr 10 09:44:31 UTC 2018

On 4 April 2018 at 03:33, David Moreau Simard <dmsimard at redhat.com> wrote:

> It won't be very exciting but we really need to do one of the
> following two things soon:
>
> 1) Ansiblify control plane [1]
> 2) Update our puppet things to puppet 4 (or 5?)
>
> Puppet 3 has been end of life since Dec 31, 2016. [2]
>
> The longer we draw this out, the more work it'll be :(
>
> [1]: https://review.openstack.org/#/c/469983/
> [2]: https://groups.google.com/forum/#!topic/puppet-users/IdutL5FTW7w
>
>
> David Moreau Simard
> Senior Software Engineer | OpenStack RDO
>
> dmsimard = [irc, github, twitter]
>
>

I would suggest that, whether you decide to switch the control plane to
Ansible or to update the Puppet modules, it will be well worth thinking about
performance when running across nodes that host different services
performing different functions.

Ansible is very good at running the same task across multiple machines,
e.g. configuring homogeneous servers. Control planes, however, tend to run
many different services on different subsets of nodes. Because tasks are
synchronized across the entire set of hosts, a lot of time is spent waiting
for a task to complete on a few nodes while it is simply skipped on the rest.
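To make the synchronization cost concrete, here is a toy timing model (a sketch, not a measurement) comparing Ansible's default `linear` strategy, where every host finishes a task before any host moves on, with the `free` strategy, where each host runs ahead on its own. It assumes all tasks sit in one play and are skipped on non-matching hosts via conditions, ignores forks limits and connection overhead, and uses invented host names and task durations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    seconds: int       # assumed duration on each host the task targets
    hosts: frozenset   # nodes the task runs on; it is skipped on all others


def linear_wall_time(tasks):
    # Lockstep: a task must finish on all of its hosts before the next one
    # starts, so every task costs its full duration however few hosts it targets.
    return sum(t.seconds for t in tasks)


def free_wall_time(tasks, hosts):
    # Free-run: each host works through its own tasks back to back;
    # the run ends when the busiest host finishes.
    return max(sum(t.seconds for t in tasks if h in t.hosts) for h in hosts)


def idle_fraction(tasks, hosts, wall_time):
    # Share of total host-seconds spent waiting rather than working.
    busy = sum(t.seconds * len(t.hosts) for t in tasks)
    return 1 - busy / (len(hosts) * wall_time)


# Hypothetical 6-node control plane with made-up durations.
hosts = [f"node{i}" for i in range(1, 7)]
tasks = [
    Task("base packages", 30, frozenset(hosts)),
    Task("database", 120, frozenset(hosts[:3])),
    Task("message queue", 90, frozenset(hosts[3:5])),
    Task("compute agent", 60, frozenset(hosts[5:])),
]

for label, wall in [("linear", linear_wall_time(tasks)),
                    ("free", free_wall_time(tasks, hosts))]:
    print(f"{label}: {wall}s, idle {idle_fraction(tasks, hosts, wall):.0%}")
```

With these numbers the linear run takes 300s with about 57% of host time idle, while the free run takes 150s with about 13% idle. The catch is that a pure free run ignores cross-host ordering, which is why the hybrid approaches discussed next come up.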

When working on the precursor to https://github.com/ArdanaCLM (the original
was used as part of Helion OpenStack by HP(E)), we had a CI job testing the
deployment of a small control plane and some services on a set of 6 VMs.
The time cost was prohibitive at 1.5 to 2.5 hours per run (upgrade testing
CI took double that). Much of the time, 50% or more of the VMs were idle,
because while a task ran on only a few nodes, nothing else could be done on
the others.

We considered adding a strategy plugin to Ansible that would be a cross
between the free-run and synchronized behaviour: each node would free-run to
completion unless it encountered certain tasks. Other alternatives included
nesting Ansible runs, so that free runs proceed up to a point before the
cluster-style operations are performed in synchronization, or carefully
crafting the playbooks to achieve the same effect. We never got around to
solving these, and some of the problems were caused by our adopting an
approach without necessarily having a deep understanding of the tooling.
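The hybrid strategy idea can be sketched with the same kind of toy timing model: hosts free-run between "barrier" tasks, and at a barrier every host waits for the slowest one before the barrier task runs in lockstep. This is not an Ansible plugin or API; the `barrier` flag, host names and durations are all hypothetical and only illustrate the scheduling idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    seconds: int
    hosts: frozenset
    barrier: bool = False  # hypothetical marker for a cluster-style task needing all nodes in sync


def hybrid_wall_time(tasks, hosts):
    # Hosts free-run between barriers; at a barrier everyone waits for the
    # slowest host, then the barrier task itself runs once in lockstep.
    total = 0
    elapsed = {h: 0 for h in hosts}  # per-host time in the current free-run segment
    for t in tasks:
        if t.barrier:
            total += max(elapsed.values()) + t.seconds
            elapsed = {h: 0 for h in hosts}
        else:
            for h in t.hosts:
                elapsed[h] += t.seconds
    return total + max(elapsed.values())  # close the final free-run segment


def linear_wall_time(tasks):
    # Fully synchronized baseline: every task costs its full duration.
    return sum(t.seconds for t in tasks)


hosts = [f"node{i}" for i in range(1, 7)]
db, mq, compute = frozenset(hosts[:3]), frozenset(hosts[3:5]), frozenset(hosts[5:])
tasks = [
    Task("base packages", 30, frozenset(hosts)),
    Task("install database", 120, db),
    Task("install message queue", 90, mq),
    Task("install compute agent", 60, compute),
    Task("form database cluster", 20, db, barrier=True),
    Task("register services", 40, db),
    Task("start compute agent", 30, compute),
]

print("linear:", linear_wall_time(tasks))        # 390
print("hybrid:", hybrid_wall_time(tasks, hosts))  # 210
```

In this example the hybrid run takes the same 210s an unsynchronized free run would, yet "start compute agent" cannot begin before the database cluster has formed, which a pure free run would not guarantee.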

None of this is to say the same problems will exist here, but when you are
managing systems/services that interact and are difficult to CI in
isolation at the project level, you will probably want some way for
developers, and CI on proposed changes, to exercise a test environment.

The cost of developing, testing and integrating with each approach should
probably be investigated in detail. Before looking at whether it's easy to
replace the Puppet modules with Ansible or to update to Puppet 4/5, it
might be worth first focusing on what is needed to get the best experience
(stability, ease of writing and maintenance, and speed of dev-env bring-up
come to mind as important).

Past experience with any config management suggests that when you start
simple it's easy to incrementally improve on the existing approach, but
reversing direction when you hit dead ends is almost impossible ;)

-- 
Darragh Bailey
"Nothing is foolproof to a sufficiently talented fool"

[OpenStack-Infra] Selecting New Priority Effort(s)
