[kolla] PTG summary

Mark Goddard mark at stackhpc.com
Fri Nov 22 16:47:32 UTC 2019


Hi,

A couple of weeks ago we had our partially virtual PTG. The full notes
are in Etherpad [1], but I will try to summarise some of the
discussions here. We had a good turnout, and good engagement in the
discussions.

# Priorities

I'll start at the end, as this is arguably the most tangible outcome
of the discussions. At the end of the sessions, we gathered all
potential work items for Ussuri, and voted on them in another Etherpad
[2]. Afterwards these were cleaned up, ordered, and assigned
blueprints. The priorities were transferred to the whiteboard [3] for
tracking.

Of course other things will come along, and some things won't get
done, but these priorities should be used as a guide for anyone
reviewing or developing patches during the Ussuri cycle.

# General topics

Sustainability has been a key theme in kolla recently. There are two main
aspects to this - keeping the community healthy, and ensuring the
scope of the project is maintainable.

We discussed having some additional community meetings on a regular
basis, as a way to keep in touch with operators and to build up more
of a community outside of IRC. I proposed having a virtual onboarding
session, which a number of people expressed an interest in attending.
Pulling this session out of the summit should allow for a more level
playing field.

## Ceph Ansible

We are continuing to investigate Ceph Ansible as an alternative to our
native Ceph deployment. The work to migrate from an existing kolla
deployment is still ongoing. There are some potential blockers:
ceph-ansible has no Ubuntu container image support, and the
ceph-container project publishes no ARM container images.

## Kolla-cli

This project has been restored in the Train cycle, and we continue to
keep an eye on its progress.

## Extensibility

We discussed making kolla and kolla-ansible more extensible as a way
to avoid the need to support every service under the sun. This should
be simple in kolla, but kolla-ansible will require more effort to
define how custom playbooks or roles are hooked in.

## Inverting kolla images

There was a proposal to use more upstream docker images, and
potentially add our tooling on top where necessary. This is an
interesting idea, but could add complexity with more image distros to
track.

## Cloud native logging

The cloud-native folk seem to have agreed on capturing container logs
from stdout, which doesn't align with our file-based model. We also
miss some logs emitted during startup, since they are never written to
files, and these could be interesting. We agreed to try ingesting
these into fluentd as a starting point.

## Tracking bug fixes in release notes

We agreed to start experimenting with tracking bug fixes in release
notes from the Ussuri release. Previously we have just tracked
features, deprecations and upgrade notes. This will raise the bar for
contribution slightly, so we will keep an eye on it.

# Kolla

## CentOS 8

Supporting CentOS 8 base container images is key for us this cycle, as
it allows us to move to Python 3 based CentOS images. The current
Python 2 based images may start to break at any moment as services
drop Python 2 support. The Ussuri release will not support CentOS 7
base container images. We are currently blocked by a number of missing
yum repositories for CentOS 8.

Supporting running kolla-build on a CentOS 8 host is a simpler task,
although getting Docker installed requires a few contortions at this
point.

## Drop Python 2

This depends on CentOS 8 host and container support. We are at
significant risk of breakage as other projects drop Python 2 support.
We plan to keep CentOS CI happy for as long as possible, but expect it
may break at some point during the cycle.

## Zuul proposal bot

We talked about adding a zuul proposal bot to update source package
versions on stable branches (running tools/version-check.py). I looked
into it and put forward a PoC, but we are going to try switching to a
YAML format definition before proceeding with this.

## Remove EPEL

Everyone likes to hate EPEL, so we will try to remove it. We also
discussed an off-by-default approach for our custom package
repositories to provide some damage limitation if they go AWOL.

## Support matrix

We made a good start with the support matrix [4] this cycle. We'd like
to continue this effort, and continue to evaluate which images we
support. The next step seems to be to better define our categories of
images, and use these to define voting vs. non-voting build failures.
We'd like to find community owners for some of our 'community
maintained' images.
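
As a rough illustration of how image categories could drive voting
vs. non-voting build failures, here is a minimal Python sketch. The
category names and the rule that only "core" images vote are
hypothetical; the real categories are still to be defined.

```python
# Hypothetical category names; the real categories are still to be defined.
VOTING_CATEGORIES = {"core"}


def voting_failures(build_results, image_categories):
    """Return the failed images whose category makes the failure voting.

    build_results maps image name -> True if the build succeeded.
    image_categories maps image name -> category name; images with no
    entry are treated as non-voting.
    """
    return sorted(
        image
        for image, succeeded in build_results.items()
        if not succeeded and image_categories.get(image) in VOTING_CATEGORIES
    )
```

A CI job built along these lines would fail only when this list is
non-empty, and merely report the other failures.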

## RabbitMQ upgrade

RabbitMQ 3.8 brings a Prometheus exporter, which a number of people
have expressed an interest in. This will require an Erlang upgrade.

## Prometheus 2

There is no migration path between Prometheus 1 and 2. We discussed a
few options for a smooth transition, and I think we landed on this:

* keep the old Prometheus container around, and configure it as a
remote read source for the new one
* configure haproxy to flip to the new Prometheus when ready
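
For the remote read step, a minimal Python sketch of the config
fragment the new Prometheus would need. The `remote_read` section and
the `/api/v1/read` path are what I understand the Prometheus 2
migration guide to describe for a Prometheus 1.8 server, so please
verify against the upstream docs. The address used below is a made-up
placeholder.

```python
import json


def remote_read_config(old_prometheus_url):
    """Build the remote_read section pointing at the old Prometheus.

    Assumes the old server exposes its read endpoint at /api/v1/read.
    """
    read_url = old_prometheus_url.rstrip("/") + "/api/v1/read"
    return {"remote_read": [{"url": read_url}]}


if __name__ == "__main__":
    # Placeholder address for the old Prometheus 1 container.
    config = remote_read_config("http://192.0.2.10:9091")
    # JSON is valid YAML, so this can be merged into prometheus.yml.
    print(json.dumps(config, indent=2))
```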

# Kolla Ansible

## CentOS 8

This is where it gets interesting. How do we migrate a running system
from CentOS 7 to 8? Ideally we would not couple this to an OpenStack
upgrade, so at least one release needs to support both CentOS 7 and 8
hosts. I will follow up on this topic separately as it's a big one,
and I'd like to try a cross-project approach.

## Drop Python 2

The main interesting decision here is: don't drop py2 for remote hosts
in Ussuri, until we are sure that Ussuri will only need to run against
CentOS 8 hosts (see above).

## OVN support

Neutron seems to be making moves to deprecate OVS and LinuxBridge ML2
drivers, replacing them with OVN in tree. We have OVN images, but no
deployment support in kolla-ansible. We'd like to add it this cycle.
Interesting questions around migration from OVS to OVN came up.
TripleO has some tooling which might help here.

## More host-level commands (day 2 ops)

We have the bootstrap-servers command for bootstrapping hosts, but
lack some commands for ongoing operations. Common examples include:

* reconfiguring or upgrading Docker in a safe manner (without
live-restore, a Docker restart takes down your containers).
* adding new hosts. This requires updating /etc/hosts everywhere, but
running bootstrap-servers again is heavy-handed and risks a Docker
restart. Containers don't automatically pick up changes to /etc/hosts,
so we need to address that.
* pruning Docker images
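
To make the /etc/hosts case concrete, here is a minimal Python sketch
of an idempotent update that adds or refreshes entries without
touching anything else. It is only an illustration, not how
kolla-ansible does it today. The function name is hypothetical, and
it assumes each managed hostname sits on its own line.

```python
def merge_hosts(current_text, new_entries):
    """Return hosts-file text with new_entries added or updated.

    new_entries maps hostname -> IP address. Any existing line that
    names one of these hostnames is dropped and re-added at the end;
    all other lines (including comments and blanks) keep their order.
    Simplification: aliases sharing a dropped line are lost.
    """
    kept = []
    for line in current_text.splitlines():
        # Ignore trailing comments, then skip the address field.
        names = line.split("#", 1)[0].split()[1:]
        if any(name in new_entries for name in names):
            continue
        kept.append(line)
    for hostname in sorted(new_entries):
        kept.append("{} {}".format(new_entries[hostname], hostname))
    return "\n".join(kept) + "\n"
```

Running it twice with the same entries gives the same output, which
is the property we would want from a lightweight alternative to
re-running bootstrap-servers.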

## Restarting services

There was a request for a command to restart services. It could
probably be cobbled together from existing code quite easily.

## More destruction

It should be possible to run the destroy command against a subset of
services. We could also do more to thoroughly clean up.
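
A minimal Python sketch of the selection step, assuming containers
are named with the service as a prefix before the first underscore
(as in nova_api). The function name is hypothetical, and a real
implementation would also need to handle volumes and config
directories.

```python
def containers_to_destroy(container_names, services):
    """Return the containers that belong to the given services.

    Assumes names look like '<service>_<component>' or just '<service>'.
    """
    selected = []
    for name in container_names:
        service = name.split("_", 1)[0]
        if service in services:
            selected.append(name)
    return selected
```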

## More security friendliness (especially transport security)

* Could we integrate with Let's Encrypt? Possibly.
* Should we default to using TLS with self-signed certs? Probably, but
expiry could cause some surprises without explicit buy-in from the
operator.
* Can we use per-host RabbitMQ usernames and passwords? Potentially...
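
On the expiry concern, a minimal Python sketch of the kind of check
that could warn an operator ahead of time. It uses the standard
library's ssl.cert_time_to_seconds, which parses the notAfter string
format returned by SSLSocket.getpeercert(). The function name and any
warning threshold are hypothetical.

```python
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Return the number of days before a certificate expires.

    not_after is a string such as 'Jan 20 00:00:00 2020 GMT'.
    now is seconds since the epoch; defaults to the current time.
    """
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86400.0
```

A negative result means the certificate has already expired.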

## SELinux

Comes up often, but never gets voted for in our priorities. It would
be nice to get this one sorted though.

## Fluentd reconfiguration

Currently it's not possible to deploy the common services (cron,
kolla-toolbox, fluentd) without also deploying another service. There
are a few fiddly details, but it should be possible to resolve.

## Ansible lint

We agreed to try running ansible-lint on our codebase. The group has
had mixed results with it before, but was open to trying again.

## Ansible maximum version pinning

We agreed to define a maximum version of Ansible that we support. This
will help to prevent breakage out of our control.
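
A minimal Python sketch of such a check, comparing on major.minor
only. The function names are hypothetical, and the bounds shown in
the usage note are placeholders rather than the versions we will
actually pick.

```python
def parse_version(text):
    """Turn '2.9.1' into (2, 9, 1); suffixes such as 'rc1' are not handled."""
    return tuple(int(part) for part in text.split("."))


def ansible_version_supported(installed, minimum, maximum):
    """Return True if installed's major.minor lies within [minimum, maximum]."""
    major_minor = parse_version(installed)[:2]
    return parse_version(minimum) <= major_minor <= parse_version(maximum)
```

For example, with placeholder bounds of "2.8" and "2.9", version
"2.9.1" is accepted while "2.10.0" and "2.7.15" are rejected.
Comparing integer tuples avoids the string comparison trap where
"2.10" sorts before "2.9".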

## Nova cells v2

Work continues this cycle on cells with support for shared cell
controllers, and deployment of multiple RabbitMQ and MariaDB clusters.

## Config file audit

We should use oslo-config-validator [5] to ensure our config is valid.

## Podman

This one keeps coming up, but we never agree to implement it. Possible
issues include a lack of a full-featured Python library, and lack of a
supported package for Debian/Ubuntu. We agreed to start thinking about
how we might perform a migration from Docker one day, given the
direction of Red Hat.

[1] https://etherpad.openstack.org/p/kolla-ussuri-ptg
[2] https://etherpad.openstack.org/p/kolla-ussuri-priorities
[3] https://etherpad.openstack.org/p/KollaWhiteBoard
[4] https://docs.openstack.org/kolla/latest/support_matrix.html
[5] https://docs.openstack.org/oslo.config/latest/cli/validator.html

Well that turned into more of an exhaustive list than I'd expected.
Well done for reading (or scrolling) to the end.

Cheers,
Mark


