Hi,

A couple of weeks ago we had our partially virtual PTG. The full notes are in Etherpad [1], but I will try to summarise some of the discussions here. We had a good turnout, and good engagement in the discussions.

# Priorities

I'll start at the end, as this is arguably the most tangible outcome of the discussions. At the end of the sessions, we gathered all potential work items for Ussuri, and voted on them in another Etherpad [2]. Afterwards these were cleaned up, ordered, and assigned blueprints. The priorities were transferred to the whiteboard [3] for tracking. Of course other things will come along, and some things won't get done, but these priorities should be used as a guide for anyone reviewing or developing patches during the Ussuri cycle.

# General topics

Sustainability is a key theme in kolla recently. There are two main aspects to this - keeping the community healthy, and ensuring the scope of the project is maintainable.

We discussed having some additional community meetings on a regular basis, as a way to keep in touch with operators and to build up more of a community outside of IRC. I proposed having a virtual onboarding session which a number of people expressed an interest in attending. Pulling this session out of the summit should allow for a more level playing field.

## Ceph Ansible

We are continuing to investigate Ceph Ansible as an alternative to our native Ceph deployment. The work to migrate from an existing kolla deployment is still ongoing. There are some potential blockers in the form of no Ubuntu container image support in ceph-ansible, and no ARM container images published by the ceph-container project.

## Kolla-cli

This project has been restored in the Train cycle, and we continue to keep an eye on its progress.

## Extensibility

We discussed making kolla and kolla-ansible more extensible as a way to avoid the need to support every service under the sun.
This should be simple in kolla, but kolla-ansible will require more effort to define how custom playbooks or roles are hooked in.

## Inverting kolla images

There was a proposal to use more upstream docker images, and potentially add our tooling on top where necessary. This is an interesting idea, but could add complexity with more image distros to track.

## Cloud native logging

The cloud-native folk seem to have agreed on capturing container logs from stdout, which doesn't align with our file-based model. We also miss some logs during startup that don't get logged to files which could be interesting. We agreed to try ingesting these into fluentd as a starting point.

## Tracking bug fixes in release notes

We agreed to start experimenting with tracking bug fixes in release notes from the Ussuri release. Previously we have just tracked features, deprecations and upgrade notes. This will raise the bar for contribution slightly, so we will keep an eye on it.

# Kolla

## CentOS 8

Supporting CentOS 8 base container images is key for us this cycle, as it allows us to move to python 3 based CentOS images. These images may start to break at any moment as services drop python 2 support. The Ussuri release will not support CentOS 7 base container images. We are currently blocked by a number of missing yum repositories for CentOS 8.

Supporting running kolla-build on a CentOS 8 host is a simpler task, although getting Docker installed requires a few contortions at this point.

## Drop python 2

This depends on CentOS 8 host and container support. We are heavily at risk of being broken as other projects drop python 2 support. We plan to keep CentOS CI happy for as long as possible but expect it may break at some point during the cycle.

## Zuul proposal bot

We talked about adding a zuul proposal bot to update source package versions on stable branches (running tools/version-check.py).
I looked into it and put forward a PoC, but we are going to try switching to a YAML format definition before proceeding with this.

## Remove EPEL

Everyone likes to hate EPEL, so we will try to remove it. We also discussed an off by default approach for our custom package repositories to provide some damage limitation if they go AWOL.

## Support matrix

We made a good start with the support matrix [4] this cycle. We'd like to continue this effort, and continue to evaluate which images we support. The next step seems to be to better define our categories of images, and use these to define voting vs. non-voting build failures. We'd like to find community owners for some of our 'community maintained' images.

## RabbitMQ upgrade

RabbitMQ 3.8 brings a prometheus exporter, which a number of people have expressed an interest in. This will require an erlang upgrade.

## Prometheus 2

There is no migration path between prometheus 1 and 2. We discussed a few options for a smooth transition, and I think we landed on this:

* keep old prometheus container around, configure it as a remote read source
* configure haproxy to flip to the new prometheus when ready

# Kolla Ansible

## CentOS 8

This is where it gets interesting. How do we migrate a running system from CentOS 7 to 8? Ideally we would not couple this to an OpenStack upgrade, so at least one release needs to support both CentOS 7 and 8 hosts. I will follow up on this topic separately as it's a big one, and I'd like to try a cross-project approach.

## Drop python 2

The main interesting decision here is: don't drop py2 for remote hosts in Ussuri, until we are sure that Ussuri will only need to run against CentOS 8 hosts (see above).

## OVN support

Neutron seems to be making moves to deprecate OVS and LinuxBridge ML2 drivers, replacing them with OVN in tree. We have OVN images, but no deployment support in kolla ansible. We'd like to add it this cycle. Interesting questions around migration from OVS to OVN came up.
Tripleo has some tooling which might help here.

## More host-level commands (day 2 ops)

We have the bootstrap-servers command for bootstrapping hosts, but lack some commands for ongoing operations. Common examples include:

* reconfiguring or upgrading docker in a safe manner (without live-restore, a docker restart takes down your containers).
* adding new hosts. This requires updating /etc/hosts everywhere, but running bootstrap-servers again is heavy handed and risks a docker restart. Containers don't automatically pick up changes to /etc/hosts, so we need to address that.
* pruning docker images

## Restarting services

There was a request for a command to restart services. It could probably be cobbled together from existing code quite easily.

## More destruction

It should be possible to run the destroy command against a subset of services. We could also do more to thoroughly clean up.

## More security friendliness (especially transport security)

* Could we integrate with letsencrypt? Possibly.
* Should we default to use TLS with self-signed certs? Probably, but expiry could cause some surprises without explicit buy-in from the operator.
* Can we use per-host RabbitMQ usernames and passwords? Potentially...

## SELinux

Comes up often, but never gets voted for in our priorities. It would be nice to get this one sorted though.

## Fluentd reconfiguration

Currently it's not possible to deploy the common services (cron, kolla-toolbox, fluentd) without also deploying another service. There are a few fiddly details, but it should be possible to resolve.

## Ansible lint

We agreed to try running ansible-lint on our codebase. The group has had mixed results with it before, but was open to trying again.

## Ansible maximum version pinning

We agreed to define a maximum version of Ansible that we support. This will help to prevent breakage out of our control.
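
As a rough illustration of what a maximum version check could look like, here is a minimal Python sketch. The function names and the 2.8/2.9 bounds are placeholders made up for this example; they are not values agreed at the PTG, and this is not how kolla-ansible implements the check.

```python
def major_minor(version):
    """Return the (major, minor) part of a version string as integers.

    Only the first two dot-separated components are parsed, so a
    string such as "2.9.0rc1" yields (2, 9).
    """
    return tuple(int(part) for part in version.split(".")[:2])


def ansible_version_supported(installed, minimum, maximum):
    """Check that the installed major.minor lies within [minimum, maximum].

    All three arguments are version strings. The bounds are hypothetical
    inputs for this sketch, not project policy.
    """
    return major_minor(minimum) <= major_minor(installed) <= major_minor(maximum)


if __name__ == "__main__":
    # Placeholder bounds, for illustration only.
    for candidate in ("2.7.5", "2.9.1", "2.10.0"):
        print(candidate, ansible_version_supported(candidate, "2.8", "2.9"))
```

Comparing integer tuples rather than raw strings matters here: as strings, "2.10" sorts before "2.9", which would let a too-new release slip under the maximum.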
## Nova cells v2

Work continues this cycle on cells with support for shared cell controllers, and deployment of multiple RabbitMQ and MariaDB clusters.

## Config file audit

We should use the oslo config validator [5] to ensure our config is valid.

## Podman

This one keeps coming up, but we never agree to implement it. Possible issues include a lack of a full-featured Python library, and lack of a supported package for Debian/Ubuntu. We agreed to start thinking about how we might perform a migration from Docker one day, given the direction of Red Hat.

[1] https://etherpad.openstack.org/p/kolla-ussuri-ptg
[2] https://etherpad.openstack.org/p/kolla-ussuri-priorities
[3] https://etherpad.openstack.org/p/KollaWhiteBoard
[4] https://docs.openstack.org/kolla/latest/support_matrix.html
[5] https://docs.openstack.org/oslo.config/latest/cli/validator.html

Well that turned into more of an exhaustive list than I'd expected. Well done for reading (or scrolling) to the end.

Cheers,
Mark
On 22/11/2019 17.47, Mark Goddard wrote:
[...] ## Ceph Ansible
We are continuing to investigate Ceph Ansible as an alternative to our native Ceph deployment. The work to migrate from an existing kolla deployment is still ongoing. There are some potential blockers in the form of no Ubuntu container image support in ceph-ansible, and no ARM container images published by the ceph-container project.
I suggest to talk with the Ceph community before spending more work here. ceph-ansible is getting replaced in March by "SSH orchestrator", see
https://docs.google.com/presentation/d/1JpcETNXpuB1JEuhX_c8xtgNnv0gJhSaQUffM...

Andreas
-- 
Andreas Jaeger aj@suse.com Twitter: jaegerandi
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, D 90409 Nürnberg
(HRB 36809, AG Nürnberg) GF: Felix Imendörffer
GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126
Hi Andreas.

Does it mean ceph-ansible will no longer be maintained? From your link it seems that you can activate a module ansible ...what does that mean exactly?

Cheers

On Fri, Nov 22, 2019 at 10:37 PM Andreas Jaeger <aj@suse.com> wrote:
I suggest to talk with the Ceph community before spending more work here. ceph-ansible is getting replaced in March by "SSH orchestrator", see
https://docs.google.com/presentation/d/1JpcETNXpuB1JEuhX_c8xtgNnv0gJhSaQUffM...
-- *Alfredo*
On 24/11/2019 15.12, Alfredo De Luca wrote:
Hi Andreas. Does it mean ceph-ansible will no longer be maintained? From your link it seems that you can activate a module ansible ...what does that mean exactly?
I don't know all those details, you might want to reach out to the Ceph community,

Andreas
Thanks

On Sun, Nov 24, 2019 at 7:59 PM Andreas Jaeger <aj@suse.com> wrote:
I don't know all those details, you might want to reach out to the Ceph community,
-- *Alfredo*
On Fri, 22 Nov 2019 at 21:30, Andreas Jaeger <aj@suse.com> wrote:
I suggest to talk with the Ceph community before spending more work here. ceph-ansible is getting replaced in March by "SSH orchestrator", see
https://docs.google.com/presentation/d/1JpcETNXpuB1JEuhX_c8xtgNnv0gJhSaQUffM...
Thanks for bringing this up Andreas, we'll reevaluate our approach here.
participants (3)
- Alfredo De Luca
- Andreas Jaeger
- Mark Goddard