Hi,

Thanks to everyone who joined the virtual PTG sessions this week. We had some great discussions covering many different topics. I will try to provide a summary here.

# Victoria cycle retrospective

The Ubuntu Focal migration went smoothly. Core reviewer bandwidth was hurt by the pandemic, but we did pick up towards the end of the cycle and landed a good number of features. As usual, there were lots of CI fires to extinguish.

# Review Victoria PTG Actions

We looked back at the actions from the last PTG. Quite a few are still incomplete, and we should do a better job of reviewing these. We agreed that the docs-related actions are still the most important. Any help with improving our documentation is appreciated.

# General topics

## Ability to run CI jobs locally (without Zuul, but possibly with Ansible)

This depends on a Zuul feature called Zuul Runner [1], which should be ready soon. yoctozepto signed up to test it out and document how to use it with Kolla.

## Future of Kolla Klub and Kall

The Kolla Klub started well, with lots of involvement and some good sessions. Over time, attendance has fallen, and the agenda still seems to be largely driven by me. Keeping up productive sessions every two weeks is starting to feel like a bit of a chore, and I think we need to change things up or stop. We agreed to try rebooting at a time that better suits contributors in APAC, possibly recycling some earlier material. There was a proposal at the summit to have monthly ops meetups, and maybe we would do well to link these related communities.

We had a few ideas for how to improve the productivity of the more development-oriented Kolla Kall, including multiple smaller groups, pair/mob programming, and fewer, longer sessions.

## Keep or drop lower-constraints jobs?

We didn't find many reasons to keep the lower-constraints jobs, and they do seem to need a bit of maintenance. Two of us voted in favour of removing them, with everyone else not having strong feelings.

## How can we integrate better with our user & contributor base outside of EU/US timezones?

Kolla has a large user base outside of EU/US timezones, but the core team and surrounding community are not always that effective at involving these users. We agreed to try to improve the situation by reaching out to known community members in other regions, and offering some limited mentorship.

# Kolla

## Infra images

This feature has been in progress for a few cycles, but is proving difficult to land. The main blocker now is in CI. We need to split our image build CI jobs into several pipelines, which need to share a local Docker registry. This pattern should open up a few other interesting features.

## New Docker Hub limits

Docker Hub is changing its policy for anonymous and free users. This affects retention of unused images, and limits the number of pulls in a 6-hour period (100 for anonymous users, counted per IP address; 200 for logged-in free users). This is likely to affect users running without a local registry. Fl1nt agreed to add some information to our documentation to help users.

This is also likely to impact our CI jobs, which need to pull images. There is an application process for being listed as an open source project, which we need to ask the OpenDev infra team to apply for.

# Kolla Ansible

## More fine-grained skipping of tasks, e.g. allowing service registration to be skipped

priteau proposed making it easier to skip certain tasks during deployment, such as bootstrapping or service registration. There are two obvious ways to do this: a new 'action' (e.g. deploy-without-register), or tags on the relevant tasks, which may then be skipped. A third approach is to combine two commands, genconfig and deploy-containers. This is related to work currently in progress to allow configuration to be generated separately from container deployment. In the end, we considered adding tags, as well as the two-command approach.

## Deprecate reconfigure command?
The reconfigure command is now essentially the same as deploy. This is a source of confusion for users, so we agreed to deprecate reconfigure. We will keep reconfigure as an alias for deploy at the CLI level.

## Modernise the old skool Swift role

The Swift role is the last one (except Bifrost, which is generally weird) to use the old format, prior to the better-reconfigure blueprint. This has some limitations, so we should bring it into line. Nothing controversial here; it just needs someone to pick it up.

## DNS-based endpoint naming

rafaelweingartne has proposed a mini spec [2] for DNS-based endpoint naming. I think we now have a reasonable idea of his solution (which is to use Consul for service discovery, and avoid HAProxy for internal API communication); however, the hard part is defining the problem that is being solved.

## Support for identity federation (OpenID Connect) configurations in Keystone and Horizon via Kolla Ansible

There is a patch [3] to support this that has been open for some time. It requires some fairly specific knowledge to review, and the challenge will be in making core reviewers confident that it works. We agreed that CI testing and/or documentation of a simple development environment would help.

## podman

Podman keeps coming up as a replacement for Docker. There are several issues here, including support on Debian/Ubuntu, and we do not want to maintain support for two container engines. We agreed that this is likely to be a large feature, and that we should get started with a proof of concept in Wallaby, possibly with a spec to tease out the details. Fl1nt may have some time in early 2021 for this, but anyone is welcome to pick it up.

## Finish Masakari integration

We don't have support for the host monitor, which is arguably the most useful part of Masakari. With yoctozepto as Masakari PTL, we expect to see good progress in Wallaby.

## Performance improvements

We made some good progress with performance improvement in Victoria.
There is always more that can be done, and the next phase will probably involve splitting roles that target both control and compute hosts. Mitogen [4] is also an interesting option. As always, changes should be based on benchmarks [5].

## Let's Encrypt integration - container running certbot, triggers certificate distribution

yoctozepto added initial support for the ACME protocol in Ussuri, and the next step is to support deployment of a certbot container, and unattended certificate rotation. There are lots of fiddly details here, including how to bootstrap it. headphoneJames offered to write up a high-level description of how the pieces fit together.

## Support for HAProxy reload (e.g. for TLS)

Currently, any configuration change to HAProxy requires a restart to be applied, which causes connections to be dropped. We discussed adding support for hitless reloads of HAProxy [6].

## Native fluentd logging

We didn't discuss this as kevko was busy, but I thought I'd add it here anyway. The idea is to use logging.conf to configure services for native fluentd logging. This avoids all the regexes in fluentd, but may change our document schema in Elasticsearch. Patch [7] is up for review.

# Kayobe

## Support multiple environments from a single Kayobe configuration

This feature has been on the cards for some time, and is an important step for Kayobe. priteau has proposed a patch [8] that is a good starting point. We spent a good hour going over some of the outstanding issues, and made some good progress. We agreed on the need to keep the meaning of KAYOBE_CONFIG_PATH intact, and to allow a branch to define the default environment to use. We agreed to write a high-level design document to pull out the remaining uncertainties, and aim to get it over the line in Wallaby.

## Switching Docker storage from devicemapper to overlay2

We switched the default Docker storage driver from devicemapper to overlay2 in Victoria.
For existing users of devicemapper, the plan is to try to ensure they are migrated to overlay2 during the CentOS 8 migration. Further, we will deprecate devicemapper on stable branches, and require users to take some action (set a variable) to allow its use. This should deter new users from using devicemapper, without completely breaking existing users.

## Switch to NetworkManager?

jovial proposed this discussion, as he likes the checkpoint/restore feature of NetworkManager. There are various ways to interface with NM, including nmcli, an Ansible module, and templated config files followed by nmcli con reload.

## Strip out Grafana post-configure functionality and move it to Kolla Ansible

dougszu proposed this topic, to improve the separation of concerns between Kayobe and Kolla Ansible. It may also apply to setup we do for Ironic.

## Create a reference custom playbook repo

We have a StackHPC repo [9] with various useful playbooks and scripts. dougszu proposed making something more official. I suggested that Ansible collections could be used to improve the user experience with custom playbooks in external repos.

## Config gen on Ansible control host

This is something we've talked about for a while: offline generation of configuration, and potentially validating it in a CI pipeline. jovial noted the importance of being able to redact passwords. This is mostly dependent on ongoing changes to Kolla Ansible, but could be wrapped up nicely by Kayobe.

## Hashivault integration

While we have integrated with HashiCorp Vault for secret retrieval, the user experience could be better. We also lack support for pushing passwords generated by Kolla up to Vault. This is likely to be a Kolla Ansible feature, plus some Kayobe integration.
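
As a rough illustration of what pushing passwords up to Vault involves, here is a minimal sketch of a write against Vault's KV version 2 HTTP API (a POST to /v1/<mount>/data/<path>, with the values wrapped in a "data" object and the token in the X-Vault-Token header). It assumes the passwords have already been loaded from passwords.yml into a dict, stores them all under a single secret, and uses only the Python standard library. The mount name secret and path kolla_passwords are hypothetical placeholders, not anything Kolla Ansible defines, and this is not the design we agreed on.

```python
import json
import urllib.request

# Hypothetical defaults: the KV v2 mount point and secret path are
# illustrative placeholders, not values defined by Kolla Ansible.
DEFAULT_MOUNT = "secret"
DEFAULT_PATH = "kolla_passwords"


def build_kv2_write_request(vault_addr, token, passwords,
                            mount=DEFAULT_MOUNT, path=DEFAULT_PATH):
    """Build (but do not send) a request that writes `passwords` to a
    Vault KV version 2 secrets engine as a single secret."""
    url = f"{vault_addr.rstrip('/')}/v1/{mount}/data/{path}"
    # KV v2 expects the key/value pairs wrapped in a top-level "data" object.
    body = json.dumps({"data": passwords}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "X-Vault-Token": token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def push_passwords(vault_addr, token, passwords, **kwargs):
    """Send the write request to Vault and return the HTTP status code."""
    request = build_kv2_write_request(vault_addr, token, passwords, **kwargs)
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Example values only; in practice these would come from passwords.yml
    # and a real Vault token.
    req = build_kv2_write_request(
        "http://127.0.0.1:8200", "example-token",
        {"keystone_admin_password": "example"},
    )
    print(req.get_method(), req.full_url)
```

Reading the values back is a GET against the same /v1/<mount>/data/<path> URL, with the stored values returned under data.data. Writing one secret per password instead of one combined secret is an equally valid choice that this sketch does not explore.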

[1] https://review.opendev.org/681277
[2] https://review.opendev.org/759706
[3] https://review.opendev.org/695432
[4] https://github.com/dw/mitogen
[5] https://github.com/stackhpc/ansible-scaling
[6] https://www.haproxy.com/blog/hitless-reloads-with-haproxy-howto/
[7] https://review.opendev.org/755775
[8] https://review.opendev.org/734867
[9] https://github.com/stackhpc/kayobe-ops

Cheers,
Mark