[kolla] Wallaby PTG summary
Mark Goddard
mark at stackhpc.com
Fri Oct 30 19:28:15 UTC 2020
Hi,
Thanks to everyone who joined the virtual PTG sessions this week. We
had some great discussions covering many different topics. I will try
to provide a summary here.
# Victoria cycle retrospective
The Ubuntu Focal migration went smoothly. Core reviewer bandwidth was
hurt by the pandemic, but we did pick up towards the end of the cycle
and landed a good number of features. As usual, lots of CI fires to
extinguish.
# Review Victoria PTG Actions
We looked back at the actions from the last PTG. Quite a few are still
incomplete - we should do a better job of reviewing these. We agreed
that the docs-related actions are still the most important. Any help
with improving our documentation is appreciated.
# General topics
## Ability to run CI jobs locally (without Zuul, but possibly with Ansible)
This depends on a Zuul feature called Zuul Runner [1], which should be
ready soon. yoctozepto signed up to test it out and document how to
use it with Kolla.
## Future of Kolla Klub and Kall
The Kolla Klub started well, with lots of involvement, and some good
sessions. Over time, attendance has fallen, and the agenda still seems
to be largely driven by me. Keeping up productive sessions every two
weeks is starting to feel like a bit of a chore, and I think we need
to change things up or stop. We agreed to try rebooting at a time that
better suits contributors in APAC, possibly recycling some earlier
material. There was a proposal at the summit to have monthly ops
meetups, and maybe we would do well to link these related communities.
We had a few ideas for how to improve the productivity of the more
development oriented Kolla Kall, including multiple smaller groups,
pair/mob programming, and fewer longer sessions.
## Keep or drop lower-constraints jobs?
We didn't find many reasons to keep the lower-constraints jobs, and
they do seem to need a bit of maintenance. Two people voted in favour
of removing them, and nobody else had strong feelings.
## How can we integrate better with our user & contributor base outside of EU/US timezones?
Kolla has a large user base outside of EU/US timezones, but the core
team and surrounding community is not always that effective at
involving these users. We agreed to try to improve the situation by
reaching out to known community members in other regions, and offering
some limited mentorship.
# Kolla
## Infra images
This feature has been in progress for a few cycles, but is proving
difficult to land. The main blocker now is in CI. We need to split our
image build CI jobs into several pipelines. They need to share a local
docker registry. This pattern should open up a few other interesting
features.
## New Docker Hub limits
Docker Hub is changing its policy for anonymous and free users. This
affects retention of unused images, and limits the number of pulls in
a 6 hour period (100 for anonymous users, limited by IP, 200 for
logged in free users). This is likely to affect users running without
a local registry. Fl1nt agreed to add some information to our
documentation to help users. This is likely also to impact our CI
jobs, which need to pull images. There is an application process to be
listed as an open source project, which we need to ask the opendev
infra team to apply for.
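
To get a feel for whether a deployment without a local registry is
likely to hit these limits, the pull budget can be modelled as a
sliding window. This is only a rough sketch and not Docker Hub's actual
accounting: the function name and timestamps are hypothetical, and only
the 6 hour window and the 100/200 figures come from the summary above.

```python
from collections import deque

WINDOW_SECONDS = 6 * 60 * 60  # 6 hour window, as described above
LIMITS = {"anonymous": 100, "free": 200}  # pulls allowed per window


def pulls_allowed(pull_times, account_type="anonymous"):
    """Count how many pulls would be allowed.

    pull_times is an iterable of timestamps in seconds, in ascending
    order. Pulls over the limit are assumed to be rejected and do not
    count against the window (an assumption of this sketch).
    """
    limit = LIMITS[account_type]
    recent = deque()  # timestamps of allowed pulls still inside the window
    allowed = 0
    for t in pull_times:
        # Forget pulls that have aged out of the window.
        while recent and t - recent[0] >= WINDOW_SECONDS:
            recent.popleft()
        if len(recent) < limit:
            recent.append(t)
            allowed += 1
    return allowed
```

For example, 150 pulls in quick succession from one IP would see only
the first 100 succeed for an anonymous user, while a logged in free
user would get all 150.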
# Kolla Ansible
## More fine grained skipping of tasks, e.g. allow to skip service registration
priteau proposed making it easier to skip certain tasks during
deployment, such as bootstrapping or service registration. There are
two obvious ways to do this - a new 'action' (e.g.
deploy-without-register), or add tags to the relevant tasks, which may
be skipped. A third approach is to combine two commands - genconfig &
deploy-containers. This is related to work currently under way to allow
configuration to be generated separately from container deployment. In
the end, we considered adding tags, as well as the two command
approach.
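
The tag approach can be illustrated with a small model of how skipping
by tag filters a task list, similar in spirit to ansible-playbook's
--skip-tags option. This is a sketch only: the task names and tag names
below are hypothetical and are not tags that exist in Kolla Ansible.

```python
def select_tasks(tasks, skip_tags=()):
    """Return the names of tasks that share no tag with skip_tags."""
    skip = set(skip_tags)
    return [
        task["name"]
        for task in tasks
        if not skip & set(task.get("tags", ()))
    ]


# Hypothetical tasks and tags, for illustration only.
TASKS = [
    {"name": "generate config", "tags": ["config"]},
    {"name": "bootstrap database", "tags": ["bootstrap"]},
    {"name": "register service", "tags": ["service-registration"]},
    {"name": "deploy containers", "tags": []},
]

remaining = select_tasks(TASKS, skip_tags=["service-registration"])
```

Here `remaining` keeps every task except the one tagged for service
registration, which is the behaviour an operator would be opting into.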
## Deprecate reconfigure command?
The reconfigure command is now essentially the same as deploy. This is
a source of confusion for users, so we agreed to deprecate
reconfigure. We will keep reconfigure as an alias for deploy at the
CLI level.
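
One way to picture the alias is a lookup that maps the deprecated name
to its replacement and warns on use. This is a minimal sketch under the
assumption of a Python command dispatcher, which is not how the
kolla-ansible CLI is necessarily implemented; the names below are
hypothetical.

```python
import warnings

# Deprecated command name -> command that replaces it.
ALIASES = {"reconfigure": "deploy"}


def resolve_action(action):
    """Map a deprecated command to its replacement, warning the user."""
    if action in ALIASES:
        target = ALIASES[action]
        warnings.warn(
            f"'{action}' is deprecated, use '{target}' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return target
    return action
```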
## Modernise the old skool Swift role
The Swift role is the last one (except Bifrost, which is generally weird)
still using the old format that predates the better-reconfigure blueprint. This
has some limitations, so we should bring it into line. Nothing
controversial here, just needs someone to pick it up.
## DNS-based endpoint naming
rafaelweingartne has proposed a mini spec [2] for DNS-based endpoint
naming. I think we now have a reasonable idea of his solution (which
is to use Consul for service discovery, and avoid HAProxy for internal
API communication), however the hard part is defining the problem that
is being solved.
## Support for identity federation (OpenID Connect) configurations in Keystone and Horizon via Kolla-ansible
There is a patch [3] to support this which has been open for some
time. It requires some fairly specific knowledge to review, and the
challenge will be in making core reviewers confident that it works. We
agreed that CI testing and/or documentation of a simple development
environment would help.
## podman
Podman keeps coming up as a replacement for Docker. There are several
issues here, including support on Debian/Ubuntu - we do not want to
maintain support for two container engines. We agreed that this is
likely to be a large feature, and that we should get started with a
proof of concept in Wallaby, possibly with a spec to tease out the
details. Fl1nt may have some time early 2021 for this, but anyone is
welcome to pick it up.
## Finish Masakari integration
We don't have support for the host monitor, which is arguably the most
useful part of Masakari. With yoctozepto as Masakari PTL, we expect to
see good progress in Wallaby.
## Performance improvements
We made some good progress with performance improvement in Victoria.
There is always more that can be done, and the next phase will
probably involve splitting roles that target both control and compute
hosts. Mitogen [4] is also an interesting option. As always, changes
should be based on benchmarks [5].
## Let's Encrypt integration - container running certbot, triggers certificate distribution
yoctozepto added initial support for the ACME protocol in Ussuri, and
the next step is to support deployment of a certbot container, and
unattended certificate rotation. There are lots of fiddly details
here, including how to bootstrap it. headphoneJames offered to write
up a high level description of how the pieces fit together.
## Support for HAProxy reload (e.g. for TLS)
Currently any configuration changes to HAProxy require a restart to be
applied. This causes connections to be dropped. We discussed adding
support for hitless reloads of HAProxy [6].
## Native fluentd logging
We didn't discuss this as kevko was busy, but I thought I'd add it
here anyway. The idea is to use logging.conf to configure services
for native fluentd logging. This avoids all the regexes in fluentd,
but may change our document schema in Elasticsearch. Patch [7] is up
for review.
# Kayobe
## Support multiple environments from a single kayobe configuration
This feature has been on the cards for some time, and is an important
step for Kayobe. priteau has proposed a patch [8] which is a good
starting point. We spent a good hour going over some of the
outstanding issues, and made some good progress. We agreed on the need
to keep the meaning of KAYOBE_CONFIG_PATH intact, and to allow a
branch to define the default environment to use. We agreed to write a
high level design document to pull out the remaining uncertainties,
and aim to get it over the line in Wallaby.
## Switching Docker storage from devicemapper to overlay2
We switched the default Docker storage driver from devicemapper to
overlay2 in Victoria. For existing users of devicemapper, the plan is
to try to ensure they are migrated to overlay2 during CentOS 8
migration. Further, we will deprecate devicemapper on stable branches,
and require users to take some action (set a variable) to allow its
use. This should deter new users from using devicemapper, without
completely breaking existing users.
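
The opt-in could look something like the check below. This is a sketch
of the idea rather than the planned implementation, and the
`allow_devicemapper` flag is a hypothetical name standing in for
whatever variable ends up being used.

```python
def check_storage_driver(driver, allow_devicemapper=False):
    """Reject devicemapper unless the user has explicitly opted in.

    allow_devicemapper is a hypothetical opt-in flag.
    """
    if driver == "devicemapper" and not allow_devicemapper:
        raise ValueError(
            "devicemapper is deprecated; migrate to overlay2 or "
            "explicitly allow devicemapper to continue"
        )
    return driver
```

Existing devicemapper users would hit the error once, set the flag, and
carry on, while new users following the defaults would never see it.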
## Switch to NetworkManager?
jovial proposed this discussion, as he likes the checkpoint/restore
feature of NetworkManager. There are various ways to interface with
NM, including nmcli, an Ansible module, or templated config files
followed by nmcli con reload.
## Strip out Grafana post configure functionality and move it to Kolla-Ansible
dougszu proposed this topic, to improve the separation of concerns
between Kayobe and Kolla Ansible. It may also apply to setup we do for
ironic.
## Create a reference custom playbook repo
We have a stackhpc repo [9] with various useful playbooks and scripts.
dougszu proposed making something more official. I suggested that
Ansible collections could be used to improve the user experience with
custom playbooks in external repos.
## Config gen on ansible control host
This is something we've talked about for a while - offline generation
of configuration, and potentially validating it in a CI pipeline.
jovial noted the importance of being able to redact passwords. This is
mostly dependent on ongoing changes to Kolla Ansible, but could be
wrapped up nicely by Kayobe.
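
As an illustration of the redaction point, generated configuration
could be passed through a filter before being stored as a CI artifact.
This is a minimal sketch, assuming the config has already been parsed
into dicts and lists; the key pattern and placeholder are hypothetical
choices, not anything Kolla Ansible or Kayobe provides.

```python
import re

# Keys that look like they hold secrets (hypothetical pattern).
SECRET_KEY = re.compile(r"(password|secret|token)", re.IGNORECASE)
PLACEHOLDER = "<redacted>"


def redact(config):
    """Return a copy of config with secret-looking values replaced."""
    if isinstance(config, dict):
        return {
            key: PLACEHOLDER
            if isinstance(key, str) and SECRET_KEY.search(key)
            else redact(value)
            for key, value in config.items()
        }
    if isinstance(config, list):
        return [redact(item) for item in config]
    return config
```

Matching on key names is crude and will miss secrets embedded in other
values, such as database connection URLs, so a real implementation
would need more care.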
## Hashivault integration
While we have integrated with HashiCorp Vault for secret retrieval,
the user experience could be better. We also lack support for pushing
passwords generated by Kolla up to Vault. This is likely to be a Kolla
Ansible feature, plus some Kayobe integration.
[1] https://review.opendev.org/681277
[2] https://review.opendev.org/759706
[3] https://review.opendev.org/695432
[4] https://github.com/dw/mitogen
[5] https://github.com/stackhpc/ansible-scaling
[6] https://www.haproxy.com/blog/hitless-reloads-with-haproxy-howto/
[7] https://review.opendev.org/755775
[8] https://review.opendev.org/734867
[9] https://github.com/stackhpc/kayobe-ops
Cheers,
Mark