The Sunbeam project participated in the vPTG last week; here is a short summary of the topics we discussed and the plans we made for the Caracal cycle.
# Full Cycle Retro
## Celebrations

- Significant reduction in the time taken both to bootstrap and to resize a deployed cluster.
- The switch from Antelope to Bobcat in the charms was smooth and relatively quick.
- Many new plugins were delivered during the cycle, enabling much more of the OpenStack ecosystem: telemetry, magnum, heat, barbican and designate.
- Refactoring ops-sunbeam to support both k8s and machine charms was a success.
- The documentation is starting to look great and is developing lots of depth.
- Lots of contributions to the open source tools we depend on, including snapd, Juju (including the Terraform provider), microk8s, and the traefik and prometheus charms.
## Learnings

- Switching to Bobcat for the snaps and Terraform plans took longer than it should have, most likely because this was a largely new process for the team.
- Bugs in dependencies outside of OpenStack had some impact on velocity.
- The release process needs to be better documented so that more of the core team can execute it.
- Full functional test coverage for plugins still needs some work.
## Actions

- Implement automated functional testing for plugins.
- Fully document the release process and improve its automation.
# Debugging Problems
We had quite an extended conversation about how we help users debug both bugs and deployment-specific issues in Sunbeam. The general observability feature we're working on will help in this area once a cloud is deployed, but we need some extra features to help users before the deployment reaches that stage.
`sunbeam inspect` is still a bit lightweight; users can supplement the information it gathers with more extensive information collected from microk8s, so we need to document this in the bug reporting guidance.

- Improve the docs topic 'Inspecting the cluster' to include more detail on collecting debug information for bug reports.
- Update the guidance for reporting bugs in Launchpad projects to help users file better qualified bugs.
# Monorepo for Sunbeam Charms
With our current approach of a 1:1 mapping between charms and repositories, we're going to face a challenge similar to the OpenStack Charms project whenever we need to effect a change across the entire charmset (bumping to a new release, updating a dependency or similar). This generates a large number of reviews, with associated testing costs, each of which then needs to be managed through to landing by a member of the core team.
Liam proposed that we look at moving all of the charms in the project into a single mono-repo, reducing the overhead of 'batch' type operations and generally making this part of Sunbeam easier to manage. It would also let us run integrated testing of a change that impacts multiple charms without having to model cross-repo dependencies. The resources required for each functional test run would increase in size, but the number of runs would fall hugely compared with our current approach.
We'll need to figure out tooling to determine which charms are impacted by a specific change so that we only build, test and publish charms that need to be changed, rather than the entire charmset.
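As a rough illustration of the kind of tooling needed, the sketch below maps the files touched by a change to the charms that would need to be built, tested and published. It assumes a hypothetical mono-repo layout in which each charm lives under `charms/<charm-name>/` and shared code lives under an `ops-sunbeam/` directory that affects every charm; the layout and the `impacted_charms` name are illustrative assumptions, not an agreed design.

```python
from pathlib import PurePosixPath

# Hypothetical mono-repo layout (an assumption, not an agreed design):
#   charms/<charm-name>/...  - one directory per charm
#   ops-sunbeam/...          - shared library used by every charm
CHARMS_DIR = "charms"
SHARED_PREFIXES = ("ops-sunbeam/",)


def impacted_charms(changed_files, all_charms):
    """Return a sorted list of the charms affected by a set of changed files."""
    impacted = set()
    for path in changed_files:
        # A change to shared code may affect every charm, so rebuild them all.
        if path.startswith(SHARED_PREFIXES):
            return sorted(all_charms)
        parts = PurePosixPath(path).parts
        # Files under charms/<name>/ only affect that charm.
        if len(parts) >= 2 and parts[0] == CHARMS_DIR and parts[1] in all_charms:
            impacted.add(parts[1])
    # Anything else (docs, a top-level README, etc.) triggers no charm builds.
    return sorted(impacted)


if __name__ == "__main__":
    charms = {"keystone-k8s", "nova-k8s", "glance-k8s"}
    change = ["charms/keystone-k8s/src/charm.py", "charms/nova-k8s/config.yaml", "README.md"]
    print(impacted_charms(change, charms))  # ['keystone-k8s', 'nova-k8s']
```

A simple path-prefix check like this cannot see charms that consume each other's libraries, so real tooling would probably also need an explicit dependency map between charms.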
Some projects in the wider charm ecosystem are using this approach to good effect.
- Liam to take this idea forward early in the Caracal cycle; this will need some coordination with, and advice from, the infra team (see next topic).
# Integration testing
Although the Sunbeam charms do include functional testing, it's not a complete integration test of a full Sunbeam deployment. We'd like to achieve this goal but this will require test instances with more memory than we typically get right now.
We felt this was linked to the mono-repo idea, as it seems unreasonable to ask for larger instance sizes for an ever-increasing set of repositories when a mono-repo would let us use resources more efficiently.
We agreed to fold this into the mono-repo work early this cycle.
# Multi-node testing
None of our multi-node testing is currently executed as part of the project CI workflows; we really need to test this part of Sunbeam as part of gating changes.
We discussed a few ideas in this area including use of LXD containers on a big instance type or use of the 3rd party CI provided by Canonical that currently supports the OpenStack Charms project.
Both might present viable routes forward and will be considered during the coming cycle.
# Managing OCI artefact references in charms
The OCI resources used by the Sunbeam k8s charms are managed within a different project; we need to know when new OCI images are published so we can feed these updates into the charms and publish new charm revisions and associated OCI resources.
There is some prior art on GitHub, implemented as GitHub Actions, to support this type of workflow, but we need something that works across that boundary into the OpenDev Gerrit infrastructure.
- Ask the OpenDev infrastructure team whether there is any prior work on this type of function.
# Single Container Multiple Services
We've spent quite a bit of time this cycle reducing the physical footprint of the OCI resources we use for Sunbeam. We discussed whether this might be taken further by reducing the number of containers we run in each charm - this is made possible by the fact that the OCI entry point is managed using Pebble, which supports multiple service definitions so we can run multiple daemons within the same container.
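To make the Pebble mechanism concrete, here is a minimal Python sketch that builds a single Pebble layer defining two services, i.e. two daemons sharing one container. The `build_layer` helper and the choice of `nova-scheduler` and `nova-conductor` as the co-located daemons are hypothetical, picked only for illustration; they are not the nova-k8s charm's actual configuration.

```python
# Sketch of a single Pebble layer defining several services, so that more
# than one daemon can run inside one container. Service names and commands
# are illustrative assumptions, not the nova-k8s charm's actual config.


def build_layer(summary, commands):
    """Build a Pebble layer dict with one service per name -> command entry."""
    services = {
        name: {
            "override": "replace",  # replace any earlier definition of this service
            "summary": name,
            "command": command,
            "startup": "enabled",  # started automatically by Pebble
        }
        for name, command in commands.items()
    }
    return {"summary": summary, "description": summary, "services": services}


layer = build_layer(
    "nova control plane",
    {"nova-scheduler": "nova-scheduler", "nova-conductor": "nova-conductor"},
)
# In a charm, this dict could be applied to a container with
# container.add_layer("nova", layer, combine=True) followed by container.replan().
```

Each daemon keeps its own service entry, so Pebble can still start, stop and restart them independently even though they share a container.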
We agreed to prototype this in one of the more complicated charms to see how much benefit might be achieved by taking this approach.
- POC probably using the nova-k8s charm to assess benefits.
We had around 6 active participants during our session, with a few others dropping in to listen to the conversation from time to time. Thank you to all of those who contributed to our discussions; I look forward to the next (v)PTG.

James, Sunbeam PTL