[kolla-ansible][tooz][devstack] Upgrade path to etcd v3.4
Hi folks,

etcd maintains two stable branches [1]; at the moment these appear to be the v3.4 and v3.5 series.

etcd does not support skip-version updates without data loss [2] out of the box; operators are encouraged to do online updates.

I took a quick sample of some etcd versions:
* devstack currently uses etcd v3.3.12 (at time of writing) [3]
* kolla uses etcd v3.3.27 [4]
* tripleo uses rdo etcd: it appears they're already on 3.4.14 [5]
* upstream etcd is at 3.4.27 and 3.5.9 [6]

Now, the bad news:

The client-side endpoints of etcd are changing from `v3alpha` to `v3`, with a `v3beta` step added to ensure maximum confusion [7].

tooz, for example, currently defaults to `v3alpha` for the recommended etcd3gw backend [8], but it's configurable on the client side by passing an extra option to the backend URL.

So just updating etcd from 3.3 to 3.4 usually breaks things until a couple of orchestration updates are made.

It seems providing a smooth upgrade path would require coordination between orchestration, services that depend on etcd as a backend, and a couple of middleware libraries.

The lack of skip-version upgrade support means it's probably also better to do this sooner rather than later. 3.4 is already late in its release cycle and due to go out of maintenance soon.

Options:
* upgrade to 3.4 and use `v3` endpoint everywhere - fix forward?
* try to detect endpoints in middleware (etcd3gw) somehow?
* ask ChatGPT how to farm goats

Links (not permalinks!):
[1]: https://github.com/etcd-io/etcd/blob/main/Documentation/contributor-guide/br...
[2]: https://etcd.io/docs/v3.3/upgrades/upgrade_3_4/
[3]: https://opendev.org/openstack/devstack/src/branch/master/stackrc#L723
[4]: https://opendev.org/openstack/kolla/src/branch/master/docker/etcd/Dockerfile...
[5]: https://review.rdoproject.org/r/plugins/gitiles/rdoinfo/+/master/buildsys-ta...
[6]: https://github.com/etcd-io/etcd/releases/
[7]: https://etcd.io/docs/v3.5/dev-guide/api_grpc_gateway/
[8]: https://opendev.org/openstack/tooz/src/branch/master/tooz/drivers/etcd3gw.py...
On Thu, Aug 10, 2023, at 6:38 AM, Jan Gutter wrote:
Options:
* upgrade to 3.4 and use `v3` endpoint everywhere - fix forward?
If I understand correctly you can update from 3.3 to 3.4 in a safe rolling fashion. Then you can update the use of the endpoint name/version. Then you could upgrade from 3.4 to 3.5 in a rolling fashion? Seems like this is a reasonable path forward, but will take some effort.

For Devstack you don't typically need to worry about upgrading the etcd DB. That said, I wonder if Grenade complicates things. Do we upgrade services that might only be compatible with etcd 3.4 (or 3.5) that will break if the control plane continues to run etcd 3.3?

Seems like we should also update tooz to default to modern endpoints and force overrides if talking to old systems rather than override for current etcd.
* try to detect endpoints in middleware (etcd3gw) somehow?
The only reason I would try and detect the valid endpoint is if we need to support old etcd against new cloud components (or vice versa?) in order to support upgrade paths that might upgrade openstack independently of the etcd database. Otherwise I would roll forward and try to avoid complicating tools like tooz (or the deployment orchestration that configures tooz).
* ask ChatGPT how to farm goats
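The endpoint-detection option discussed above could look roughly like the following. This is only a sketch, not how etcd3gw or tooz behave today: `detect_prefix`, `http_probe` and `CANDIDATE_PREFIXES` are hypothetical names, and the probe assumes the gateway answers a POST to `<prefix>/maintenance/status` for every prefix it serves and returns an HTTP error for the ones it does not.

```python
import json
import urllib.error
import urllib.request

# Newest prefix first, so a server that serves two prefixes yields the modern one.
CANDIDATE_PREFIXES = ("v3", "v3beta", "v3alpha")


def http_probe(base_url, prefix, timeout=2.0):
    """Return True if POST {base_url}/{prefix}/maintenance/status succeeds.

    Assumption: the gateway exposes the maintenance status call under each
    prefix it serves and rejects unknown prefixes with an HTTP error.
    """
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/{prefix}/maintenance/status",
        data=json.dumps({}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # HTTP errors, refused connections and timeouts all count as "not served".
        return False


def detect_prefix(base_url, probe=http_probe, candidates=CANDIDATE_PREFIXES):
    """Return the first prefix the probe accepts, or None if none respond."""
    for prefix in candidates:
        if probe(base_url, prefix):
            return prefix
    return None
```

The probe is injectable so the selection logic can be exercised without a running etcd. Whether the extra round trips and failure modes are worth it is exactly the trade-off raised in the reply above.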
On Thu, Aug 10, 2023 at 11:13 PM Clark Boylan <cboylan@sapwetik.org> wrote:
On Thu, Aug 10, 2023, at 6:38 AM, Jan Gutter wrote:
Options:
* upgrade to 3.4 and use `v3` endpoint everywhere - fix forward?
If I understand correctly you can update from 3.3 to 3.4 in a safe rolling fashion. Then you can update the use of the endpoint name/version. Then you could upgrade from 3.4 to 3.5 in a rolling fashion? Seems like this is a reasonable path forward, but will take some effort.
The sequence is a bit more complicated, it seems. Cinder is using etcd as a coordinator; other services may too (but they aren't covered by core devstack, it seems).

If downtime is not tolerated, then the move from 3.3 to 3.4 is:
0. current status: etcd 3.3 + coordination URL uses v3alpha
1. update coordination URL to use v3beta (3.3 does not support v3, 3.4 removes v3alpha)
2. update etcd to 3.4
3. update coordination URL to use v3 (3.5 removes v3beta)

If you update to etcd 3.4 before changing the coordination URL, the dependent services will break until the coordination URL is updated. It's rather more painful for skip-level updates, of course.

If downtime can be tolerated, then the coordination URL can jump directly from v3alpha to v3.

At the moment, only grenade fails if the coordination URL is updated to `v3`:
891353: Update etcd version to 3.4.27 | https://review.opendev.org/c/openstack/devstack/+/891353

Just on the off chance it's flaky, I sent up a recheck, but if it fails, I'll try with `v3beta`.
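The ordering constraint in the steps above can be checked mechanically. The sketch below encodes only what this message states (3.3 has no `v3`, 3.4 removes `v3alpha`, 3.5 removes `v3beta`); the table and the helper names `is_compatible` and `first_broken_step` are illustrative, not part of any existing tool.

```python
# Gateway prefixes served by each etcd minor series, as described in the thread.
SUPPORTED_PREFIXES = {
    "3.3": {"v3alpha", "v3beta"},
    "3.4": {"v3beta", "v3"},
    "3.5": {"v3"},
}


def is_compatible(etcd_series, client_prefix):
    """Return True if the etcd series serves the given gateway prefix."""
    return client_prefix in SUPPORTED_PREFIXES[etcd_series]


def first_broken_step(steps):
    """Return the index of the first (etcd_series, client_prefix) state
    that would break clients, or None if every state is compatible."""
    for index, (series, prefix) in enumerate(steps):
        if not is_compatible(series, prefix):
            return index
    return None


# The zero-downtime sequence from the numbered steps.
ROLLING_PLAN = [
    ("3.3", "v3alpha"),  # 0. current status
    ("3.3", "v3beta"),   # 1. switch the coordination URL first
    ("3.4", "v3beta"),   # 2. upgrade etcd
    ("3.4", "v3"),       # 3. switch the coordination URL again
]

# Upgrading etcd before touching the coordination URL.
NAIVE_PLAN = [
    ("3.3", "v3alpha"),
    ("3.4", "v3alpha"),  # clients break here until the URL is updated
    ("3.4", "v3"),
]

if __name__ == "__main__":
    print("rolling plan breaks at:", first_broken_step(ROLLING_PLAN))
    print("naive plan breaks at:", first_broken_step(NAIVE_PLAN))
```

Under these assumptions the rolling plan never passes through an incompatible state, while the naive plan breaks at its second state, which matches the failure described above.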
For Devstack you don't typically need to worry about upgrading the etcd DB. That said, I wonder if Grenade complicates things. Do we upgrade services that might only be compatible with etcd 3.4 (or 3.5) that will break if the control plane continues to run etcd 3.3?
Yeah, it turns out cinder is affected... I have not dug into whether etcd is or could be updated alongside cinder.
Seems like we should also update tooz to default to modern endpoints and force overrides if talking to old systems rather than override for current etcd.
Yeah, unfortunately the test harness (pifpaf) for etcd is broken for etcd 3.4 and requires a new release.
891355: Update the default etcd3gw endpoint to v3 | https://review.opendev.org/c/openstack/tooz/+/891355
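Until the default changes, deployments would need to override the endpoint through the backend URL. A minimal sketch of such an override follows; it assumes the extra option is a query parameter named `api_version` and that the etcd3gw backend uses the `etcd3+http` scheme, neither of which is spelled out in this thread, and `with_api_version` is a hypothetical helper.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_api_version(backend_url, api_version):
    """Return backend_url with its api_version query option set or replaced."""
    parts = urlsplit(backend_url)
    # Keep any options already present on the URL (timeouts, certificates, ...).
    options = dict(parse_qsl(parts.query, keep_blank_values=True))
    options["api_version"] = api_version  # assumed option name
    return urlunsplit(parts._replace(query=urlencode(options)))


if __name__ == "__main__":
    # Example address only; a real deployment would use its own etcd endpoint.
    print(with_api_version("etcd3+http://127.0.0.1:2379", "v3beta"))
```

The resulting string would go wherever the service reads its coordination backend URL, so the same helper could cover both the `v3beta` intermediate step and the final `v3` switch.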