[magnum][heat] Rolling system upgrades
feilong at catalyst.net.nz
Sat Jan 23 17:55:05 UTC 2021
Thanks for raising this topic. I'm planning to make improvements in
this area, and as the original author of this feature I would like to
help. Let me explain the current situation:
1. The first method is designed to work for both Fedora Atomic and
Fedora CoreOS. I agree, though, that after the upgrade the node image
will keep the old name and ID, which will cause trouble for auto healing
later. That's the problem I'm trying to fix, but it's not easy. As for
new nodes not being upgraded, I think it's a bug and I think I know how
to fix it. Your concern about upgrading from a very old node OS to a
much newer OS version is valid :(
2. It works under certain conditions. The node should be image based
instead of volume based because, AFAIK, Nova still doesn't support
rebuilding volume based instances. Did you try this with image based
nodes? As for the drain part, we would like to achieve a zero-downtime
upgrade (at least that's my goal for this), so each node needs to be
drained before it is upgraded. However, I didn't see a way to make the
orchestration call a k8s drain before rebuilding the node, because that
is outside Magnum's control. Heat is like a black box at this stage.
Also, even if we had a chance to call k8s drain on the node, it would
be impossible to do that if the cluster is a private
cluster. Private cluster means Magnum control plane cannot reach the k8s
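To make the ordering I have in mind concrete, here is a minimal Python
sketch of a drain-then-rebuild loop. It is only an illustration, not
Magnum or Heat code: rolling_rebuild and its three hooks (drain,
rebuild, uncordon) are hypothetical names, and the hooks are plain
callables that the caller would have to back with real k8s and Nova
calls. It also assumes whatever runs the loop can reach the k8s API,
which is exactly what is missing for a private cluster.

```python
from typing import Callable, Iterable, List


def rolling_rebuild(
    nodes: Iterable[str],
    drain: Callable[[str], None],
    rebuild: Callable[[str], None],
    uncordon: Callable[[str], None],
) -> List[str]:
    """Upgrade nodes one at a time and return the names that finished.

    All three hooks are hypothetical placeholders supplied by the caller.
    An exception from any hook aborts the loop, so at most one node is
    out of service when something goes wrong.
    """
    upgraded: List[str] = []
    for node in nodes:
        drain(node)     # move workloads off the node first
        rebuild(node)   # then rebuild the server from the new image
        uncordon(node)  # finally make the node schedulable again
        upgraded.append(node)
    return upgraded


if __name__ == "__main__":
    # Dry run with hooks that only record what would happen.
    log = []
    done = rolling_rebuild(
        ["node-0", "node-1"],
        drain=lambda n: log.append(("drain", n)),
        rebuild=lambda n: log.append(("rebuild", n)),
        uncordon=lambda n: log.append(("uncordon", n)),
    )
    print(done)
    print(log)
```

The point of the sketch is only the per-node ordering: the rebuild of a
node never starts before its drain has returned, and the next node is
not touched until the previous one is back in service.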
Again, thank you for raising this, and I'm happy to help address it.
On 22/01/21 10:03 pm, Krzysztof Klimonda wrote:
> While testing magnum, a problem of upgrades came up - while work has been done to make kubernetes upgrades without interruption, operating system upgrades seem to be handled only partially.
> According to the documentation, two ways of upgrading system are available:
> - via specifying ostree_commit or ostree_remote labels in the cluster template used for upgrade
> - via specifying a new image in the cluster template used for upgrade
> The first one is specific to Fedora Atomic (and, while probably untested, seems to be mostly working with Fedora CoreOS) but it has some drawbacks. Firstly, due to base image staying the same we require this image for the life of the cluster, even if OS has already been upgraded. Secondly, using this method only upgrades existing instances and new instances (spawned via scaling cluster up) will not be upgraded. Thirdly, even if that is fixed I'm worried that at some point upgrading from old base image to some future ostree snapshot will fail (there is also cost associated with diff growing with each release).
> The second method, of specifying a new image in the cluster template used for upgrade, comes with an ugly warning about nodes not being drained properly before server rebuild (and it actually doesn't seem to be working anyway as the new image parameter is not being passed to the heat template on upgrade). This does however seem like a more valid approach in general.
> I'm not that familiar with Heat, and the documentation of various OS::Heat::Software* resources seems inconclusive, but is there no way of executing some code before instance is rebuilt? If not, how are other projects and users handling this in general?
Cheers & Best regards,
Feilong Wang (王飞龙)
Senior Cloud Software Engineer
Email: flwang at catalyst.net.nz
Catalyst IT Limited
Level 6, Catalyst House, 150 Willis Street, Wellington