Good morning,
I just wanted to share some of our experience from this week when we upgraded our production cloud from Caracal to Epoxy (and also Ubuntu 22.04 --> 24.04). It wasn't as easy as previous upgrades; there were a couple of obstacles along the way.
# RabbitMQ
The main issue was rabbitmq, I'd say. Since the rabbitmq version was 3.9 on Jammy, we had to get to 3.12 for Noble. I had tried a lot of things in my test environment; in the end I decided to reduce the number of rabbitmq hosts to one and upgraded it with podman
from 3.9 to 3.10 to 3.11 to enable the feature flags required for 3.12 (all still under Jammy). Then I could upgrade the OS to Noble so the package-based rabbitmq-server would start successfully.
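For anyone facing the same jump, the idea was roughly the following; the image tag, mounts and the use of host networking here are just an illustration, not the exact commands I used:
podman run -d --name rabbitmq --network host \
  -v /var/lib/rabbitmq:/var/lib/rabbitmq \
  docker.io/library/rabbitmq:3.10
# enable the feature flags the running version supports
# (on older versions you may have to enable them one by one with enable_feature_flag <name>)
podman exec rabbitmq rabbitmqctl enable_feature_flag all
# then stop and remove the container and repeat with the 3.11 image,
# so the 3.12 package on Noble finds all required feature flags enabled
podman stop rabbitmq && podman rm rabbitmq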
# Database
The recent thread about older password hashes (https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/TNLYCDBGAZEXDKMEINHNLUJX2QNZNYOM/)
was helpful: we set the same passwords again for the system users that came up in the MySQL query, and we didn't notice any issue.
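I don't have the exact query from that thread at hand, but from memory it boiled down to finding accounts that still carry the old 16-character (pre-4.1) hashes and simply re-setting their password, which stores it in the current format; the user name and password below are only placeholders:
mysql -e "SELECT User, Host FROM mysql.user WHERE LENGTH(Password) = 16;"
mysql -e "SET PASSWORD FOR 'nova'@'%' = PASSWORD('the-same-password-as-before');"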
But there were two issues we had to deal with regarding the database. During nova-manage db sync we got these errors:
2025-09-22 14:08:19 2 [ERROR] Incorrect definition of table mysql.column_stats: expected column 'hist_type' at position 9 to have type enum('SINGLE_PREC_HB','DOUBLE_PREC_HB','JSON_HB'), found type enum('SINGLE_PREC_HB','DOUBLE_PREC_HB').
2025-09-22 14:08:19 2 [ERROR] Incorrect definition of table mysql.column_stats: expected column 'histogram' at position 10 to have type longblob, found type varbinary(255).
This had not popped up during my tests, but our production database is around 10 years old (from when we started our OpenStack journey), while my test environment is quite new. What fixed it was:
mysql_upgrade --check-if-upgrade-is-needed
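followed by a plain mysql_upgrade run, which (as far as I understand) is what actually rewrites the system tables, mysql.column_stats among them, to the current definitions:
mysql_upgrade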
Then there's another thing I haven't fully grasped yet: we now have a ".local" directory in /var/lib/mysql:
find /var/lib/mysql/.local/share/containers/storage/
/var/lib/mysql/.local/share/containers/storage/
/var/lib/mysql/.local/share/containers/storage/libpod
/var/lib/mysql/.local/share/containers/storage/db.sql
mysql_upgrade didn't like that. I assume podman created it, but I'm not sure why (I only created a rabbitmq pod). Anyway, I removed that directory so I could proceed with the db sync commands and complete the upgrade process. In my test cloud I removed
podman, then restarted galera and the directory was gone, so that's what I'm going to do in production as well.
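In command form that's roughly the plan (one node at a time, and the service name depends on how your galera cluster is set up):
apt purge podman
systemctl restart mariadb   # after this the .local directory was gone in my test cloud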
# openstack-client
Then there are these messages during every openstack command:
Could not load 'message_list': module 'zaqarclient.queues.v2.cli' has no attribute 'OldListMessages'
Could not load 'message_post': module 'zaqarclient.queues.v2.cli' has no attribute 'OldPostMessages'
I'll need to check whether they impact any of our scripts. We don't use zaqar, but removing that package would also remove heat, so that's not an option.
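The heat dependency is easy to confirm, for example with something like this (assuming the client comes from the python3-zaqarclient package):
apt-cache rdepends --installed python3-zaqarclient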
# nova-compute
I waited for version 19.2.1 of librbd1 to be released, otherwise the nova-compute upgrade would fail with a dependency error:
qemu-block-extra : Depends: librbd1 (>= 19.2.1-0ubuntu0.24.04.1) but 19.2.0-0ubuntu0.24.04.2 is to be installed
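A quick way to check whether the fixed librbd1 has landed before attempting the compute node upgrade:
apt-cache policy librbd1 qemu-block-extra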
# neutron-metadata-agent
There's a bug where functioning agents show up as dead; applying the patches mentioned there fixes it:
https://bugs.launchpad.net/neutron/+bug/2112492
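The symptom is easy to spot: the agents keep serving metadata but are listed as not alive, e.g. with:
openstack network agent list --agent-type metadata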
# Pacemaker
We use pacemaker for high availability, and during the dist-upgrade I had to run 'apt --fix-broken install', then remove resource-agents and resource-agents-base to get through the dist-upgrade, and install pacemaker and crmsh again afterwards.
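In command form the workaround was roughly:
apt --fix-broken install
apt remove resource-agents resource-agents-base
apt dist-upgrade
apt install pacemaker crmsh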
Obviously, most of these weren't really OpenStack issues, but I wanted to get that off my chest and share the experience anyway. ;-)
If you read this far, thanks! ;-)
Regards,
Eugen