Upgrade Caracal --> Epoxy obstacles
Hi,
I'd like to chime in as well, as we had trouble upgrading Trove with SQLAlchemy 2.0. What worked after upgrading to Epoxy:
python3 -m pip install 'sqlalchemy==1.4.54' --target /var/lib/trove/.sqlalchemy1
PYTHONPATH=/var/lib/trove/.sqlalchemy1 trove-manage --config-file /etc/trove/trove.conf db_upgrade
On Wed, 2025-09-24 at 08:47 +0000, Tobias Urdin - Binero wrote:
Hello Eugen, Thanks for keeping the community updated!
It’s unfortunate that Neutron hasn’t had a new release (26.x.y) in the Epoxy series in three months; one could have spared you the issue with the metadata agent. Once the backport [1] has merged (it’s the only one still open for 2025.1), we can propose a new 26.x.y release to make sure others don’t run into the same issue. Since Epoxy is a SLURP release, a lot of people will be jumping to this version.
/Tobias
[1] https://review.opendev.org/c/openstack/neutron/+/961483
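For readers wondering why Caracal to Epoxy is a single step: under the SLURP cadence used since 2023.1, each "YYYY.1" release is a SLURP, and upgrades may go directly from one SLURP to the next, skipping the "YYYY.2" release in between. A minimal sketch of that rule, assuming only the three release names relevant to this thread:

```python
# Assumed rule for illustration: from 2023.1 onward, YYYY.1 releases are SLURPs,
# YYYY.2 releases are not. Only the releases relevant to this thread are listed.
RELEASES = {
    "2024.1": "Caracal",
    "2024.2": "Dalmatian",
    "2025.1": "Epoxy",
}


def is_slurp(version: str) -> bool:
    """Return True for a 'YYYY.1' release from 2023.1 onward."""
    year, series = (int(part) for part in version.split("."))
    return year >= 2023 and series == 1


def skipped_releases(src: str, dst: str) -> list[str]:
    """Names of the releases strictly between src and dst (keys sort chronologically)."""
    return [name for ver, name in sorted(RELEASES.items()) if src < ver < dst]


if __name__ == "__main__":
    print(is_slurp("2024.1"), is_slurp("2025.1"))   # True True
    print(skipped_releases("2024.1", "2025.1"))     # ['Dalmatian']
```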
On 24 Sep 2025, at 10:09, Eugen Block <eblock@nde.ag> wrote:
Good morning,
I just wanted to share some of our experience from this week, when we upgraded our production cloud from Caracal to Epoxy (and also Ubuntu 22.04 --> 24.04). It wasn't as easy as previous upgrades; there were a couple of obstacles along the way.
# RabbitMQ
The main issue was RabbitMQ, I'd say. Since the RabbitMQ version was 3.9 on Jammy, we had to get to 3.12 for Noble. I tried a lot of things in my test environment; in the end I decided to reduce the number of RabbitMQ hosts to one and upgraded it with podman from 3.9 to 3.10 to 3.11, to enable the feature flags required for 3.12 (all still under Jammy). Then I could upgrade the OS to Noble, so that the package-based rabbitmq-server would start successfully.
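The approach above boils down to two checks per hop: move one minor version at a time, and make sure no feature flag is left disabled before the next hop. A minimal sketch with hypothetical helper names (this is not RabbitMQ tooling); the sample listing is made up, loosely modelled on the name/state table that `rabbitmqctl list_feature_flags` prints:

```python
# Hypothetical helpers: they only encode the approach described above
# (one minor version per hop, all feature flags enabled before the next hop).

SAMPLE_LISTING = """\
Listing feature flags ...
name\tstate
quorum_queue\tenabled
stream_queue\tdisabled
"""


def upgrade_path(current: str, target: str) -> list[str]:
    """Intermediate and final versions when moving one minor release at a time."""
    major, minor = (int(x) for x in current.split("."))
    t_major, t_minor = (int(x) for x in target.split("."))
    if major != t_major or t_minor < minor:
        raise ValueError("only forward upgrades within one major series are modelled")
    return [f"{major}.{m}" for m in range(minor + 1, t_minor + 1)]


def disabled_flags(listing: str) -> list[str]:
    """Names of feature flags whose state is not 'enabled' in a tab-separated name/state table."""
    disabled = []
    for line in listing.splitlines():
        parts = [p.strip() for p in line.split("\t")]
        if len(parts) != 2 or parts == ["name", "state"]:
            continue  # skip banner and header lines
        name, state = parts
        if state != "enabled":
            disabled.append(name)
    return disabled


if __name__ == "__main__":
    for version in upgrade_path("3.9", "3.12"):
        print("upgrade to", version, "then check that all feature flags are enabled")
    print(disabled_flags(SAMPLE_LISTING))  # ['stream_queue']
```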
# Database
The recent thread about older password hashes (https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/TNLYCDBGAZEXDKMEINHNLUJX2QNZNYOM/) was helpful: we set the same passwords again for the system users that came up in the MySQL query, and we didn't notice any issues. But there were two issues we had to deal with regarding the database:
During nova-manage db sync we got these errors:
2025-09-22 14:08:19 2 [ERROR] Incorrect definition of table mysql.column_stats: expected column 'hist_type' at position 9 to have type enum('SINGLE_PREC_HB','DOUBLE_PREC_HB','JSON_HB'), found type enum('SINGLE_PREC_HB','DOUBLE_PREC_HB').
2025-09-22 14:08:19 2 [ERROR] Incorrect definition of table mysql.column_stats: expected column 'histogram' at position 10 to have type longblob, found type varbinary(255).
This had not popped up during my tests: our production database is around 10 years old (it dates back to when we started our OpenStack journey), while my test environment is quite new. What fixed it was:
mysql_upgrade check-if-upgrade-is-needed
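Both errors say the same thing: the system table mysql.column_stats still had the older column definitions, which is the kind of system-table mismatch mysql_upgrade exists to fix. When combing through a longer log, a small parser like this sketch (plain `re`, nothing MariaDB-specific; the message format is taken from the two log lines above, and the helper names are hypothetical) lists each mismatched column with its found and expected type:

```python
import re

LOG = """\
2025-09-22 14:08:19 2 [ERROR] Incorrect definition of table mysql.column_stats: expected column 'hist_type' at position 9 to have type enum('SINGLE_PREC_HB','DOUBLE_PREC_HB','JSON_HB'), found type enum('SINGLE_PREC_HB','DOUBLE_PREC_HB').
2025-09-22 14:08:19 2 [ERROR] Incorrect definition of table mysql.column_stats: expected column 'histogram' at position 10 to have type longblob, found type varbinary(255).
"""

# Pattern derived from the two messages above; other server messages are ignored.
PATTERN = re.compile(
    r"Incorrect definition of table (?P<table>[\w.]+): "
    r"expected column '(?P<column>[^']+)' at position (?P<pos>\d+) "
    r"to have type (?P<expected>.+), found type (?P<found>.+)\.$"
)


def schema_mismatches(log: str) -> list[dict]:
    """One dict per 'Incorrect definition of table' line in the log."""
    rows = []
    for line in log.splitlines():
        match = PATTERN.search(line)
        if match:
            rows.append(match.groupdict())
    return rows


if __name__ == "__main__":
    for row in schema_mismatches(LOG):
        print(f"{row['table']}.{row['column']} (position {row['pos']}): "
              f"found {row['found']}, expected {row['expected']}")
```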
Then there's another thing I have yet to fully grasp: we now have a ".local" directory in /var/lib/mysql:
find /var/lib/mysql/.local/share/containers/storage/
/var/lib/mysql/.local/share/containers/storage/
/var/lib/mysql/.local/share/containers/storage/libpod
/var/lib/mysql/.local/share/containers/storage/db.sql
mysql_upgrade didn't like that directory. I assume that podman created it, but I'm not sure why (I only created a rabbitmq pod). Anyway, I removed the directory so I could proceed with the db sync commands and complete the upgrade process. In my test cloud I removed podman and restarted galera, and the directory was gone, so that's what I'm going to do in production as well.
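One plausible but unconfirmed explanation: rootless podman keeps its storage under `$XDG_DATA_HOME/containers/storage`, falling back to `$HOME/.local/share/containers/storage`, so a podman command run with `HOME` pointing at /var/lib/mysql would produce exactly the paths listed above. The sketch below (hypothetical helper names; it only reads the filesystem) computes that location for a given environment and lists hidden top-level directories in a datadir, as a pre-flight check before running mysql_upgrade:

```python
import os
from pathlib import Path


def rootless_storage_dir(env: dict[str, str]) -> Path:
    """Default rootless podman storage location for the given environment (assumed rule)."""
    data_home = env.get("XDG_DATA_HOME") or os.path.join(env["HOME"], ".local", "share")
    return Path(data_home) / "containers" / "storage"


def hidden_dirs(datadir: str) -> list[str]:
    """Top-level directories in datadir whose names start with a dot."""
    return sorted(
        entry.name
        for entry in Path(datadir).iterdir()
        if entry.is_dir() and entry.name.startswith(".")
    )


if __name__ == "__main__":
    print(rootless_storage_dir({"HOME": "/var/lib/mysql"}))
    # /var/lib/mysql/.local/share/containers/storage
    datadir = "/var/lib/mysql"
    if os.access(datadir, os.R_OK | os.X_OK):  # only scan if we may read it
        print(hidden_dirs(datadir))
```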
# openstack-client
Then there are these messages during every openstack command:
Could not load 'message_list': module 'zaqarclient.queues.v2.cli' has no attribute 'OldListMessages'
Could not load 'message_post': module 'zaqarclient.queues.v2.cli' has no attribute 'OldPostMessages'
I'll need to check whether they impact any of our scripts. We don't use Zaqar, but removing that package would also remove Heat, so that isn't an option.
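The text after each colon is Python's standard `AttributeError` message for a missing module attribute: the client tried to load a plugin reference of the form module:attribute that no longer resolves, presumably because the installed python-zaqarclient and the commands registered for it are out of step (an assumption, not verified). A minimal sketch of that kind of check using only `importlib`; the helper names are hypothetical and this is not the OpenStack client's own loader:

```python
import importlib
from typing import Optional


def resolve(reference: str):
    """Import 'module:attribute' and return the attribute (raises ImportError/AttributeError)."""
    module_name, _, attr = reference.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


def check(reference: str) -> Optional[str]:
    """None if the reference resolves, otherwise a message similar in spirit to the client's."""
    try:
        resolve(reference)
    except (ImportError, AttributeError) as exc:
        return f"Could not load {reference!r}: {exc}"
    return None


if __name__ == "__main__":
    print(check("json:dumps"))
    print(check("json:OldListMessages"))
```

Run against the real module (for example `check("zaqarclient.queues.v2.cli:OldListMessages")`), this would show whether the attribute is missing from the installed package rather than from an unrelated import path.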
# nova-compute
I had to wait for version 19.2.1 of librbd1 to be released; otherwise installing nova-compute would fail:
qemu-block-extra : Depends: librbd1 (>= 19.2.1-0ubuntu0.24.04.1) but 19.2.0-0ubuntu0.24.04.2 is to be installed
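The failure is a plain version constraint: 19.2.0-0ubuntu0.24.04.2 sorts below 19.2.1-0ubuntu0.24.04.1 because the upstream part (19.2.0 vs 19.2.1) is compared before the Debian revision. The sketch below is a Python port of the comparison rules from Debian Policy's description of the Version field (epoch, then upstream version, then revision; `~` sorts before everything, even the end of the string); the function names are hypothetical:

```python
import string


def _is_digit(c: str) -> bool:
    return c in string.digits


def _order(c: str) -> int:
    """Sort weight of one character in the non-digit parts of a version."""
    if _is_digit(c):
        return 0
    if c in string.ascii_letters:
        return ord(c)
    if c == "~":
        return -1
    return ord(c) + 256


def _compare_part(a: str, b: str) -> int:
    """Compare upstream-version or revision strings; result <0, 0 or >0."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Non-digit run, character by character (end of string weighs 0).
        while (i < len(a) and not _is_digit(a[i])) or (j < len(b) and not _is_digit(b[j])):
            ac = _order(a[i]) if i < len(a) else 0
            bc = _order(b[j]) if j < len(b) else 0
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Digit run, compared numerically (an empty run counts as 0).
        start_i, start_j = i, j
        while i < len(a) and _is_digit(a[i]):
            i += 1
        while j < len(b) and _is_digit(b[j]):
            j += 1
        diff = int(a[start_i:i] or "0") - int(b[start_j:j] or "0")
        if diff:
            return diff
    return 0


def _split(version: str) -> tuple[int, str, str]:
    """Split [epoch:]upstream[-revision]; a missing revision counts as '0'."""
    epoch, _, rest = version.partition(":") if ":" in version else ("0", "", version)
    upstream, _, revision = rest.rpartition("-") if "-" in rest else (rest, "", "0")
    return int(epoch), upstream, revision


def compare_versions(v1: str, v2: str) -> int:
    """Negative if v1 < v2, zero if equal, positive if v1 > v2."""
    e1, u1, r1 = _split(v1)
    e2, u2, r2 = _split(v2)
    if e1 != e2:
        return e1 - e2
    return _compare_part(u1, u2) or _compare_part(r1, r2)


if __name__ == "__main__":
    installed, required = "19.2.0-0ubuntu0.24.04.2", "19.2.1-0ubuntu0.24.04.1"
    print(compare_versions(installed, required) >= 0)  # False: dependency not satisfied
```

On a real host, `dpkg --compare-versions 19.2.0-0ubuntu0.24.04.2 ge 19.2.1-0ubuntu0.24.04.1` is the authoritative check; it exits non-zero here.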
# neutron-metadata-agent
There's a bug where functioning agents are shown as dead; applying the patches mentioned in the bug report fixes it:
https://bugs.launchpad.net/neutron/+bug/2112492
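For context on what "dead" means here: Neutron agents periodically report their state (the `[agent] report_interval` option, 30 seconds by default), and the server treats an agent as down once its last heartbeat is older than `agent_down_time` (75 seconds by default). The following is a simplified model of that rule only; it is not Neutron's code and does not reproduce the bug, whose details are in the Launchpad report:

```python
from datetime import datetime, timedelta, timezone

# Assumed to mirror neutron's default agent_down_time of 75 seconds.
AGENT_DOWN_TIME = timedelta(seconds=75)


def is_alive(last_heartbeat: datetime, now: datetime,
             down_time: timedelta = AGENT_DOWN_TIME) -> bool:
    """Simplified liveness check: alive while the last heartbeat is recent enough."""
    return now - last_heartbeat <= down_time


if __name__ == "__main__":
    now = datetime(2025, 9, 24, 10, 0, tzinfo=timezone.utc)
    print(is_alive(now - timedelta(seconds=30), now))   # True: reported one interval ago
    print(is_alive(now - timedelta(seconds=120), now))  # False: would be listed as dead
```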
# Pacemaker
We use Pacemaker for high availability, and during the dist-upgrade I had to run 'apt --fix-broken install', then remove resource-agents and resource-agents-base to get through the dist-upgrade, and afterwards install pacemaker and crmsh again.
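The same recovery sequence written down as a dry-run plan. This is a sketch under stated assumptions: apt is the front end, and the interrupted dist-upgrade is resumed with `apt dist-upgrade` (the report doesn't say exactly how it was restarted). The reinstall at the end suggests that removing resource-agents also took pacemaker and crmsh with it; that is an inference, not something stated above:

```python
import shlex
import subprocess

# Hypothetical encoding of the sequence described above.
RECOVERY_STEPS = [
    ["apt", "--fix-broken", "install"],
    ["apt", "remove", "resource-agents", "resource-agents-base"],
    ["apt", "dist-upgrade"],
    ["apt", "install", "pacemaker", "crmsh"],
]


def run_steps(steps, dry_run=True):
    """Print each command; execute it (stopping at the first failure) only when dry_run is False."""
    for argv in steps:
        print("+", shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_steps(RECOVERY_STEPS)  # dry run: prints the commands, executes nothing
```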
Not sure if anyone will read this far, but I wanted to get that off my chest. ;-) So obviously, most of the issues weren't really OpenStack issues, but I wanted to share this experience anyway.
If you read this far, thanks! ;-)
Regards, Eugen
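Konstantin's Trove workaround in his reply works because directories listed in PYTHONPATH are placed on sys.path ahead of the interpreter's site-packages (or Debian's dist-packages), so the SQLAlchemy 1.4.54 copy installed with `pip --target` shadows the system-wide SQLAlchemy 2.0; and because the variable is set inline, it applies to that one trove-manage run only. The sketch below checks that ordering in a fresh interpreter, with a temporary directory standing in for /var/lib/trove/.sqlalchemy1 (helper names are hypothetical):

```python
import json
import os
import subprocess
import sys
import tempfile


def child_sys_path(extra_path: str) -> list[str]:
    """sys.path of a fresh interpreter started with extra_path prepended to PYTHONPATH."""
    env = dict(os.environ)
    env["PYTHONPATH"] = os.pathsep.join(p for p in (extra_path, env.get("PYTHONPATH")) if p)
    out = subprocess.run(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        env=env, capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


def shadows_site_packages(extra_path: str) -> bool:
    """True if extra_path precedes every site-/dist-packages entry on the child's sys.path."""
    paths = child_sys_path(extra_path)
    target = paths.index(os.path.abspath(extra_path))
    site = [i for i, p in enumerate(paths)
            if p.rstrip(os.sep).endswith(("site-packages", "dist-packages"))]
    return all(target < i for i in site)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as target:
        print(shadows_site_packages(target))
```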
participants (3)
- Eugen Block
- Konstantin Larin
- Tobias Urdin - Binero