Hello,
In more detail, this is the procedure we're using. We recently upgraded twice, first from Zed to Antelope, then from Antelope to Caracal.
- Install the new version of Neutron and run the database expand
- Upgrade neutron-server on all “controller” nodes
- Run the database contract
- Upgrade the OVS, L3, Metadata and DHCP agents on network nodes (on controller nodes in some people's setups)
  - First OVS, then wait for it to start correctly
  - Stop DHCP, L3, Metadata (in that order)
  - Upgrade the agents and start them in the same order as above
- Upgrade the OVS agent on compute nodes
Happy to take feedback if the above can be improved.
From what I remember, in all these years we've only had issues with upgrades twice. Once was a keepalived bug. The other was when Neutron switched to primary/backup wording for L3 HA, though I think that may also have been because we did a double-jump upgrade and missed some translation patch somewhere, or similar.
/Tobias
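The procedure above can be sketched as an ordered plan. This is a minimal, hypothetical Python sketch, not a tested tool. The host names, the "<...>" placeholders for package installation and health checks, and the Ubuntu-style systemd unit names (neutron-server, neutron-openvswitch-agent, neutron-dhcp-agent, neutron-l3-agent, neutron-metadata-agent) are assumptions. The neutron-db-manage expand/contract commands are shown without their --config-file options. It only builds and prints the steps; nothing is executed.

```python
from typing import List

# Hypothetical inventory; replace with your own hosts.
CONTROLLERS = ["ctl1", "ctl2", "ctl3"]
NETWORK_NODES = ["net1", "net2"]
COMPUTES = ["cmp1", "cmp2"]

# Stop/upgrade/start order for the non-OVS agents on network nodes.
AGENT_ORDER = ["neutron-dhcp-agent", "neutron-l3-agent", "neutron-metadata-agent"]
OVS_AGENT = "neutron-openvswitch-agent"


def build_plan(controllers: List[str], network_nodes: List[str], computes: List[str]) -> List[str]:
    """Return "host: command" strings in upgrade order; nothing is executed."""
    plan: List[str] = []
    # 1. Install the new Neutron code (placeholder) and run the expand migrations once.
    plan.append(f"{controllers[0]}: <install new neutron packages>")
    plan.append(f"{controllers[0]}: neutron-db-manage upgrade --expand")
    # 2. Move neutron-server to the new release on every controller.
    for host in controllers:
        plan.append(f"{host}: <upgrade neutron-server> && systemctl restart neutron-server")
    # 3. Contract only after all servers run the new code.
    plan.append(f"{controllers[0]}: neutron-db-manage upgrade --contract")
    # 4. Network nodes: OVS agent first and wait, then DHCP, L3, metadata.
    for host in network_nodes:
        plan.append(f"{host}: <upgrade {OVS_AGENT}> && systemctl restart {OVS_AGENT}")
        plan.append(f"{host}: <wait until the OVS agent is up>")
        for svc in AGENT_ORDER:
            plan.append(f"{host}: systemctl stop {svc}")
        for svc in AGENT_ORDER:
            plan.append(f"{host}: <upgrade {svc}> && systemctl start {svc}")
    # 5. Compute nodes: only the OVS agent.
    for host in computes:
        plan.append(f"{host}: <upgrade {OVS_AGENT}> && systemctl restart {OVS_AGENT}")
    return plan


if __name__ == "__main__":
    for line in build_plan(CONTROLLERS, NETWORK_NODES, COMPUTES):
        print(line)
```

Keeping the plan as plain data makes it easy to review the order before touching any node.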
On 21 Mar 2025, at 15:44, Eugen Block <eblock@nde.ag> wrote:
Thanks for your quick response, appreciate it! I've read that page as well, but that was a while ago. I guess I didn't pay too much attention since the recent upgrades all went well. Until now, I just ran 'apt upgrade' on the first node, which of course upgrades all packages. Then I ran the expand, and I issued the contract command on the last control node.
So what would be the ideal way? First upgrade only neutron-server and the l2 agents on all control nodes ('apt upgrade --only-upgrade <neutron-server|openvswitch-agent>'), then expand and contract, and then upgrade the rest of the packages?
Quoting Tobias Urdin - Binero IT <tobias.urdin@binero.com>:
Hello,
We upgrade in a very specific order, as described in [1]. First the database expand, then all neutron-server instances are upgraded, then the contract, and only after that any agents.
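The order described here can be checked mechanically: expand, then every neutron-server, then contract, then agents. Below is a minimal Python sketch, assuming a hypothetical event format of (action, host). In that format, "server" means neutron-server on that host now runs the new release, and "agent" means any Neutron agent on that host was upgraded. It encodes only these three constraints and knows nothing about a real deployment.

```python
from typing import Iterable, List, Set, Tuple

# (action, host); the action labels are invented for this sketch:
#   "expand"   - expand migrations applied
#   "server"   - neutron-server on host now runs the new release
#   "contract" - contract migrations applied
#   "agent"    - any Neutron agent on host upgraded
Event = Tuple[str, str]


def check_order(events: Iterable[Event], servers: Set[str]) -> List[str]:
    """Return ordering violations; an empty list means the sequence follows the rule."""
    problems: List[str] = []
    expanded = False
    contracted = False
    upgraded_servers: Set[str] = set()
    for step, (action, host) in enumerate(events):
        if action == "expand":
            expanded = True
        elif action == "server":
            # Rule 1: new neutron-server code only after the expand migrations.
            if not expanded:
                problems.append(f"step {step}: neutron-server on {host} upgraded before expand")
            upgraded_servers.add(host)
        elif action == "contract":
            # Rule 2: contract only once every neutron-server runs the new release.
            missing = servers - upgraded_servers
            if missing:
                problems.append(f"step {step}: contract while {sorted(missing)} still run the old neutron-server")
            contracted = True
        elif action == "agent":
            # Rule 3: agents are upgraded only after the contract.
            if not contracted:
                problems.append(f"step {step}: agent on {host} upgraded before contract")
        else:
            raise ValueError(f"unknown action: {action!r}")
    return problems
```

Consider a sequence resembling a full package upgrade on the first controller before the expand: [("server", "ctl1"), ("agent", "ctl1"), ("expand", "ctl1"), ("server", "ctl2"), ("contract", "ctl2")] with servers {"ctl1", "ctl2"}. It yields two violations. The server upgrade at step 0 precedes the expand, and the agent upgrade at step 1 precedes the contract.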
[1] https://docs.openstack.org/neutron/latest/contributor/internals/upgrade.html
/Tobias
On 21 Mar 2025, at 15:12, Eugen Block <eblock@nde.ag> wrote:
Hi *,
Maybe I missed an announcement, but I usually read the release notes [0] before upgrading our OpenStack cloud, and I didn't notice anything regarding DB schema upgrades. After the upgrade from Yoga to Zed went well in a test environment, I tried the same in production today. Note that I didn't have a router in my test cloud, which is probably why I didn't notice anything.
Unfortunately, there was a schema change, which is why the l3-agent failed to start properly with this error:
2025-03-21 12:29:14.527 846393 CRITICAL neutron [None req-e225ff0a-82e1-473b-9eba-9a11caa7ace7 - - - - - -] Unhandled error: oslo_messaging.rpc.client.RemoteError: Remote error: OperationalError (pymysql.err.OperationalError) (1054, "Unknown column 'portforwardings.external_port' in 'SELECT'")
Indeed, on the upgraded control node, /usr/lib/python3/dist-packages/neutron/db/models/port_forwarding.py no longer contained "external_port", while on the not-yet-upgraded control node it still did. So the only way to resolve the situation was to proceed with the upgrade. That meant an interruption for our virtual routers, leaving floating IPs unreachable for a couple of minutes.
Note that we're using highly available routers. I thought about setting "no-ha" for each router, but that can only be done for disabled routers, which is not an option, of course. It also doesn't really fit the "rolling upgrade" concept, which has worked great so far. Since we moved to Ubuntu last September (while still on Victoria), we've been able to upgrade to Yoga without any issues.
While today's interruption was not too critical, I was still surprised that such an important change didn't make it into the Zed release notes. Was that an oversight, or did I miss something? Are there other places I need to check before attempting an upgrade?
Thanks, Eugen
[0] https://docs.openstack.org/releasenotes/neutron/zed.html