Some info about the environment:
- 2 control nodes, most services managed by Pacemaker
- recently upgraded to Ubuntu 22.04
I started the upgrade procedure yesterday (2025-03-24 12:55:55) by
stopping all OpenStack services on the first node (controller02),
expanding the keystone and neutron DBs and running db sync for all
other services. The neutron expand ran at exactly:
2025-03-24 13:07:31 neutron-db-manage upgrade --expand
At 13:10:39 the services were started again.
The other control node (controller01) logged the first error in
neutron-server.log at:
2025-03-24 13:08:10.356 5804 ERROR neutron.db.agentschedulers_db
[req-b5a1c4f0-a28a-4cea-96f7-e27df915bd4c - - - - -] Unexpected
exception occurred while removing network
0682ba75-b750-4318-a75e-c92c347c923b from agent
20709aa0-2c55-4a18-8f7f-2c65c1bc1297: sqlalchemy.exc.OperationalError:
(pymysql.err.OperationalError) (1054, "Unknown column
'portforwardings.external_port' in 'SELECT'")
The entire stack trace is a bit lengthy, so I've pasted it here:
https://paste.openstack.org/show/bJAP7rgEXoFH76wC6zSj/
The agents on controller02 were started and began failing, e.g. the DHCP agent at:
2025-03-24 13:11:28.942 1238229 INFO neutron.agent.dhcp.agent [None
req-21447e37-7f29-455e-a477-7cc85fb570a5 - - - - - -] Agent has just
been revived. Scheduling full sync
...
2025-03-24 13:11:31.367 1238229 ERROR neutron.agent.dhcp.agent [None
req-b438d86f-a928-43f8-a8db-d256dcb8179b - - - - - -] Unable to
disable dhcp for 0682ba75-b750-4318-a75e-c92c347c923b.:
oslo_messaging.rpc.client.RemoteError: Remote error: OperationalError
(pymysql.err.OperationalError) (1054, "Unknown column
'portforwardings.external_port' in 'SELECT'")
stack trace at: https://paste.openstack.org/show/baInRC9qfafXEpaTqBCW/
Then, a few minutes later, I upgraded the neutron packages on the second
node and contracted the DB at:
2025-03-24 13:19:55 neutron-db-manage upgrade --contract
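To put the timestamps quoted above on a single timeline, here is a small Python sketch that computes each event's offset from the expand step. It assumes all nodes log in the same timezone with synchronized clocks; the event labels are descriptive only and are not taken from any tool.

```python
from datetime import datetime

# Timestamps copied from the log excerpts above. Assumption: all nodes log in
# the same timezone and their clocks are in sync.
EXPAND = "neutron-db-manage upgrade --expand (controller02)"
EVENTS = {
    EXPAND: "2025-03-24 13:07:31",
    "first ERROR in neutron-server.log (controller01)": "2025-03-24 13:08:10.356",
    "services started again (controller02)": "2025-03-24 13:10:39",
    "ERROR from neutron.agent.dhcp.agent (controller02)": "2025-03-24 13:11:31.367",
    "neutron-db-manage upgrade --contract": "2025-03-24 13:19:55",
}


def parse(ts: str) -> datetime:
    """Parse a log timestamp with or without fractional seconds."""
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in ts else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(ts, fmt)


def timeline(events: dict[str, str]) -> list[tuple[float, str]]:
    """Return (seconds since the expand step, event) pairs, oldest first."""
    start = parse(events[EXPAND])
    rows = [((parse(ts) - start).total_seconds(), name) for name, ts in events.items()]
    return sorted(rows)


if __name__ == "__main__":
    for offset, name in timeline(EVENTS):
        print(f"{offset:+9.3f} s  {name}")
```

With these values, the first neutron-server error on controller01 comes about 39 s after the expand, while the contract follows the expand only after 744 s, so the error precedes the contract by more than 11 minutes.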
I can reproduce this easily; I just need to roll back my VMs to a
previous snapshot and run the upgrade procedure again. If you need more
information, please let me know.
Thanks!
Eugen
Quoting Rodolfo Alonso Hernandez <ralonsoh@redhat.com>:
> Hello:
>
> The DB schema change is handled in the Neutron DB object [1]. The
> agents, via RPC, do not receive the raw DB object but a JSON blob derived
> from the Neutron DB object. If the target (the agent) expects a lower
> version, the JSON blob is downgraded to that version. This is why it is
> not necessary to announce DB schema changes between versions.
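The downgrade described above can be pictured as a function from (serialized object, target version) to a blob that the older peer understands. A minimal Python sketch under made-up assumptions: the versions 1.0/1.1 and the field names `external_port` / `external_port_range` are hypothetical and do not reproduce Neutron's real PortForwarding object (in oslo.versionedobjects the corresponding hook is `obj_make_compatible`).

```python
from typing import Any

# Hypothetical versions for illustration only; Neutron's real PortForwarding
# object has its own version history and field names.
#   1.0: a single integer "external_port"
#   1.1: a string "external_port_range" such as "8080" or "8080:8090"


def parse_version(version: str) -> tuple[int, int]:
    """Turn "1.1" into (1, 1) so versions compare numerically."""
    major, minor = version.split(".")
    return int(major), int(minor)


def make_compatible(primitive: dict[str, Any], target_version: str) -> dict[str, Any]:
    """Return a copy of a serialized object that a peer at target_version understands."""
    data = dict(primitive)  # never mutate the caller's blob
    if parse_version(target_version) < (1, 1):
        port_range = data.pop("external_port_range", None)
        if port_range is not None:
            start, _, end = port_range.partition(":")
            if end and end != start:
                # A real range cannot be expressed with a single port in 1.0.
                raise ValueError(f"cannot downgrade range {port_range!r} to {target_version}")
            data["external_port"] = int(start)
    return data
```

Such a downgrade only shapes the RPC payload; it does not change which columns an older neutron-server selects from the database.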
>
> To properly debug this issue we would need a traceback of the
> Neutron API and the L3 agent. A reproducer, including the current
> environment conditions, would also be useful. Which L3 agent call is
> causing this issue?
>
> Regards.
>
> [1]
> https://review.opendev.org/c/openstack/neutron/+/798961/35/neutron/objects/port_forwarding.py#144
>
> On Tue, Mar 25, 2025 at 1:24 PM Eugen Block <eblock@nde.ag> wrote:
>
>> It didn't take that long to evaluate. Unfortunately, this approach
>> doesn't work for me. I tried upgrading only the neutron-server
>> package, but because of package dependencies the other neutron agents
>> are upgraded as well. I could reduce the downtime of the
>> L3-agent, though.
>> Since this isn't a frequent situation (neither upgrades in general nor
>> DB schema changes), we'll stick with our current upgrade procedure.
>>
>> But I'd still vote for adding DB schema changes to the release notes.
>>
>> Thanks again,
>> Eugen
>>
>> Quoting Eugen Block <eblock@nde.ag>:
>>
>> > Hi,
>> >
>> > thanks for sharing!
>> > I'll have to adapt my upgrade procedure and test it properly. This
>> > could take a while, though.
>> >
>> > Quoting Tobias Urdin - Binero IT <tobias.urdin@binero.com>:
>> >
>> >> Hello,
>> >>
>> >> In more detail, this is the procedure we’re using; we recently
>> >> upgraded twice with it, first from Zed to Antelope, then from
>> >> Antelope to Caracal.
>> >>
>> >> - Install new version of Neutron and run database expand
>> >>
>> >> - Upgrade neutron-server on all “controller” nodes
>> >>
>> >> - Run database contract
>> >>
>> >> - Upgrade OVS, L3, Metadata, DHCP agents on network nodes (on
>> >> controller nodes in some people’s setups)
>> >>
>> >>   - First OVS, then wait for it to start correctly
>> >>
>> >>   - Stop DHCP, L3, Metadata (in that order)
>> >>
>> >>   - Upgrade the agents and start them in the same order as above
>> >>
>> >> - Upgrade OVS agent on compute nodes
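The ordering constraints in the list above can be written down as a small check. A minimal Python sketch: the step names are hypothetical labels, not Neutron or apt commands, and the rules encode only the sequence described here (expand, all neutron-servers, contract, then agents).

```python
# Hypothetical step labels modeled on the procedure above.
PROCEDURE = [
    "db-expand",
    "upgrade-server:controller01",
    "upgrade-server:controller02",
    "db-contract",
    "upgrade-agent:ovs",
    "upgrade-agent:dhcp",
    "upgrade-agent:l3",
    "upgrade-agent:metadata",
    "upgrade-agent:ovs-compute",
]


def check_order(steps: list[str]) -> list[str]:
    """Return ordering violations; an empty list means the order looks safe."""
    pos = {step: i for i, step in enumerate(steps)}
    expand = pos.get("db-expand")
    contract = pos.get("db-contract")
    if expand is None or contract is None:
        return ["both db-expand and db-contract are required"]

    servers = [i for s, i in pos.items() if s.startswith("upgrade-server:")]
    agents = [i for s, i in pos.items() if s.startswith("upgrade-agent:")]
    problems = []
    if any(i < expand for i in servers):
        problems.append("a neutron-server was upgraded before db-expand")
    if any(i > contract for i in servers):
        problems.append("db-contract ran before all neutron-servers were upgraded")
    if any(i < contract for i in agents):
        problems.append("an agent was upgraded before db-contract")
    return problems
```

Fed with a sequence like the one described elsewhere in this thread (expand, new code on the first node including its agents, then the second node, then contract), it reports that an agent was upgraded before the contract.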
>> >>
>> >> Happy to take feedback if the above can be improved.
>> >>
>> >> From what I remember, in all these years we’ve only had issues
>> >> with upgrades twice: once was a keepalived bug, and the other was
>> >> when Neutron switched to primary/backup wording for L3 HA, which I
>> >> think could also have been because we did a double-jump upgrade
>> >> and missed some translation patch or similar.
>> >>
>> >> /Tobias
>> >>
>> >>> On 21 Mar 2025, at 15:44, Eugen Block <eblock@nde.ag> wrote:
>> >>>
>> >>> Thanks for your quick response, appreciate it!
>> >>> I've read that page as well, but that was a while ago. I guess I
>> >>> didn't pay too much attention since the recent upgrades all went
>> >>> well. Until now, I just ran 'apt upgrade' on the first node, which
>> >>> upgrades all packages, of course, then ran the expand, and issued
>> >>> the contract command on the last control node.
>> >>>
>> >>> So what would be the ideal way? First upgrade only neutron-server
>> >>> and the L2 agents on all control nodes ('apt install --only-upgrade
>> >>> <neutron-server|openvswitch-agent>'), then expand and contract,
>> >>> and then upgrade the rest of the packages?
>> >>>
>> >>>
>> >>> Quoting Tobias Urdin - Binero IT <tobias.urdin@binero.com>:
>> >>>
>> >>>> Hello,
>> >>>>
>> >>>> We upgrade in a very specific order, as described in [1]: first the
>> >>>> database expand, then all neutron-server instances are upgraded,
>> >>>> then the contract, and only then any agents.
>> >>>>
>> >>>> [1] https://docs.openstack.org/neutron/latest/contributor/internals/upgrade.html
>> >>>>
>> >>>> /Tobias
>> >>>>
>> >>>>> On 21 Mar 2025, at 15:12, Eugen Block <eblock@nde.ag> wrote:
>> >>>>>
>> >>>>> Hi *,
>> >>>>>
>> >>>>> maybe I missed some announcement or something, but I usually
>> >>>>> read the release notes [0] before upgrading our OpenStack cloud.
>> >>>>> I didn't notice anything regarding DB schema changes. After the
>> >>>>> upgrade from Yoga to Zed went well in a test environment, I
>> >>>>> tried the same in production today. Note that I didn't have
>> >>>>> a router in my test cloud, so that's probably why I didn't
>> >>>>> notice anything.
>> >>>>>
>> >>>>> Unfortunately, there was a schema change, which is why the
>> >>>>> l3-agent failed to start properly with this error:
>> >>>>>
>> >>>>> 2025-03-21 12:29:14.527 846393 CRITICAL neutron [None
>> >>>>> req-e225ff0a-82e1-473b-9eba-9a11caa7ace7 - - - - - -] Unhandled
>> >>>>> error: oslo_messaging.rpc.client.RemoteError: Remote error:
>> >>>>> OperationalError (pymysql.err.OperationalError) (1054, "Unknown
>> >>>>> column 'portforwardings.external_port' in 'SELECT'")
>> >>>>>
>> >>>>> Indeed, the upgraded control node didn't have "external_port"
>> >>>>> anymore in
>> >>>>> /usr/lib/python3/dist-packages/neutron/db/models/port_forwarding.py,
>> >>>>> while the not yet upgraded control node did. So the situation
>> >>>>> could only be resolved by proceeding with the upgrade. But that
>> >>>>> meant an interruption for our virtual routers, causing floating
>> >>>>> IPs to be unreachable for a couple of minutes.
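The failure mode can be reproduced in miniature: code whose model still maps a column that a migration has already removed. A minimal sketch using Python's built-in sqlite3 instead of MySQL, so the message reads "no such column" rather than error 1054 "Unknown column"; the table layout and the replacement column names are hypothetical simplifications, not Neutron's actual schema.

```python
import sqlite3


def demo() -> str:
    """Run an old-model query against a table whose column was dropped."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schema *after* the migration: external_port is gone,
        # replaced by range columns (names chosen for illustration only).
        conn.execute(
            "CREATE TABLE portforwardings ("
            " id TEXT PRIMARY KEY,"
            " external_port_start INTEGER,"
            " external_port_end INTEGER)"
        )
        conn.execute("INSERT INTO portforwardings VALUES ('pf1', 8080, 8080)")
        # Query as issued by the not-yet-upgraded model, which still maps external_port.
        conn.execute(
            "SELECT portforwardings.id, portforwardings.external_port FROM portforwardings"
        ).fetchall()
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        conn.close()
    return "no error"


if __name__ == "__main__":
    print(demo())
```

The same query succeeds against the old schema, which is why the error shows up only on a node running old code once the column is gone.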
>> >>>>>
>> >>>>> Note that we're using highly available routers. I thought about
>> >>>>> setting "no-ha" for each router, but that can only be done for
>> >>>>> disabled routers, which is not an option, of course. And it
>> >>>>> doesn't really fit into the "rolling upgrade" concept, which has
>> >>>>> worked great so far. Since we moved to Ubuntu last September
>> >>>>> (while still on Victoria), we've been able to upgrade to Yoga
>> >>>>> without any issues.
>> >>>>>
>> >>>>> And while the interruption today was not too critical, I was
>> >>>>> still surprised that such an important change didn't even make
>> >>>>> it into the Zed release notes. Was that a mistake or did I miss
>> >>>>> something? Are there other places I need to check before
>> >>>>> attempting an upgrade?
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Eugen
>> >>>>>
>> >>>>> [0] https://docs.openstack.org/releasenotes/neutron/zed.html