Any update on this bug https://bugs.launchpad.net/nova/+bug/2076614 ?

On Tue, 22 Apr 2025, 19:45 Eugen Block, <eblock@nde.ag> wrote:
Found one for the nova-manage issue:
https://bugs.launchpad.net/nova/+bug/2076614
But it's triaged without any details about a recommended workaround, apart from the ideas mentioned in the last comment.
Quoting Eugen Block <eblock@nde.ag>:
Thanks, yeah we did debug something similar months ago when we upgraded from V to W or something. In our production cloud, there was no such instance (00000000-0000-0000-0000-000000000000), so we created that entry to satisfy the nova db requirements. This is a virtual lab environment where the instance was present (that's why we didn't face that issue during the test upgrade).
I guess I could update the instances table with a valid compute_id since the error from nova-manage is:
2025-04-22 13:18:08.322 19087 ERROR nova.objects.instance [None req-542f4e62-e738-49aa-aebd-4803b61585ac - - - - - -] [instance: 00000000-0000-0000-0000-000000000000] Unable to migrate instance because host None with node None not found: nova.exception.ComputeHostNotFound: Compute host None could not be found.
I'll check for existing bug reports.
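For reference, this is roughly how I'd inspect that row before touching anything; the column names are assumed from the nova schema and not verified here, so treat it as a sketch rather than a recipe:

# Read-only look at the placeholder instance and the compute_nodes it could map to
# (assumes the usual MySQL/MariaDB backend and the default 'nova' database)
mysql -e "SELECT uuid, host, node, compute_id, deleted FROM nova.instances WHERE uuid='00000000-0000-0000-0000-000000000000';"
mysql -e "SELECT id, host, hypervisor_hostname FROM nova.compute_nodes;"

If host and node really are NULL, there is nothing for populate_instance_compute_id to map the row to, so it would probably have to be archived or marked deleted rather than given a compute_id; I'd want the bug report to confirm that before doing anything in production.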
Quoting engineer2024 <engineerlinux2024@gmail.com>:
I faced a similar issue from A to C when running nova db online migrations. Strangely, some process in nova is automatically creating a row in the nova.instances table with the uuid '0000-0000-00000' and all other column values for this entry as NULL.
On Tue, 22 Apr 2025, 19:22 Eugen Block, <eblock@nde.ag> wrote:
Hi Allison,
I was able to test the upgrade twice today (2 control nodes, 1 compute node). The first attempt was from A to C (SLURP). I had several issues along the way. I haven't been able to look closely for root causes yet, and I haven't checked for existing bugs either, but I wanted to share anyway:
1. nova-manage db online_data_migrations: one entry doesn't migrate (1 rows matched query populate_instance_compute_id, 0 migrated)
2. cinder-manage db online_data_migrations: Running batches of 50 until complete.
2025-04-22 11:01:30.558 837433 WARNING py.warnings [None req-daa04f7f-7daf-4b46-ac78-eff68794cae5 - - - - - -] /usr/lib/python3/dist-packages/cinder/db/sqlalchemy/api.py:8620: SAWarning: Coercing Subquery object into a select() for use in IN(); please pass a select() construct explicitly
filter(admin_meta_table.id.in_(ids_query)).\
+------------------------------------------------+----------------+-------------+
| Migration | Total Needed | Completed |
|------------------------------------------------+----------------+-------------|
| remove_temporary_admin_metadata_data_migration | 0 | 0 |
+------------------------------------------------+----------------+-------------+
3. Some neutron agents (dhcp, metadata) don't properly start until all control nodes are upgraded; I needed to stop and start them again (a restart sketch follows this list). I still need to look for more details in the logs.
4. Horizon has stopped working entirely. I only get a "400 Bad Request", no matter what I do. When the packages were upgraded, I saw a warning "No local_settings file found", but it is there:
root@controller02:~# ll /usr/lib/python3/dist-packages/openstack_dashboard/local/local_settings.py
lrwxrwxrwx 1 root root 42 Jun 7 2024 /usr/lib/python3/dist-packages/openstack_dashboard/local/local_settings.py -> /etc/openstack-dashboard/local_settings.py
The dashboard_error.log also contains the warning that there's no local_settings file. I suspect it has to do with the Django change mentioned in the Caracal release notes (a quick ALLOWED_HOSTS check is sketched after this list):
Django 3.2 support was dropped. Django 3.2 ends its extended support in April 2024. Considering this horizon dropped Django 3.2 support and uses Django 4.2 as default.
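Regarding issue 3, my stopgap was simply to bounce the affected agents on the already-upgraded node; the unit names below are assumed from the Ubuntu packaging:

# restart the agents that came up unhealthy (Ubuntu/Debian unit names assumed)
systemctl restart neutron-dhcp-agent neutron-metadata-agent
# then verify they report as alive again
openstack network agent list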
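Regarding issue 4, one thing I still want to rule out (just a sketch, not verified on this setup): with DEBUG disabled, Django returns a bare 400 Bad Request whenever the request's Host header isn't covered by ALLOWED_HOSTS, so the dashboard config is the first place I'd look:

# check what the dashboard currently allows (path taken from the symlink above)
grep -n ALLOWED_HOSTS /etc/openstack-dashboard/local_settings.py
# after adjusting it if needed (e.g. to the controller's FQDN), reload the web server
systemctl reload apache2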
I decided to roll back the VMs to a previous snapshot (back to Antelope) and tried the upgrade to Bobcat. The result is better: I didn't see the neutron agents fail, and the dashboard is still usable. The only issues I still see are with 'nova-manage db online_data_migrations' and 'cinder-manage db online_data_migrations'.
If some or all of these issues are known, could anyone point me to the relevant bug reports? If these are new bugs, I can create reports for them. But as I mentioned, I don't have much information yet.
Thanks, Eugen
Quoting Allison Price <allison@openinfra.dev>:
Hi Eugen,
Excellent! Please keep me posted on how testing goes and then we can take the next steps in talking about your production environment.
Cheers, Allison
On Apr 21, 2025, at 3:10 PM, Eugen Block <eblock@nde.ag> wrote:
Hi and thank you for the links. We just upgraded to Antelope last week, and we plan to do the next upgrade directly to Caracal since we need one of the fixes. I’m planning to test the SLURP upgrade in a lab environment quite soon, maybe even this week. And if all goes well, we’ll upgrade our production quickly after that. I’d be happy to share my experience in this thread. :-)
Thanks! Eugen
Quoting Allison Price <allison@openinfra.dev>:
Hi everyone,

We have now published two case studies from OpenStack users who have implemented and are benefiting from the SLURP upgrade process[1]: Cleura[2] and Indiana University[3].

I wanted to follow up to see if there are other organizations who have implemented the SLURP upgrade process who would like to tell your story? If we can get a few more, I would love to schedule an OpenInfra Live to deep dive into the improvements made around OpenStack upgrades and what users stuck on older releases should know.

If you are interested, please let me know. And even if you’re not but you’re still using OpenStack, please remember to take the OpenStack User Survey[4] so we can learn more!

Thanks!
Allison

[1] https://docs.openstack.org/project-team-guide/release-cadence-adjustment.htm...
[2] https://superuser.openinfra.org/articles/streamlining-openstack-upgrades-man...
[3] https://superuser.openinfra.org/articles/streamlining-openstack-upgrades-a-c...