[ops][nova] How to update from N to N+3 (and manage nova services that don't start because they find too old compute nodes...)
Dear all
When we upgraded our Cloud from Rocky to Train we used the following procedure:
- Shutdown of all services on the controller and compute nodes
- Update from Rocky to Stein of controller (just to do the dbsyncs)
- Update from Stein to Train of controller
- Update from Rocky to Train of compute nodes
We are trying to do the same to update from Train to Xena, but now there is a problem because the nova services on the controller node refuse to start since they find compute nodes that are too old (this is indeed a new feature, properly documented in the release notes). As a workaround we had to manually modify the "version" field of the compute nodes in the nova.services table.
Is it ok, or is there a cleaner way to manage the issue?
Thanks, Massimo
We are trying to do the same to update from Train to Xena, but now there is a problem because the nova services on the controller node refuse to start since they find compute nodes that are too old (this is indeed a new feature, properly documented in the release notes). As a workaround we had to manually modify the "version" field of the compute nodes in the nova.services table.
Is it ok, or is there a cleaner way to manage the issue?
I think this is an unintended consequence of the new check. Can you file a bug against nova and report the number here? We probably need to do something here...
Thanks!
--Dan
On Mon, 2022-01-10 at 18:00 +0100, Massimo Sgaravatto wrote:
Dear all
When we upgraded our Cloud from Rocky to Train we used the following procedure:
- Shutdown of all services on the controller and compute nodes
- Update from Rocky to Stein of controller (just to do the dbsyncs)
- Update from Stein to Train of controller
- Update from Rocky to Train of compute nodes
We are trying to do the same to update from Train to Xena, but now there is a problem because the nova services on the controller node refuse to start since they find compute nodes that are too old (this is indeed a new feature, properly documented in the release notes). As a workaround we had to manually modify the "version" field of the compute nodes in the nova.services table.
Is it ok, or is there a cleaner way to manage the issue?
The check is mainly implemented by https://github.com/openstack/nova/blob/0e0196d979cf1b8e63b9656358116a36f1f09...
I believe the intent was that this should only be an issue if the service reports as up, so you should be able to do the following:
1. Stop nova-compute on all nodes.
2. Wait for the compute services to be down, then stop the controllers.
3. Upgrade the controller directly to Xena, skipping all intermediary releases. (The db syncs have never needed to be done every release; we keep the migrations around for many releases. There are also no db changes between Train and Wallaby, and I don't think there are any in Xena either.)
4. Upgrade nova-compute on all compute nodes.
Looking at the code, however, I don't think we are checking the status of the services at all, so it is an absolute check. As a result you can no longer do FFU, which I'm surprised no one has complained about before.
This was implemented by https://github.com/openstack/nova/commit/aa7c6f87699ec1340bd446a7d47e1453847... in Wallaby.
Just to be clear: we have never actually supported having active nova services where the version mix is greater than N+1; we just started enforcing that in Wallaby.
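Spelled out as a rough shell sketch of that procedure (the systemd unit names, package names and use of dnf below are assumptions about an RPM-based deployment, not anything nova itself requires):

  # 1) On every compute node: stop nova-compute.
  systemctl stop openstack-nova-compute

  # 2) On the controller: wait until every nova-compute reports State = down,
  #    then stop the controller services.
  openstack compute service list --service nova-compute
  systemctl stop openstack-nova-api openstack-nova-scheduler openstack-nova-conductor

  # 3) Upgrade the controller packages straight to Xena and run the db syncs once.
  dnf upgrade -y 'openstack-nova*'
  nova-manage api_db sync
  nova-manage db sync
  nova-manage db online_data_migrations
  systemctl start openstack-nova-api openstack-nova-scheduler openstack-nova-conductor

  # 4) On every compute node: upgrade and restart nova-compute.
  dnf upgrade -y openstack-nova-compute
  systemctl start openstack-nova-compute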
Good to know that it is not necessary for nova to go through ALL intermediate releases and perform db-sync. The question is whether this is true for ALL OpenStack services (in our deployment the controller node is used for all services, not only nova).
Thanks, Massimo
Good to know that it is not necessary for nova to go through ALL intermediate releases and perform db-sync. The question is whether this is true for ALL OpenStack services (in our deployment the controller node is used for all services, not only nova).
Actually, Sean is wrong here - we do expect you to go through each release on the controller, it's just that it's rare that it's actually a problem. We have had blocker migrations at times in the past where we have had to ensure that data is migrated before changing or dropping items of schema. We also recently did a schema compaction, which wouldn't tolerate moving across the releases without the (correct) intermediate step.
We definitely should fix the problem related to compute records being old and causing the controllers to refuse to start. However, at the moment, you should still assume that each intermediate release needs to be db-sync'd unless you've tested that a particular source and target release works. I expect the same requirement for most other projects.
--Dan
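In practice, "db-sync each intermediate release" on the controller amounts to a loop along these lines; the centos-release-openstack-* repo packages and dnf are assumptions about an RDO-style install, so substitute whatever switches release repositories in your deployment:

  for release in ussuri victoria wallaby xena; do
      dnf install -y "centos-release-openstack-${release}"   # assumed repo-switch package
      dnf upgrade -y 'openstack-nova*'
      nova-manage api_db sync                  # API database schema
      nova-manage db sync                      # main/cell database schema
      nova-manage db online_data_migrations    # data migrations before the next hop
  done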
On Mon, 2022-01-10 at 10:50 -0800, Dan Smith wrote:
Good to know that it is not necessary for nova to go through ALL intermediate releases and perform db-sync. The question is whether this is true for ALL OpenStack services (in our deployment the controller node is used for all services, not only nova).
Actually, Sean is wrong here - we do expect you to go through each release on the controller, it's just that it's rare that it's actually a problem. We have had blocker migrations at times in the past where we have had to ensure that data is migrated before changing or dropping items of schema. We also recently did a schema compaction, which wouldn't tolerate moving across the releases without the (correct) intermediate step.
Dan is correct. You should run each one on the controller back to back. Between Train and Wallaby specifically we are in a special case where we just happen to not change the db in those releases. In Xena we started doing the db compaction, yes, and moving to alembic instead of sqlalchemy-migrate.
From a CLI point of view that is transparent at the nova-manage level, but it is still best to do it each release on the controller to ensure that transition happens correctly.
We definitely should fix the problem related to compute records being old and causing the controllers to refuse to start. However, at the moment, you should still assume that each intermediate release needs to be db-sync'd unless you've tested that a particular source and target release works. I expect the same requirement for most other projects.
We have not tested skipping them on the controllers, but I believe in this case it would work ok to go directly from Train to the Wallaby code base and do the db sync. Train to Xena may not work. If the start and end versions were different there is no guarantee that it would work, due to the blocker migrations, online migrations and eventual dropping of migration code that Dan mentioned. But yeah, unless you have tested it, it is better to assume you can't skip.
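As an optional sanity check between steps (not something mentioned in the thread, and assuming these subcommands exist in the installed release), nova-manage will report the current schema versions without exposing which migration engine is underneath:

  nova-manage api_db version   # current API database schema version
  nova-manage db version       # current main/cell database schema version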
On 10/01, Dan Smith wrote:
Good to know that it is not necessary for nova to go through ALL intermediate releases and perform db-sync. The question is whether this is true for ALL OpenStack services (in our deployment the controller node is used for all services, not only nova).
Actually, Sean is wrong here - we do expect you to go through each release on the controller, it's just that it's rare that it's actually a problem. We have had blocker migrations at times in the past where we have had to ensure that data is migrated before changing or dropping items of schema. We also recently did a schema compaction, which wouldn't tolerate moving across the releases without the (correct) intermediate step.
We definitely should fix the problem related to compute records being old and causing the controllers to refuse to start. However, at the moment, you should still assume that each intermediate release needs to be db-sync'd unless you've tested that a particular source and target release works. I expect the same requirement for most other projects.
--Dan
Hi,
Unrelated to this Nova issue, but relevant to why intermediate releases cannot be skipped in OpenStack: the Cinder project requires that the db sync and the online data migrations are run on each intermediate release.
You may be lucky and everything may run fine, but it could just as easily blow up in your face and lose database data.
Cheers, Gorka.
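For Cinder that per-release step would look roughly like the sketch below; the service unit names are assumptions about the deployment, and the package upgrade itself is left as a placeholder:

  systemctl stop openstack-cinder-api openstack-cinder-scheduler openstack-cinder-volume
  for release in ussuri victoria wallaby xena; do
      # ... upgrade the cinder packages to ${release} here ...
      cinder-manage db sync
      cinder-manage db online_data_migrations
  done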
I had the chance to repeat this test. So the scenario is:
- controller and compute nodes running train
- all services stopped in compute nodes
- controller updated: train --> ussuri --> victoria --> wallaby
After that nova-conductor and nova-scheduler refuse to start [*]
At that moment the nova-compute services were not running on the compute nodes, and this was the status of the services table:
mysql> select * from services where topic="compute";

created_at          | updated_at          | deleted_at | id | host                        | binary       | topic   | report_count | disabled | deleted | disabled_reason                     | last_seen_up        | forced_down | version | uuid
2018-01-11 17:20:34 | 2022-02-25 09:09:17 | NULL       | 17 | compute-01.cloud.pd.infn.it | nova-compute | compute | 10250811     | 1        | 0       | AUTO: Connection to libvirt lost: 1 | 2022-02-25 09:09:13 | 0           | 40      | 2f56b8cf-1190-4999-af79-6bcee695c653
2018-01-11 17:26:39 | 2022-02-25 09:09:49 | NULL       | 23 | compute-02.cloud.pd.infn.it | nova-compute | compute | 10439622     | 1        | 0       | AUTO: Connection to libvirt lost: 1 | 2022-02-25 09:09:49 | 0           | 40      | fbe37dfd-4a6c-4da1-96e0-407f7f98c4c4
2018-01-11 17:27:12 | 2022-02-25 09:10:02 | NULL       | 24 | compute-03.cloud.pd.infn.it | nova-compute | compute | 10361295     | 1        | 0       | AUTO: Connection to libvirt lost: 1 | 2022-02-25 09:10:02 | 0           | 40      | 3675f324-81dd-445a-b4eb-510726104be3
2021-04-06 12:54:42 | 2022-02-25 09:10:02 | NULL       | 25 | compute-04.cloud.pd.infn.it | nova-compute | compute | 1790955      | 1        | 0       | AUTO: Connection to libvirt lost: 1 | 2022-02-25 09:10:02 | 0           | 40      | e3e7af4d-b25b-410c-983e-8128a5e97219

4 rows in set (0.00 sec)
Only after manually setting the version field of these entries to '54', nova-conductor and nova-scheduler were able to start
Regards, Massimo
[*]
2022-02-25 15:06:03.992 591600 CRITICAL nova [req-cc20f294-cced-434b-98cd-5bdf228a2a22 - - - - -] Unhandled error: nova.exception.TooOldComputeService: Current Nova version does not support computes older than Wallaby but the minimum compute service level in your system is 40 and the oldest supported service level is 54.
2022-02-25 15:06:03.992 591600 ERROR nova Traceback (most recent call last):
2022-02-25 15:06:03.992 591600 ERROR nova   File "/usr/bin/nova-conductor", line 10, in <module>
2022-02-25 15:06:03.992 591600 ERROR nova     sys.exit(main())
2022-02-25 15:06:03.992 591600 ERROR nova   File "/usr/lib/python3.6/site-packages/nova/cmd/conductor.py", line 46, in main
2022-02-25 15:06:03.992 591600 ERROR nova     topic=rpcapi.RPC_TOPIC)
2022-02-25 15:06:03.992 591600 ERROR nova   File "/usr/lib/python3.6/site-packages/nova/service.py", line 264, in create
2022-02-25 15:06:03.992 591600 ERROR nova     utils.raise_if_old_compute()
2022-02-25 15:06:03.992 591600 ERROR nova   File "/usr/lib/python3.6/site-packages/nova/utils.py", line 1098, in raise_if_old_compute
2022-02-25 15:06:03.992 591600 ERROR nova     oldest_supported_service=oldest_supported_service_level)
2022-02-25 15:06:03.992 591600 ERROR nova nova.exception.TooOldComputeService: Current Nova version does not support computes older than Wallaby but the minimum compute service level in your system is 40 and the oldest supported service level is 54.
2022-02-25 15:06:03.992 591600 ERROR nova
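For reference, the manual workaround described above boils down to something like the following; 54 is the Wallaby service version quoted in the error, while the database name and access method are assumptions about the deployment, and this should be treated as a last-resort hack rather than a supported step:

  # Bump the recorded service version of the (stopped) nova-compute services so the
  # Wallaby controller services will start; 54 is the value from the error above.
  mysql nova -e "UPDATE services SET version = 54 WHERE topic = 'compute' AND deleted = 0;"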
On Fri, 2022-02-25 at 16:50 +0100, Massimo Sgaravatto wrote:
I had the chance to repeat this test. So the scenario is:
- controller and compute nodes running train
- all services stopped in compute nodes
- controller updated: train-->ussuri-->victoria--> wallaby
After that nova-conductor and nova-scheduler refuse to start [*]
Yes, nova does not officially support N to N+3 upgrades; we started enforcing that a few releases ago. There is a workaround config option that we recently added that turns the error into a warning: https://docs.openstack.org/nova/latest/configuration/config.html#workarounds... That is one option, or, before you upgrade the controller, you can force-down all the compute nodes.
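Sketched as commands, those two options would look something like this; the option name is the one given later in the thread, crudini is just one assumed way to edit nova.conf, and the host name is taken from the services table above:

  # Option 1: turn the TooOldComputeService startup error into a warning.
  crudini --set /etc/nova/nova.conf workarounds disable_compute_service_check_for_ffu true

  # Option 2: force each compute service down before upgrading the controller.
  openstack compute service set --down compute-01.cloud.pd.infn.it nova-compute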
Thanks. This "disable_compute_service_check_for_ffu" option is not available in Xena, correct? Cheers, Massimo
On Fri, Feb 25 2022 at 05:57:26 PM +0100, Massimo Sgaravatto massimo.sgaravatto@gmail.com wrote:
Thanks. This "disable_compute_service_check_for_ffu" option is not available in Xena, correct?
Not yet. But now I've proposed the backport of that fix to stable/xena[1]
Cheers, gibi
[1] https://review.opendev.org/c/openstack/nova/+/831174
participants (5)
- Balazs Gibizer
- Dan Smith
- Gorka Eguileor
- Massimo Sgaravatto
- Sean Mooney