I’ve created a Heat stack and installed OpenStack Train to test the CentOS 7 to CentOS 8 upgrade, following the document here:
https://docs.openstack.org/kolla-ansible/train/user/centos8.html#migrating-from-centos-7-to-centos-8
Everything seems to work fine until I try to deploy the first replacement controller into the cluster. I upgrade RMQ, ES and Kibana, then follow the “remove existing controller” process to remove control0, create a CentOS 8 VM, bootstrap the cluster, and pull containers to the new control0. At that point everything is still working. Then I run the last command, “kolla-ansible -i multinode deploy --limit control”.
The RMQ install works and I see all 3 nodes up in the RMQ admin, but it takes a long time to complete “TASK [service-ks-register : masakari | Creating users]” and then hangs on “TASK [service-ks-register : masakari | Creating roles]”.
At this point the new control0 becomes unreachable and drops out of the RMQ cluster. I can still ping it, but the console hangs, along with new and existing SSH sessions. It appears that the CPU may be maxed out and not servicing interrupts. Eventually I see this error:
“fatal: [control0]: FAILED! => {"msg": "Timeout (12s) waiting for privilege escalation prompt: "}”
RMQ seems fine on the 2 old controllers; they just don’t see the new control0 active:
(rabbitmq)[root@chrnc-void-testupgrade-control-2 /]# rabbitmqctl cluster_status
Cluster status of node rabbit@chrnc-void-testupgrade-control-2 ...
[{nodes,[{disc,['rabbit@chrnc-void-testupgrade-control-0',
'rabbit@chrnc-void-testupgrade-control-0-replace',
'rabbit@chrnc-void-testupgrade-control-1',
'rabbit@chrnc-void-testupgrade-control-2']}]},
{running_nodes,['rabbit@chrnc-void-testupgrade-control-1',
'rabbit@chrnc-void-testupgrade-control-2']},
{cluster_name,<<"rabbit@chrnc-void-testupgrade-control-0.dev.chtrse.com">>},
{partitions,[]},
{alarms,[{'rabbit@chrnc-void-testupgrade-control-1',[]},
{'rabbit@chrnc-void-testupgrade-control-2',[]}]}]
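To make the mismatch between configured and running nodes easier to spot, the output above can be compared mechanically. This is only a minimal sketch: `stopped_nodes` and `SAMPLE_STATUS` are hypothetical names, the regular expressions assume the Erlang-term format shown above (as printed by this RabbitMQ version), and only the first node list under `nodes` (here the `disc` list) is read.

```python
import re

# Pasted from the cluster_status output above.
SAMPLE_STATUS = """
[{nodes,[{disc,['rabbit@chrnc-void-testupgrade-control-0',
'rabbit@chrnc-void-testupgrade-control-0-replace',
'rabbit@chrnc-void-testupgrade-control-1',
'rabbit@chrnc-void-testupgrade-control-2']}]},
{running_nodes,['rabbit@chrnc-void-testupgrade-control-1',
'rabbit@chrnc-void-testupgrade-control-2']},
{cluster_name,<<"rabbit@chrnc-void-testupgrade-control-0.dev.chtrse.com">>},
{partitions,[]},
{alarms,[{'rabbit@chrnc-void-testupgrade-control-1',[]},
{'rabbit@chrnc-void-testupgrade-control-2',[]}]}]
"""


def stopped_nodes(status_text):
    """Return nodes listed as cluster members but absent from running_nodes."""

    def names_in(section):
        # Capture from "{<section>,[" up to the first "]}" that closes a list.
        match = re.search(r"\{" + section + r",\[(.*?)\]\}", status_text, re.S)
        if match is None:
            return set()
        return set(re.findall(r"'(rabbit@[^']+)'", match.group(1)))

    return sorted(names_in("nodes") - names_in("running_nodes"))


if __name__ == "__main__":
    for node in stopped_nodes(SAMPLE_STATUS):
        print(node)
```

Run against the output above, this lists both the original control-0 entry and the control-0-replace entry as cluster members that are not running.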
After this, the HAProxy IP is still pingable but OpenStack commands fail:
(openstack) [root@chrnc-void-testupgrade-build openstack]# osl
Failed to discover available identity versions when contacting http://172.16.0.100:35357/v3. Attempting to parse version from URL.
Gateway Timeout (HTTP 504)
After about an hour my open SSH session on the new control0 responded and confirmed that the CPU is maxed out:
[root@chrnc-void-testupgrade-control-0-replace /]# uptime
17:41:55 up 1:55, 0 users, load average: 157.87, 299.75, 388.69
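For context on how severe those numbers are, load average is usually read relative to the number of CPUs. The sketch below pulls the three values out of the `uptime` line and divides them by a CPU count. The function names are hypothetical, and the CPU count of the control VM is not stated here, so it is a parameter rather than a known value. Note also that on Linux the load average includes tasks in uninterruptible I/O wait, so a high figure points to an overloaded host but not necessarily to CPU alone.

```python
import re


def load_averages(uptime_line):
    """Extract the 1, 5 and 15 minute load averages from an uptime line."""
    match = re.search(
        r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load average found in: %r" % uptime_line)
    return tuple(float(value) for value in match.groups())


def load_per_cpu(uptime_line, cpu_count):
    """Divide each load average by cpu_count (an assumed, caller-supplied value)."""
    return tuple(value / cpu_count for value in load_averages(uptime_line))


if __name__ == "__main__":
    line = "17:41:55 up 1:55, 0 users, load average: 157.87, 299.75, 388.69"
    print(load_averages(line))
    # cpu_count=8 is purely illustrative; substitute the VM's real vCPU count.
    print(load_per_cpu(line, cpu_count=8))
```

Sustained values well above 1.0 per CPU mean more runnable or blocked tasks than the host can service, which is consistent with the hung console and SSH sessions.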
I built new Heat stacks and tried it a few times, and it consistently fails on masakari. Do I need to change something in my masakari config before upgrading Train from CentOS 7 to CentOS 8?