[kolla] Train Centos7 -> Centos8 upgrade fails on masakari

Braden, Albert C <Albert.Braden@charter.com>
Thu Mar 25 18:07:33 UTC 2021


I've created a Heat stack and installed OpenStack Train to test the CentOS 7 -> CentOS 8 upgrade, following the document here:

https://docs.openstack.org/kolla-ansible/train/user/centos8.html#migrating-from-centos-7-to-centos-8

Everything seems to work fine until I try to deploy the first replacement controller into the cluster. I upgrade RabbitMQ, Elasticsearch, and Kibana, then follow the "remove existing controller" process to remove control0, create a CentOS 8 VM, bootstrap the cluster, and pull containers to the new control0; everything is still working at this point. Then I run the last command: "kolla-ansible -i multinode deploy --limit control"

The RabbitMQ install works and I see all 3 nodes up in the RabbitMQ admin, but it takes a long time to complete "TASK [service-ks-register : masakari | Creating users]" and then hangs on "TASK [service-ks-register : masakari | Creating roles]". At this point the new control0 becomes unreachable and drops out of the RabbitMQ cluster. I can still ping it, but the console hangs, along with new and existing ssh sessions. It appears that the CPU may be maxed out and not allowing interrupts. Eventually I see the error: fatal: [control0]: FAILED! => {"msg": "Timeout (12s) waiting for privilege escalation prompt: "}
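One thing I've been experimenting with (this only masks the symptom; the underlying load problem still needs to be found): if the 12 s value comes from Ansible's connection timeout (default 10 s, plus a small margin), it can be raised so transient load spikes on the target don't abort the play. A sketch of an ansible.cfg fragment, assuming kolla-ansible reads the standard Ansible config locations:

```
[defaults]
# Raise the connection/become timeout (default 10s) so a briefly
# overloaded target doesn't trip "waiting for privilege escalation prompt"
timeout = 60
```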

RabbitMQ seems fine on the two old controllers; they just don't see the new control0 as active:

(rabbitmq)[root@chrnc-void-testupgrade-control-2 /]# rabbitmqctl cluster_status
Cluster status of node rabbit@chrnc-void-testupgrade-control-2 ...
[{nodes,[{disc,['rabbit@chrnc-void-testupgrade-control-0',
                'rabbit@chrnc-void-testupgrade-control-0-replace',
                'rabbit@chrnc-void-testupgrade-control-1',
                'rabbit@chrnc-void-testupgrade-control-2']}]},
 {running_nodes,['rabbit@chrnc-void-testupgrade-control-1',
                 'rabbit@chrnc-void-testupgrade-control-2']},
 {cluster_name,<<"rabbit@chrnc-void-testupgrade-control-0.dev.chtrse.com">>},
 {partitions,[]},
 {alarms,[{'rabbit@chrnc-void-testupgrade-control-1',[]},
          {'rabbit@chrnc-void-testupgrade-control-2',[]}]}]

After this the HAProxy IP is pingable, but openstack commands fail:

(openstack) [root@chrnc-void-testupgrade-build openstack]# osl
Failed to discover available identity versions when contacting http://172.16.0.100:35357/v3. Attempting to parse version from URL.
Gateway Timeout (HTTP 504)

After about an hour, my open ssh session on the new control0 responded and confirmed that the CPU was maxed out:

[root@chrnc-void-testupgrade-control-0-replace /]# uptime
 17:41:55 up  1:55,  0 users,  load average: 157.87, 299.75, 388.69
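For what it's worth, once the box responds again I've been running a generic diagnostic (nothing kolla-specific, just procps) to see which processes are pinning the CPUs:

```shell
# List the top CPU consumers; on a wedged controller this usually
# points at a runaway container process or a fork storm
ps -eo pcpu,pid,user,args --sort=-pcpu | head -n 10
```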

I built new Heat stacks and tried it a few times, and it consistently fails on masakari. Do I need to change something in my masakari config before upgrading Train from CentOS 7 to CentOS 8?


