[kolla] Train Centos7 -> Centos8 upgrade fails on masakari
Radosław Piliszek
radoslaw.piliszek at gmail.com
Thu Mar 25 18:16:34 UTC 2021
Hi Albert,
I can assure you this is unrelated to Masakari.
As you have observed, it was RabbitMQ and Keystone (perhaps due to
MariaDB?) that failed.
Something is abusing the CPU there. What is that process?
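
One way to answer that question, as a minimal sketch: take a `ps` snapshot on the affected node and rank processes by CPU share. The helper names and the sample rows are hypothetical, not data from this deployment. Note that `ps` reports `pcpu` averaged over each process's lifetime, so an interactive tool such as `top` may give a more current picture.

```python
import subprocess

# Made-up rows in the shape printed by:
#   ps -eo pid,pcpu,pmem,comm --no-headers
SAMPLE = """\
  101  3.0  1.2 mysqld
  202 95.5 40.1 beam.smp
  303  0.1  0.0 sshd
"""


def parse_ps(ps_text):
    """Turn `ps` text into (pid, pcpu, pmem, comm) tuples."""
    rows = []
    for line in ps_text.splitlines():
        parts = line.split(None, 3)  # comm may itself contain spaces
        if len(parts) != 4:
            continue
        pid, pcpu, pmem, comm = parts
        rows.append((int(pid), float(pcpu), float(pmem), comm.strip()))
    return rows


def top_consumers(rows, n=5):
    """Return the n rows with the highest CPU percentage."""
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]


def snapshot():
    """Run `ps` on the local host and parse its output."""
    result = subprocess.run(
        ["ps", "-eo", "pid,pcpu,pmem,comm", "--no-headers"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_ps(result.stdout)


for row in top_consumers(parse_ps(SAMPLE), 2):
    print(row)
```

On a live node, `top_consumers(snapshot())` would replace the sample input. Processes running inside Docker containers are visible to `ps` on the host, so this should also cover the containerised services.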
-yoctozepto
On Thu, Mar 25, 2021 at 7:09 PM Braden, Albert
<C-Albert.Braden at charter.com> wrote:
>
> I’ve created a Heat stack and installed OpenStack Train to test the CentOS 7 -> CentOS 8 upgrade, following this document:
>
> https://docs.openstack.org/kolla-ansible/train/user/centos8.html#migrating-from-centos-7-to-centos-8
>
> Everything seems to work fine until I try to deploy the first replacement controller into the cluster. I upgrade RabbitMQ (RMQ), Elasticsearch and Kibana, then follow the “remove existing controller” process to remove control0, create a CentOS 8 VM, bootstrap the cluster, and pull containers to the new control0. At that point everything is still working. Then I run the last command, “kolla-ansible -i multinode deploy --limit control”.
>
> The RMQ install works and I see all 3 nodes up in the RMQ admin, but it takes a long time to complete “TASK [service-ks-register : masakari | Creating users]” and then hangs on “TASK [service-ks-register : masakari | Creating roles]”. At this point the new control0 becomes unreachable and drops out of the RMQ cluster. I can still ping it, but the console hangs, along with new and existing ssh sessions. It appears that the CPU may be maxed out and not servicing interrupts. Eventually I see the error “fatal: [control0]: FAILED! => {"msg": "Timeout (12s) waiting for privilege escalation prompt: "}”
>
> RMQ seems fine on the 2 old controllers; they just don’t see the new control0 active:
>
> (rabbitmq)[root@chrnc-void-testupgrade-control-2 /]# rabbitmqctl cluster_status
> Cluster status of node rabbit@chrnc-void-testupgrade-control-2 ...
> [{nodes,[{disc,['rabbit@chrnc-void-testupgrade-control-0',
>                 'rabbit@chrnc-void-testupgrade-control-0-replace',
>                 'rabbit@chrnc-void-testupgrade-control-1',
>                 'rabbit@chrnc-void-testupgrade-control-2']}]},
>  {running_nodes,['rabbit@chrnc-void-testupgrade-control-1',
>                  'rabbit@chrnc-void-testupgrade-control-2']},
>  {cluster_name,<<"rabbit@chrnc-void-testupgrade-control-0.dev.chtrse.com">>},
>  {partitions,[]},
>  {alarms,[{'rabbit@chrnc-void-testupgrade-control-1',[]},
>           {'rabbit@chrnc-void-testupgrade-control-2',[]}]}]
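
The status above lists four disc nodes but only two running ones: neither the old control-0 entry nor control-0-replace is running. A rough sketch of extracting that difference automatically follows. It assumes the Erlang-term layout shown in this thread (newer RabbitMQ releases print a different format), and the function names and shortened hostnames are hypothetical.

```python
import re

# Shortened, hypothetical hostnames in the same layout as the output above.
SAMPLE = """\
[{nodes,[{disc,['rabbit@ctl-0',
                'rabbit@ctl-0-replace',
                'rabbit@ctl-1',
                'rabbit@ctl-2']}]},
 {running_nodes,['rabbit@ctl-1',
                 'rabbit@ctl-2']},
 {partitions,[]}]
"""

NODE_RE = re.compile(r"'(rabbit@[^']+)'")


def section(status_text, key):
    """Return the node names listed under `key`, e.g. 'disc' or 'running_nodes'."""
    match = re.search(r"\{" + re.escape(key) + r",\s*\[(.*?)\]", status_text, re.S)
    return NODE_RE.findall(match.group(1)) if match else []


def stopped_nodes(status_text):
    """Cluster members that are known to the cluster but not currently running."""
    running = set(section(status_text, "running_nodes"))
    return [node for node in section(status_text, "disc") if node not in running]


print(stopped_nodes(SAMPLE))
```

A node that appears here but no longer exists (such as a removed controller) may simply be a stale membership entry rather than a failure.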
>
>
>
> After this the HAProxy IP is pingable but openstack commands are failing:
>
>
>
> (openstack) [root@chrnc-void-testupgrade-build openstack]# osl
> Failed to discover available identity versions when contacting http://172.16.0.100:35357/v3. Attempting to parse version from URL.
> Gateway Timeout (HTTP 504)
>
>
>
> After about an hour my open ssh session on the new control0 responded and confirmed that the CPU is maxed out:
>
>
>
> [root@chrnc-void-testupgrade-control-0-replace /]# uptime
> 17:41:55 up 1:55, 0 users, load average: 157.87, 299.75, 388.69
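
To put those numbers in context, here is a small sketch that parses the `uptime` line and compares the 1-, 5- and 15-minute averages. The function names are hypothetical, and the core count would have to come from the node itself (for example via `os.cpu_count()`). On Linux the load average also counts tasks in uninterruptible sleep, so a high value can reflect I/O or memory pressure as well as CPU contention.

```python
import re

LOAD_RE = re.compile(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)")

# The line reported in the message above.
LINE = "17:41:55 up 1:55, 0 users, load average: 157.87, 299.75, 388.69"


def parse_load(uptime_line):
    """Extract the (1 min, 5 min, 15 min) load averages from `uptime` output."""
    match = LOAD_RE.search(uptime_line)
    if match is None:
        raise ValueError("no load average found in: %r" % uptime_line)
    return tuple(float(value) for value in match.groups())


def load_per_core(load, cores):
    """Load normalised by core count; sustained values above 1.0 mean queued work."""
    return load / cores


def trend(loads):
    """Compare the three averages to see whether load is rising or falling."""
    one, five, fifteen = loads
    if one < five < fifteen:
        return "falling"
    if one > five > fifteen:
        return "rising"
    return "mixed"


loads = parse_load(LINE)
print(loads, trend(loads))
```

For the line above the trend is "falling", which fits with the ssh session becoming responsive again after about an hour.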
>
>
>
> I built new Heat stacks and tried it a few times, and it consistently fails on Masakari. Do I need to change something in my Masakari config before upgrading Train from CentOS 7 to CentOS 8?