[EXTERNAL] Re: [kolla] Train Centos7 -> Centos8 upgrade fails on masakari

Mark Goddard mark at stackhpc.com
Fri Mar 26 08:43:53 UTC 2021


On Thu, 25 Mar 2021 at 21:01, Braden, Albert
<C-Albert.Braden at charter.com> wrote:
>
> After about 2 hours the CPU settles down and then control0 joins the RMQ cluster and the admin display looks normal. The mysql container on control0 is stopped and the elasticsearch container is restarting every 60 seconds.
>
> acf52c003292   kolla/centos-source-mariadb:train-centos8                             "dumb-init -- kolla_…"   5 hours ago   Exited (128) 3 hours ago                  mariadb
> 9bc064cf9b2b   kolla/centos-source-elasticsearch6:train-centos8                      "dumb-init --single-…"   5 hours ago   Restarting (1) 20 seconds ago             elasticsearch
>
> The mariadb container refuses to start:
>
> [root@chrnc-void-testupgrade-control-0-replace keystone]# docker start mariadb
> Error response from daemon: OCI runtime create failed: container with id exists: acf52c003292e4841af15bb3c2894b983e37de5a65fc726ae2db2049f0e6774c: unknown
>
> I see a lot of this in mariadb.log on control0:
>
> 2021-03-25 17:29:16 13436 [Warning] Access denied for user 'haproxy'@'chrnc-void-testupgrade-control-2.dev.chtrse.com' (using password: NO)
> 2021-03-25 17:29:16 13437 [Warning] Access denied for user 'haproxy'@'chrnc-void-testupgrade-control-1.dev.chtrse.com' (using password: NO)
> 2021-03-25 17:29:17 13438 [Warning] Access denied for user 'haproxy'@'chrnc-void-testupgrade-control-0-replace.dev.chtrse.com' (using password: NO)
> 2021-03-25 17:29:18 13441 [Warning] Access denied for user 'haproxy'@'chrnc-void-testupgrade-control-2.dev.chtrse.com' (using password: NO)
> 2021-03-25 17:29:18 13442 [Warning] Access denied for user 'haproxy'@'chrnc-void-testupgrade-control-1.dev.chtrse.com' (using password: NO)
>
> Here's the entire mariadb.log starting when I created the cluster and ending when the container died:
>
> https://paste.ubuntu.com/p/FCn9pB6zV4/
>
> The CPU is no longer consumed, but the times might be a clue:
>
>     60 root      20   0       0      0      0 S   0.0   0.0 104:07.36 kswapd0
>       1 root      20   0  253576   9756   4580 S   0.0   0.1  42:22.65 systemd
>   28670 42472     20   0  180416  14140   5812 S   0.0   0.2  20:00.75 memcached_expor
>   28515 42472     20   0  332512  29968   3408 S   0.0   0.4  12:27.02 mysqld_exporter
>   28436 42472     20   0  251896  21584   6956 S   0.0   0.3  12:21.69 node_exporter
>   14857 root      20   0  980460  45684   6592 S   0.0   0.6   9:55.42 containerd
>   34608 42425     20   0  732436 106800   9012 S   0.0   1.4   9:03.50 httpd
>   34609 42425     20   0  732436 106812   8736 S   0.0   1.4   9:01.90 httpd
>   29123 42472     20   0   49360   9840   1288 S   0.3   0.1   8:00.38 elasticsearch_e
>   15034 root      20   0 2793200  68112      0 S   0.0   0.9   6:36.63 dockerd
>   23161 root      20   0  113120   6056      0 S   0.0   0.1   6:25.05 containerd-shim
>   28592 42472     20   0  112820  16988   6348 S   0.0   0.2   6:00.40 haproxy_exporte
>   57248 42472     20   0  253568  74152      0 S   0.0   1.0   5:47.56 openstack-expor
>   28950 42472     20   0  120256  24664   9456 S   0.0   0.3   5:22.03 alertmanager
>   31847 root      20   0  235156   2760   2196 S   0.0   0.0   5:00.97 bash
>
> I'll build another cluster and watch top during the upgrade to see what is consuming the CPU.

Hi Albert,

I would suggest using the --tags argument to go through the
kolla-ansible deploy on the new controller service by service. You can
check site.yml for the order of the plays.
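
As a rough sketch of what that could look like (not a tested procedure): the inventory name "multinode" and "--limit control" are taken from the command quoted below, while the tag list is only an illustrative assumption and should be replaced with the tags and order found in site.yml for the kolla-ansible release in use. The helper names build_cmd and deploy_by_tag are made up for this example. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: run "kolla-ansible deploy" one service tag at a time, so the
# step that overloads the new controller can be identified.
set -eu

INVENTORY="${INVENTORY:-multinode}"   # inventory name from the original command
LIMIT="${LIMIT:-control}"             # same --limit as the failing deploy
DRY_RUN="${DRY_RUN:-1}"               # 1 = print the commands only, do not run them

# ASSUMPTION: illustrative subset of tags; take the real list and order
# from the plays in site.yml.
TAGS="${TAGS:-haproxy mariadb memcached rabbitmq keystone masakari}"

# Print the command line for a single tag (hypothetical helper).
build_cmd() {
    echo "kolla-ansible -i ${INVENTORY} deploy --limit ${LIMIT} --tags $1"
}

# Walk the tags in order; with "set -e" the script stops at the first
# command that fails when DRY_RUN is not 1.
deploy_by_tag() {
    for tag in ${TAGS}; do
        cmd="$(build_cmd "${tag}")"
        echo "==> ${cmd}"
        if [ "${DRY_RUN}" != "1" ]; then
            # Word splitting is intended: cmd holds plain arguments only.
            ${cmd}
        fi
    done
}

deploy_by_tag
```

Running it with DRY_RUN=0 executes each step in turn; checking the load on the new controller between steps should narrow down which service triggers the CPU spike.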

Mark
>
> -----Original Message-----
> From: Radosław Piliszek <radoslaw.piliszek at gmail.com>
> Sent: Thursday, March 25, 2021 2:17 PM
> To: Braden, Albert <C-Albert.Braden at charter.com>
> Cc: openstack-discuss at lists.openstack.org
> Subject: [EXTERNAL] Re: [kolla] Train Centos7 -> Centos8 upgrade fails on masakari
>
> Hi Albert,
>
> I can assure you this is unrelated to Masakari.
> As you have observed, it's the RabbitMQ and Keystone (perhaps due to
> MariaDB?) that failed.
> Something is abusing the CPU there. What is that process?
>
> -yoctozepto
>
> On Thu, Mar 25, 2021 at 7:09 PM Braden, Albert
> <C-Albert.Braden at charter.com> wrote:
> >
> > I’ve created a heat stack and installed OpenStack Train to test the CentOS 7 -> 8 upgrade, following the document here:
> >
> > https://docs.openstack.org/kolla-ansible/train/user/centos8.html#migrating-from-centos-7-to-centos-8
> >
> > Everything seems to work fine until I try to deploy the first replacement controller into the cluster. I upgrade RMQ, ES and Kibana, then follow the “remove existing controller” process to remove control0, create a CentOS 8 VM, bootstrap the cluster, pull containers to the new control0, and everything is still working. Then I run the last command, “kolla-ansible -i multinode deploy --limit control”.
> >
> > The RMQ install works and I see all 3 nodes up in the RMQ admin, but it takes a long time to complete “TASK [service-ks-register : masakari | Creating users]” and then hangs on “TASK [service-ks-register : masakari | Creating roles]”. At this time the new control0 becomes unreachable and drops out of the RMQ cluster. I can still ping it, but the console hangs along with new and existing ssh sessions. It appears that the CPU may be maxed out and not allowing interrupts. Eventually I see the error “fatal: [control0]: FAILED! => {"msg": "Timeout (12s) waiting for privilege escalation prompt: "}”
> >
> > RMQ seems fine on the 2 old controllers; they just don’t see the new control0 active:
> >
> > (rabbitmq)[root@chrnc-void-testupgrade-control-2 /]# rabbitmqctl cluster_status
> > Cluster status of node rabbit@chrnc-void-testupgrade-control-2 ...
> > [{nodes,[{disc,['rabbit@chrnc-void-testupgrade-control-0',
> >                 'rabbit@chrnc-void-testupgrade-control-0-replace',
> >                 'rabbit@chrnc-void-testupgrade-control-1',
> >                 'rabbit@chrnc-void-testupgrade-control-2']}]},
> >  {running_nodes,['rabbit@chrnc-void-testupgrade-control-1',
> >                  'rabbit@chrnc-void-testupgrade-control-2']},
> >  {cluster_name,<<"rabbit@chrnc-void-testupgrade-control-0.dev.chtrse.com">>},
> >  {partitions,[]},
> >  {alarms,[{'rabbit@chrnc-void-testupgrade-control-1',[]},
> >           {'rabbit@chrnc-void-testupgrade-control-2',[]}]}]
> >
> > After this the HAProxy IP is pingable but openstack commands are failing:
> >
> > (openstack) [root@chrnc-void-testupgrade-build openstack]# osl
> > Failed to discover available identity versions when contacting http://172.16.0.100:35357/v3. Attempting to parse version from URL.
> > Gateway Timeout (HTTP 504)
> >
> > After about an hour my open ssh session on the new control0 responded and confirmed that the CPU is maxed out:
> >
> > [root@chrnc-void-testupgrade-control-0-replace /]# uptime
> > 17:41:55 up  1:55,  0 users,  load average: 157.87, 299.75, 388.69
> >
> > I built new heat stacks and tried it a few times, and it consistently fails on masakari. Do I need to change something in my masakari config before upgrading Train from Centos 7 to Centos 8?
> >
> > I apologize for the nonsense below. I have not been able to stop it from being attached to my external emails.
> >
> > The contents of this e-mail message and
> > any attachments are intended solely for the
> > addressee(s) and may contain confidential
> > and/or legally privileged information. If you
> > are not the intended recipient of this message
> > or if this message has been addressed to you
> > in error, please immediately alert the sender
> > by reply e-mail and then delete this message
> > and any attachments. If you are not the
> > intended recipient, you are notified that
> > any use, dissemination, distribution, copying,
> > or storage of this message or any attachment
> > is strictly prohibited.


