OK, I have resolved it. Thanks.

-----Original Message-----
From: openstack-discuss-bounces+sz_cuitao=163.com@lists.openstack.org <openstack-discuss-bounces+sz_cuitao=163.com@lists.openstack.org> On Behalf Of Tommy Sway
Sent: Saturday, August 28, 2021 2:59 PM
To: 'Radosław Piliszek' <radoslaw.piliszek@gmail.com>
Cc: 'openstack-discuss' <openstack-discuss@lists.openstack.org>
Subject: RE: Why the mariadb container always restart ?

It looks effective. But I cannot access MariaDB via the VIP address; which container should I restart?

TASK [mariadb : Wait for master mariadb] **********************************************************************************************************************************************
skipping: [control02]
skipping: [control03]
FAILED - RETRYING: Wait for master mariadb (10 retries left).
ok: [control01]

TASK [mariadb : Wait for MariaDB service to be ready through VIP] *********************************************************************************************************************
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (6 retries left).
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (6 retries left).
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (6 retries left).
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (5 retries left).
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (5 retries left).
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (5 retries left).
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (4 retries left).
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (4 retries left).
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (4 retries left).
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (3 retries left).
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (3 retries left).
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (3 retries left).
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (2 retries left).
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (2 retries left).
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (2 retries left).
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (1 retries left).
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (1 retries left).
fatal: [control02]: FAILED! => {"attempts": 6, "changed": false, "cmd": ["docker", "exec", "mariadb", "mysql", "-h", "10.10.10.254", "-P", "3306", "-u", "root", "-pAMDGL9CThcBlIsJZyS6VZKLwqvz0BIGbj5PC00Lf", "-e", "show databases;"], "delta": "0:00:01.409807", "end": "2021-08-28 14:57:14.332713", "msg": "non-zero return code", "rc": 1, "start": "2021-08-28 14:57:12.922906", "stderr": "ERROR 2002 (HY000): Can't connect to MySQL server on '10.10.10.254' (115)", "stderr_lines": ["ERROR 2002 (HY000): Can't connect to MySQL server on '10.10.10.254' (115)"], "stdout": "", "stdout_lines": []}
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (1 retries left).
fatal: [control01]: FAILED! => {"attempts": 6, "changed": false, "cmd": ["docker", "exec", "mariadb", "mysql", "-h", "10.10.10.254", "-P", "3306", "-u", "root", "-pAMDGL9CThcBlIsJZyS6VZKLwqvz0BIGbj5PC00Lf", "-e", "show databases;"], "delta": "0:00:03.553486", "end": "2021-08-28 14:57:29.130631", "msg": "non-zero return code", "rc": 1, "start": "2021-08-28 14:57:25.577145", "stderr": "ERROR 2002 (HY000): Can't connect to MySQL server on '10.10.10.254' (115)", "stderr_lines": ["ERROR 2002 (HY000): Can't connect to MySQL server on '10.10.10.254' (115)"], "stdout": "", "stdout_lines": []}
fatal: [control03]: FAILED! => {"attempts": 6, "changed": false, "cmd": ["docker", "exec", "mariadb", "mysql", "-h", "10.10.10.254", "-P", "3306", "-u", "root", "-pAMDGL9CThcBlIsJZyS6VZKLwqvz0BIGbj5PC00Lf", "-e", "show databases;"], "delta": "0:00:03.549868", "end": "2021-08-28 14:57:30.885324", "msg": "non-zero return code", "rc": 1, "start": "2021-08-28 14:57:27.335456", "stderr": "ERROR 2002 (HY000): Can't connect to MySQL server on '10.10.10.254' (115)", "stderr_lines": ["ERROR 2002 (HY000): Can't connect to MySQL server on '10.10.10.254' (115)"], "stdout": "", "stdout_lines": []}

PLAY RECAP ****************************************************************************************************************************************************************************
control01                  : ok=30   changed=8    unreachable=0    failed=1    skipped=16   rescued=0    ignored=0
control02                  : ok=24   changed=5    unreachable=0    failed=1    skipped=21   rescued=0    ignored=0
control03                  : ok=24   changed=5    unreachable=0    failed=1    skipped=21   rescued=0    ignored=0

Command failed ansible-playbook -i ./multinode -e @/etc/kolla/globals.yml -e @/etc/kolla/passwords.yml -e CONFIG_DIR=/etc/kolla -e kolla_action=deploy /venv/share/kolla-ansible/ansible/mariadb_recovery.yml

[root@control02 ~]# docker ps
CONTAINER ID   IMAGE                                                                COMMAND                  CREATED         STATUS                           PORTS   NAMES
c2d0521d9833   10.10.10.113:4000/kolla/centos-binary-mariadb-server:wallaby         "dumb-init -- kolla_…"   4 minutes ago   Up 4 minutes                             mariadb
7f4038b89518   10.10.10.113:4000/kolla/centos-binary-horizon:wallaby                "dumb-init --single-…"   2 days ago      Up 10 minutes (healthy)                  horizon
82f8e756b5da   10.10.10.113:4000/kolla/centos-binary-heat-engine:wallaby            "dumb-init --single-…"   2 days ago      Up 10 minutes (healthy)                  heat_engine
124d292e17d9   10.10.10.113:4000/kolla/centos-binary-heat-api-cfn:wallaby           "dumb-init --single-…"   2 days ago      Up 10 minutes (healthy)                  heat_api_cfn
75f6bebed35a   10.10.10.113:4000/kolla/centos-binary-heat-api:wallaby               "dumb-init --single-…"   2 days ago      Up 10 minutes (healthy)                  heat_api
c8fc3fc49fe0   10.10.10.113:4000/kolla/centos-binary-neutron-server:wallaby         "dumb-init --single-…"   2 days ago      Up 10 minutes (unhealthy)                neutron_server
1ed052094fde   10.10.10.113:4000/kolla/centos-binary-nova-novncproxy:wallaby        "dumb-init --single-…"   2 days ago      Up 10 minutes (healthy)                  nova_novncproxy
b25403743c4b   10.10.10.113:4000/kolla/centos-binary-nova-conductor:wallaby         "dumb-init --single-…"   2 days ago      Up 1 second (health: starting)           nova_conductor
f150ff15e53a   10.10.10.113:4000/kolla/centos-binary-nova-api:wallaby               "dumb-init --single-…"   2 days ago      Up 10 minutes (unhealthy)                nova_api
c71b1718c4d8   10.10.10.113:4000/kolla/centos-binary-nova-scheduler:wallaby         "dumb-init --single-…"   2 days ago      Up 1 second (health: starting)           nova_scheduler
8a5d43ac62ca   10.10.10.113:4000/kolla/centos-binary-placement-api:wallaby          "dumb-init --single-…"   2 days ago      Up 10 minutes (unhealthy)                placement_api
f0c142d683bf   10.10.10.113:4000/kolla/centos-binary-keystone:wallaby               "dumb-init --single-…"   2 days ago      Up 10 minutes (healthy)                  keystone
6decd0fb670c   10.10.10.113:4000/kolla/centos-binary-keystone-fernet:wallaby        "dumb-init --single-…"   2 days ago      Up 10 minutes                            keystone_fernet
b2b8b14c114b   10.10.10.113:4000/kolla/centos-binary-keystone-ssh:wallaby           "dumb-init --single-…"   2 days ago      Up 10 minutes (healthy)                  keystone_ssh
f119c52904f9   10.10.10.113:4000/kolla/centos-binary-rabbitmq:wallaby               "dumb-init --single-…"   2 days ago      Up 10 minutes                            rabbitmq
e57493e8a877   10.10.10.113:4000/kolla/centos-binary-memcached:wallaby              "dumb-init --single-…"   2 days ago      Up 10 minutes                            memcached
bcb4f5a0f4a9   10.10.10.113:4000/kolla/centos-binary-mariadb-clustercheck:wallaby   "dumb-init --single-…"   2 days ago      Up 10 minutes                            mariadb_clustercheck
6b7ffe32799c   10.10.10.113:4000/kolla/centos-binary-cron:wallaby                   "dumb-init --single-…"   2 days ago      Up 10 minutes                            cron
ccd2b7c0d212   10.10.10.113:4000/kolla/centos-binary-kolla-toolbox:wallaby          "dumb-init --single-…"   2 days ago      Up 10 minutes                            kolla_toolbox
cf4ec99b9c59   10.10.10.113:4000/kolla/centos-binary-fluentd:wallaby                "dumb-init --single-…"   2 days ago      Up 10 minutes                            fluentd
-----Original Message-----
From: Radosław Piliszek <radoslaw.piliszek@gmail.com>
Sent: Saturday, August 28, 2021 1:06 AM
To: Tommy Sway <sz_cuitao@163.com>
Cc: openstack-discuss <openstack-discuss@lists.openstack.org>
Subject: Re: Why the mariadb container always restart ?

On Fri, Aug 27, 2021 at 6:56 PM Tommy Sway <sz_cuitao@163.com> wrote:
Hi:
The system went down because of a power failure, and I restarted the whole system, but the mariadb container keeps restarting, about once a minute.
And this is the log:
2021-08-28 0:34:51 0 [Note] WSREP: (a8e9005b, 'tcp://10.10.10.63:4567') turning message relay requesting off
2021-08-28 0:35:03 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at gcomm/src/pc.cpp:connect():160
2021-08-28 0:35:03 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():209: Failed to open backend connection: -110 (Connection timed out)
2021-08-28 0:35:03 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1475: Failed to open channel 'openstack' at 'gcomm://10.10.10.61:4567,10.10.10.62:4567,10.10.10.63:4567': -110 (Connection timed out)
2021-08-28 0:35:03 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2021-08-28 0:35:03 0 [ERROR] WSREP: wsrep::connect(gcomm://10.10.10.61:4567,10.10.10.62:4567,10.10.10.63:4567) failed: 7
2021-08-28 0:35:03 0 [ERROR] Aborting
210828 00:35:05 mysqld_safe mysqld from pid file /var/lib/mysql/mariadb.pid ended
210828 00:35:08 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql/
210828 00:35:08 mysqld_safe WSREP: Running position recovery with --disable-log-error --pid-file='/var/lib/mysql//control03-recover.pid'
210828 00:35:12 mysqld_safe WSREP: Recovered position 5bdd8d83-05a4-11ec-adfd-ae6e7e26deb9:140403
2021-08-28 0:35:12 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
2021-08-28 0:35:12 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
2021-08-28 0:35:12 0 [Note] /usr/libexec/mysqld (mysqld 10.3.28-MariaDB-log) starting as process 258 ...
2021-08-28 0:35:12 0 [Note] WSREP: wsrep_load(): Galera 3.32(rXXXX) by Codership Oy info@codership.com loaded successfully.
2021-08-28 0:35:12 0 [Note] WSREP: CRC-32C: using 64-bit x86 acceleration.
2021-08-28 0:35:12 0 [Note] WSREP: Found saved state: 5bdd8d83-05a4-11ec-adfd-ae6e7e26deb9:-1, safe_to_bootstrap: 1
2021-08-28 0:35:12 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 10.10.10.63; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.listen_addr = tcp://10.10.10.63:4567; gmcast.segment = 0; gmc
2021-08-28 0:35:12 0 [Note] WSREP: GCache history reset: 5bdd8d83-05a4-11ec-adfd-ae6e7e26deb9:0 -> 5bdd8d83-05a4-11ec-adfd-ae6e7e26deb9:140403
2021-08-28 0:35:12 0 [Note] WSREP: Assign initial position for certification: 140403, protocol version: -1
2021-08-28 0:35:12 0 [Note] WSREP: wsrep_sst_grab()
2021-08-28 0:35:12 0 [Note] WSREP: Start replication
2021-08-28 0:35:12 0 [Note] WSREP: Setting initial position to 5bdd8d83-05a4-11ec-adfd-ae6e7e26deb9:140403
2021-08-28 0:35:12 0 [Note] WSREP: protonet asio version 0
2021-08-28 0:35:12 0 [Note] WSREP: Using CRC-32C for message checksums.
2021-08-28 0:35:12 0 [Note] WSREP: backend: asio
2021-08-28 0:35:12 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2021-08-28 0:35:12 0 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2021-08-28 0:35:12 0 [Note] WSREP: restore pc from disk failed
2021-08-28 0:35:12 0 [Note] WSREP: GMCast version 0
2021-08-28 0:35:12 0 [Note] WSREP: (c05f5c52, 'tcp://10.10.10.63:4567') listening at tcp://10.10.10.63:4567
2021-08-28 0:35:12 0 [Note] WSREP: (c05f5c52, 'tcp://10.10.10.63:4567') multicast: , ttl: 1
2021-08-28 0:35:12 0 [Note] WSREP: EVS version 0
2021-08-28 0:35:12 0 [Note] WSREP: gcomm: connecting to group 'openstack', peer '10.10.10.61:4567,10.10.10.62:4567,10.10.10.63:4567'
2021-08-28 0:35:12 0 [Note] WSREP: (c05f5c52, 'tcp://10.10.10.63:4567') connection established to b1ae23e3 tcp://10.10.10.62:4567
2021-08-28 0:35:12 0 [Note] WSREP: (c05f5c52, 'tcp://10.10.10.63:4567') turning message relay requesting on, nonlive peers:
2021-08-28 0:35:16 0 [Note] WSREP: (c05f5c52, 'tcp://10.10.10.63:4567') turning message relay requesting off
2021-08-28 0:35:16 0 [Note] WSREP: (c05f5c52, 'tcp://10.10.10.63:4567') connection established to c296912b tcp://10.10.10.61:4567
2021-08-28 0:35:16 0 [Note] WSREP: (c05f5c52, 'tcp://10.10.10.63:4567') turning message relay requesting on, nonlive peers:
2021-08-28 0:35:18 0 [Note] WSREP: declaring b1ae23e3 at tcp://10.10.10.62:4567 stable
2021-08-28 0:35:18 0 [Note] WSREP: declaring c296912b at tcp://10.10.10.61:4567 stable
2021-08-28 0:35:19 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
What's the matter with it?
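The key lines above are "failed to reach primary view ... (Connection timed out)" and "no nodes coming from prim view, prim not possible": after the full outage none of the three members believes a primary component still exists, so each mysqld aborts and the container is restarted in a loop. The saved state echoed in the log ("safe_to_bootstrap: 1", seqno -1, with position 140403 recovered separately by mysqld_safe) lives in grastate.dat in the MariaDB data directory; comparing it across the controllers shows which node is the most advanced one to bootstrap from. A minimal check, assuming the default kolla volume layout (the file can also be read directly from the mariadb Docker volume on the host if the container is mid-restart):

    # Run on each controller and compare the uuid/seqno/safe_to_bootstrap values
    docker exec mariadb cat /var/lib/mysql/grastate.dat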
After lights-out, you have to run ``kolla-ansible mariadb_recovery`` to safely recover the Galera cluster state. -yoctozepto
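For reference, the recovery is invoked through the kolla-ansible CLI against the same inventory used for deployment (the inventory path below is assumed to match the one from the failed run above):

    kolla-ansible -i ./multinode mariadb_recovery

This drives the mariadb_recovery.yml playbook seen earlier: roughly, it stops the mariadb containers, determines the member with the most advanced recovered position, bootstraps a new cluster from that node, and then starts the remaining members so they rejoin via state transfer.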