OK, I have resolved it. Thanks.

-----Original Message-----
From: openstack-discuss-bounces+sz_cuitao=163.com@lists.openstack.org <openstack-discuss-bounces+sz_cuitao=163.com@lists.openstack.org> On Behalf Of Tommy Sway
Sent: Saturday, August 28, 2021 2:59 PM
To: 'Radosław Piliszek' <radoslaw.piliszek@gmail.com>
Cc: 'openstack-discuss' <openstack-discuss@lists.openstack.org>
Subject: RE: Why the mariadb container always restart ?

It looks effective. But I cannot access MariaDB via the VIP address; which container should I restart?

TASK [mariadb : Wait for master mariadb] **********************************************************************************************************************************************
skipping: [control02]
skipping: [control03]
FAILED - RETRYING: Wait for master mariadb (10 retries left).
ok: [control01]

TASK [mariadb : Wait for MariaDB service to be ready through VIP] *********************************************************************************************************************
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (6 retries left).
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (6 retries left).
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (6 retries left).
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (5 retries left).
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (5 retries left).
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (5 retries left).
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (4 retries left).
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (4 retries left).
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (4 retries left).
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (3 retries left).
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (3 retries left).
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (3 retries left).
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (2 retries left).
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (2 retries left).
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (2 retries left).
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (1 retries left).
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (1 retries left).
fatal: [control02]: FAILED! => {"attempts": 6, "changed": false, "cmd": ["docker", "exec", "mariadb", "mysql", "-h", "10.10.10.254", "-P", "3306", "-u", "root", "-pAMDGL9CThcBlIsJZyS6VZKLwqvz0BIGbj5PC00Lf", "-e", "show databases;"], "delta": "0:00:01.409807", "end": "2021-08-28 14:57:14.332713", "msg": "non-zero return code", "rc": 1, "start": "2021-08-28 14:57:12.922906", "stderr": "ERROR 2002 (HY000): Can't connect to MySQL server on '10.10.10.254' (115)", "stderr_lines": ["ERROR 2002 (HY000): Can't connect to MySQL server on '10.10.10.254' (115)"], "stdout": "", "stdout_lines": []}
FAILED - RETRYING: Wait for MariaDB service to be ready through VIP (1 retries left).
fatal: [control01]: FAILED! => {"attempts": 6, "changed": false, "cmd": ["docker", "exec", "mariadb", "mysql", "-h", "10.10.10.254", "-P", "3306", "-u", "root", "-pAMDGL9CThcBlIsJZyS6VZKLwqvz0BIGbj5PC00Lf", "-e", "show databases;"], "delta": "0:00:03.553486", "end": "2021-08-28 14:57:29.130631", "msg": "non-zero return code", "rc": 1, "start": "2021-08-28 14:57:25.577145", "stderr": "ERROR 2002 (HY000): Can't connect to MySQL server on '10.10.10.254' (115)", "stderr_lines": ["ERROR 2002 (HY000): Can't connect to MySQL server on '10.10.10.254' (115)"], "stdout": "", "stdout_lines": []}
fatal: [control03]: FAILED! => {"attempts": 6, "changed": false, "cmd": ["docker", "exec", "mariadb", "mysql", "-h", "10.10.10.254", "-P", "3306", "-u", "root", "-pAMDGL9CThcBlIsJZyS6VZKLwqvz0BIGbj5PC00Lf", "-e", "show databases;"], "delta": "0:00:03.549868", "end": "2021-08-28 14:57:30.885324", "msg": "non-zero return code", "rc": 1, "start": "2021-08-28 14:57:27.335456", "stderr": "ERROR 2002 (HY000): Can't connect to MySQL server on '10.10.10.254' (115)", "stderr_lines": ["ERROR 2002 (HY000): Can't connect to MySQL server on '10.10.10.254' (115)"], "stdout": "", "stdout_lines": []}

PLAY RECAP ****************************************************************************************************************************************************************************
control01                  : ok=30   changed=8    unreachable=0    failed=1    skipped=16   rescued=0    ignored=0
control02                  : ok=24   changed=5    unreachable=0    failed=1    skipped=21   rescued=0    ignored=0
control03                  : ok=24   changed=5    unreachable=0    failed=1    skipped=21   rescued=0    ignored=0

Command failed ansible-playbook -i ./multinode -e @/etc/kolla/globals.yml -e @/etc/kolla/passwords.yml -e CONFIG_DIR=/etc/kolla -e kolla_action=deploy /venv/share/kolla-ansible/ansible/mariadb_recovery.yml

[root@control02 ~]# docker ps
CONTAINER ID   IMAGE                                                                COMMAND                  CREATED         STATUS                           PORTS   NAMES
c2d0521d9833   10.10.10.113:4000/kolla/centos-binary-mariadb-server:wallaby         "dumb-init -- kolla_…"   4 minutes ago   Up 4 minutes                             mariadb
7f4038b89518   10.10.10.113:4000/kolla/centos-binary-horizon:wallaby                "dumb-init --single-…"   2 days ago      Up 10 minutes (healthy)                  horizon
82f8e756b5da   10.10.10.113:4000/kolla/centos-binary-heat-engine:wallaby            "dumb-init --single-…"   2 days ago      Up 10 minutes (healthy)                  heat_engine
124d292e17d9   10.10.10.113:4000/kolla/centos-binary-heat-api-cfn:wallaby           "dumb-init --single-…"   2 days ago      Up 10 minutes (healthy)                  heat_api_cfn
75f6bebed35a   10.10.10.113:4000/kolla/centos-binary-heat-api:wallaby               "dumb-init --single-…"   2 days ago      Up 10 minutes (healthy)                  heat_api
c8fc3fc49fe0   10.10.10.113:4000/kolla/centos-binary-neutron-server:wallaby         "dumb-init --single-…"   2 days ago      Up 10 minutes (unhealthy)                neutron_server
1ed052094fde   10.10.10.113:4000/kolla/centos-binary-nova-novncproxy:wallaby        "dumb-init --single-…"   2 days ago      Up 10 minutes (healthy)                  nova_novncproxy
b25403743c4b   10.10.10.113:4000/kolla/centos-binary-nova-conductor:wallaby         "dumb-init --single-…"   2 days ago      Up 1 second (health: starting)           nova_conductor
f150ff15e53a   10.10.10.113:4000/kolla/centos-binary-nova-api:wallaby               "dumb-init --single-…"   2 days ago      Up 10 minutes (unhealthy)                nova_api
c71b1718c4d8   10.10.10.113:4000/kolla/centos-binary-nova-scheduler:wallaby         "dumb-init --single-…"   2 days ago      Up 1 second (health: starting)           nova_scheduler
8a5d43ac62ca   10.10.10.113:4000/kolla/centos-binary-placement-api:wallaby          "dumb-init --single-…"   2 days ago      Up 10 minutes (unhealthy)                placement_api
f0c142d683bf   10.10.10.113:4000/kolla/centos-binary-keystone:wallaby               "dumb-init --single-…"   2 days ago      Up 10 minutes (healthy)                  keystone
6decd0fb670c   10.10.10.113:4000/kolla/centos-binary-keystone-fernet:wallaby        "dumb-init --single-…"   2 days ago      Up 10 minutes                            keystone_fernet
b2b8b14c114b   10.10.10.113:4000/kolla/centos-binary-keystone-ssh:wallaby           "dumb-init --single-…"   2 days ago      Up 10 minutes (healthy)                  keystone_ssh
f119c52904f9   10.10.10.113:4000/kolla/centos-binary-rabbitmq:wallaby               "dumb-init --single-…"   2 days ago      Up 10 minutes                            rabbitmq
e57493e8a877   10.10.10.113:4000/kolla/centos-binary-memcached:wallaby              "dumb-init --single-…"   2 days ago      Up 10 minutes                            memcached
bcb4f5a0f4a9   10.10.10.113:4000/kolla/centos-binary-mariadb-clustercheck:wallaby   "dumb-init --single-…"   2 days ago      Up 10 minutes                            mariadb_clustercheck
6b7ffe32799c   10.10.10.113:4000/kolla/centos-binary-cron:wallaby                   "dumb-init --single-…"   2 days ago      Up 10 minutes                            cron
ccd2b7c0d212   10.10.10.113:4000/kolla/centos-binary-kolla-toolbox:wallaby          "dumb-init --single-…"   2 days ago      Up 10 minutes                            kolla_toolbox
cf4ec99b9c59   10.10.10.113:4000/kolla/centos-binary-fluentd:wallaby                "dumb-init --single-…"   2 days ago      Up 10 minutes                            fluentd
-----Original Message-----
From: Radosław Piliszek <radoslaw.piliszek@gmail.com>
Sent: Saturday, August 28, 2021 1:06 AM
To: Tommy Sway <sz_cuitao@163.com>
Cc: openstack-discuss <openstack-discuss@lists.openstack.org>
Subject: Re: Why the mariadb container always restart ?

On Fri, Aug 27, 2021 at 6:56 PM Tommy Sway <sz_cuitao@163.com> wrote:
Hi:
The system went down because of a power failure, and I restarted the whole system, but the mariadb container keeps restarting, about once a minute.
And this is the log:
2021-08-28 0:34:51 0 [Note] WSREP: (a8e9005b, 'tcp://10.10.10.63:4567') turning message relay requesting off
2021-08-28 0:35:03 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at gcomm/src/pc.cpp:connect():160
2021-08-28 0:35:03 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():209: Failed to open backend connection: -110 (Connection timed out)
2021-08-28 0:35:03 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1475: Failed to open channel 'openstack' at 'gcomm://10.10.10.61:4567,10.10.10.62:4567,10.10.10.63:4567': -110 (Connection timed out)
2021-08-28 0:35:03 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2021-08-28 0:35:03 0 [ERROR] WSREP: wsrep::connect(gcomm://10.10.10.61:4567,10.10.10.62:4567,10.10.10.63:4567) failed: 7
2021-08-28 0:35:03 0 [ERROR] Aborting
210828 00:35:05 mysqld_safe mysqld from pid file /var/lib/mysql/mariadb.pid ended
210828 00:35:08 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql/
210828 00:35:08 mysqld_safe WSREP: Running position recovery with --disable-log-error --pid-file='/var/lib/mysql//control03-recover.pid'
210828 00:35:12 mysqld_safe WSREP: Recovered position 5bdd8d83-05a4-11ec-adfd-ae6e7e26deb9:140403
2021-08-28 0:35:12 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
2021-08-28 0:35:12 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
2021-08-28 0:35:12 0 [Note] /usr/libexec/mysqld (mysqld 10.3.28-MariaDB-log) starting as process 258 ...
2021-08-28 0:35:12 0 [Note] WSREP: wsrep_load(): Galera 3.32(rXXXX) by Codership Oy info@codership.com loaded successfully.
2021-08-28 0:35:12 0 [Note] WSREP: CRC-32C: using 64-bit x86 acceleration.
2021-08-28 0:35:12 0 [Note] WSREP: Found saved state: 5bdd8d83-05a4-11ec-adfd-ae6e7e26deb9:-1, safe_to_bootstrap: 1
2021-08-28 0:35:12 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 10.10.10.63; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.listen_addr = tcp://10.10.10.63:4567; gmcast.segment = 0; gmc
2021-08-28 0:35:12 0 [Note] WSREP: GCache history reset: 5bdd8d83-05a4-11ec-adfd-ae6e7e26deb9:0 -> 5bdd8d83-05a4-11ec-adfd-ae6e7e26deb9:140403
2021-08-28 0:35:12 0 [Note] WSREP: Assign initial position for certification: 140403, protocol version: -1
2021-08-28 0:35:12 0 [Note] WSREP: wsrep_sst_grab()
2021-08-28 0:35:12 0 [Note] WSREP: Start replication
2021-08-28 0:35:12 0 [Note] WSREP: Setting initial position to 5bdd8d83-05a4-11ec-adfd-ae6e7e26deb9:140403
2021-08-28 0:35:12 0 [Note] WSREP: protonet asio version 0
2021-08-28 0:35:12 0 [Note] WSREP: Using CRC-32C for message checksums.
2021-08-28 0:35:12 0 [Note] WSREP: backend: asio
2021-08-28 0:35:12 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2021-08-28 0:35:12 0 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2021-08-28 0:35:12 0 [Note] WSREP: restore pc from disk failed
2021-08-28 0:35:12 0 [Note] WSREP: GMCast version 0
2021-08-28 0:35:12 0 [Note] WSREP: (c05f5c52, 'tcp://10.10.10.63:4567') listening at tcp://10.10.10.63:4567
2021-08-28 0:35:12 0 [Note] WSREP: (c05f5c52, 'tcp://10.10.10.63:4567') multicast: , ttl: 1
2021-08-28 0:35:12 0 [Note] WSREP: EVS version 0
2021-08-28 0:35:12 0 [Note] WSREP: gcomm: connecting to group 'openstack', peer '10.10.10.61:4567,10.10.10.62:4567,10.10.10.63:4567'
2021-08-28 0:35:12 0 [Note] WSREP: (c05f5c52, 'tcp://10.10.10.63:4567') connection established to b1ae23e3 tcp://10.10.10.62:4567
2021-08-28 0:35:12 0 [Note] WSREP: (c05f5c52, 'tcp://10.10.10.63:4567') turning message relay requesting on, nonlive peers:
2021-08-28 0:35:16 0 [Note] WSREP: (c05f5c52, 'tcp://10.10.10.63:4567') turning message relay requesting off
2021-08-28 0:35:16 0 [Note] WSREP: (c05f5c52, 'tcp://10.10.10.63:4567') connection established to c296912b tcp://10.10.10.61:4567
2021-08-28 0:35:16 0 [Note] WSREP: (c05f5c52, 'tcp://10.10.10.63:4567') turning message relay requesting on, nonlive peers:
2021-08-28 0:35:18 0 [Note] WSREP: declaring b1ae23e3 at tcp://10.10.10.62:4567 stable
2021-08-28 0:35:18 0 [Note] WSREP: declaring c296912b at tcp://10.10.10.61:4567 stable
2021-08-28 0:35:19 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
What's the matter with it?
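The key lines above are "failed to reach primary view ... (Connection timed out)" and "no nodes coming from prim view, prim not possible": after the full outage none of the three members believes a primary component still exists, so each mysqld aborts and the container is restarted in a loop. The saved state echoed in the log ("safe_to_bootstrap: 1", seqno -1, with position 140403 recovered separately by mysqld_safe) lives in grastate.dat in the MariaDB data directory; comparing it across the controllers shows which node is the most advanced one to bootstrap from. A minimal check, assuming the default kolla volume layout (the file can also be read directly from the mariadb Docker volume on the host if the container is mid-restart):

    # Run on each controller and compare the uuid/seqno/safe_to_bootstrap values
    docker exec mariadb cat /var/lib/mysql/grastate.dat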
After lights-out, you have to run ``kolla-ansible mariadb_recovery`` to safely recover the Galera cluster state. -yoctozepto
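For reference, the recovery is invoked through the kolla-ansible CLI against the same inventory used for deployment (the inventory path below is assumed to match the one from the failed run above):

    kolla-ansible -i ./multinode mariadb_recovery

This drives the mariadb_recovery.yml playbook seen earlier: roughly, it stops the mariadb containers, determines the member with the most advanced recovered position, bootstraps a new cluster from that node, and then starts the remaining members so they rejoin via state transfer.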