[kolla] How to recover MariaDB when a controller fails to boot?
Hi everyone, We hit an issue a few days ago. When we launched our OpenStack cluster, we found that one controller could not boot due to a hardware failure, and MariaDB lost sync after the remaining controllers booted. In this case, can I just comment out the failed controller's hostname in the inventory file and run mariadb_recovery? Is there any risk in doing this once the failed controller comes back online? Many thanks, Eddie.
On Mon, Nov 16, 2020 at 3:52 AM Eddie Yen <missile0407@gmail.com> wrote:
Hi everyone,
Hi Eddie!
It is meant to be used exactly as you describe. As long as the alive controllers hold all the data you need, it will work fine; they should, if they were up before the failure. The failed node will either be unable to rejoin the cluster or will simply follow the others, depending on its local state. It will not impose its state on the others, so it is safe to power it on and then fix it. (All of the above assumes a standard configuration.)
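A minimal sketch of what that looks like (group and host names are hypothetical and will differ per deployment): comment the failed controller out of the control group in the inventory before running the recovery.

```ini
# multinode inventory: temporarily drop the failed controller
[control]
controller01
controller02
# controller03   ; failed node, commented out until it is fixed
```

Then run `kolla-ansible -i multinode mariadb_recovery` so the playbook only considers the surviving nodes.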
-yoctozepto
Hi yoctozepto, thanks for your advice. Perhaps we need to run mariadb_recovery again once the failed node is back online, to prevent a split-brain issue. But we'll try it if we hit the same case again in the future! Thank you very much, Eddie.
On Mon, Nov 16, 2020 at 11:33 AM Eddie Yen <missile0407@gmail.com> wrote:
I would simply eradicate the container and volume on it and then redeploy. Less hassle, satisfaction guaranteed.
-yoctozepto
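On a Docker-based kolla deployment, that would look roughly like the following (the volume name `mariadb` is the kolla default; `controller03` is a hypothetical hostname — adjust both for your environment):

```shell
# On the recovered controller: remove the broken MariaDB container
# and its data volume so the node starts from a clean state.
docker stop mariadb
docker rm mariadb
docker volume rm mariadb

# From the deployment host: redeploy MariaDB to that node only.
# It will rejoin the cluster and SST a fresh copy of the data.
kolla-ansible -i multinode deploy --tags mariadb --limit controller03
```

This sidesteps any question about the node's stale Galera state, at the cost of a full state transfer from the surviving nodes.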
A simple handy way is:
0. Stop all MariaDB containers.
1. Choose the node that was stopped last (data may be lost if you choose the wrong node, but in most cases it doesn't matter).
2. Edit /var/lib/docker/volumes/mariadb/_data/grastate.dat and set safe_to_bootstrap: 1.
3. Edit /etc/kolla/mariadb/galera.cnf and add:
```
[mysqld]
wsrep_new_cluster=1
```
4. Start MariaDB on that node and wait for it to become available.
5. Start MariaDB on the other nodes.
6. Revert the configuration change from step 3 and restart the first MariaDB.
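Step 2 above can be sketched as a small helper (a hypothetical illustration, not part of kolla-ansible) that flips the `safe_to_bootstrap` flag in a Galera `grastate.dat`; the sample file contents below are made up for demonstration:

```python
def mark_safe_to_bootstrap(grastate_text: str) -> str:
    """Return grastate.dat contents with safe_to_bootstrap forced to 1."""
    out = []
    for line in grastate_text.splitlines():
        if line.startswith("safe_to_bootstrap:"):
            out.append("safe_to_bootstrap: 1")
        else:
            out.append(line)  # leave uuid, seqno, version untouched
    return "\n".join(out) + "\n"

# Illustrative grastate.dat contents (uuid/seqno are placeholders):
sample = (
    "# GALERA saved state\n"
    "version: 2.1\n"
    "uuid:    00000000-0000-0000-0000-000000000000\n"
    "seqno:   -1\n"
    "safe_to_bootstrap: 0\n"
)
print(mark_safe_to_bootstrap(sample))
```

In practice you would read and rewrite `/var/lib/docker/volumes/mariadb/_data/grastate.dat` in place on the chosen node only.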
Thank you, Jeffrey! This is also useful information for us. Eddie.
Just to clarify: the steps Jeffrey posted are what mariadb_recovery does behind the scenes. No magic. :-)
-yoctozepto
participants (3)
- Eddie Yen
- Jeffrey Zhang
- Radosław Piliszek