[kolla] All services stats DOWN after re-launch whole cluster.

Dincer Celik hello at dincercelik.com
Thu Feb 6 07:13:10 UTC 2020

Hi Eddie,

This looks like an issue [1] that was fixed previously. Could you please let me know which version you are using?


[1] https://bugs.launchpad.net/kolla-ansible/+bug/1837699

> On 5 Feb 2020, at 14:33, Eddie Yen <missile0407 at gmail.com> wrote:
> Today I tried to recover RabbitMQ, but nothing helped, not even deleting all of the
> RabbitMQ data and configs and then re-deploying (without destroy).
> I then found that /etc/hosts had been flushed on every node: the hostname
> resolution entries created by kolla-ansible were gone. I checked and found that MAAS
> had enabled the manage_etc_hosts setting in /etc/cloud/cloud.cfg.d/, which caused
> /etc/hosts to be reset on every boot.
> I'm not sure whether this was the root cause, but unfortunately I had already reset all of the
> RabbitMQ data, so all I can do is destroy and deploy again. Fortunately this cluster is brand
> new, so no VMs have been launched and no complex setup has been done yet.
> I think the issue may be solved, although I still need time to investigate. Based on this
> experience, be aware that this can happen when MAAS is used to deploy
> the OS.
> -Eddie
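
To check the other nodes for the manage_etc_hosts setting described above, something like the following might help. It is only a rough sketch: it assumes the usual cloud-init locations (/etc/cloud/cloud.cfg and /etc/cloud/cloud.cfg.d/*.cfg), the helper names are made up for illustration, and it uses a simple line match rather than a real YAML parser, so nested or unusual layouts will be missed.

```python
import re
from pathlib import Path

# Matches an uncommented, top-level "manage_etc_hosts: <value>" line.
_PATTERN = re.compile(r"^manage_etc_hosts[ \t]*:[ \t]*(\S+)", re.MULTILINE)


def manage_etc_hosts_value(text):
    """Return the last manage_etc_hosts value in a config text, or None."""
    matches = _PATTERN.findall(text)
    return matches[-1].strip("'\"").lower() if matches else None


def scan(paths):
    """Map each readable file that sets manage_etc_hosts to its value."""
    found = {}
    for path in paths:
        try:
            value = manage_etc_hosts_value(Path(path).read_text())
        except OSError:
            continue  # missing or unreadable file: skip it
        if value is not None:
            found[str(path)] = value
    return found


if __name__ == "__main__":
    # Assumed default cloud-init locations; adjust for your image.
    candidates = [Path("/etc/cloud/cloud.cfg")]
    candidates += sorted(Path("/etc/cloud/cloud.cfg.d").glob("*.cfg"))
    for path, value in scan(candidates).items():
        print(f"{path}: manage_etc_hosts = {value}")
```

Any file reported with a value other than "false" is worth a look before the next reboot.
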
> On Tue, Feb 4, 2020 at 9:45 PM, Eddie Yen <missile0407 at gmail.com> wrote:
> Hi Erik,
> I've already checked the NIC links and found no issues. The nodes can ping each other on every interface.
> I have not checked the docker logs for rabbitmq yet because it was working normally. I'll check them later.
> -Eddie
> On Tue, Feb 4, 2020 at 9:19 PM, Erik McCormick <emccormick at cirrusseven.com> wrote:
> On Tue, Feb 4, 2020, 7:20 AM Eddie Yen <missile0407 at gmail.com> wrote:
> Hi everyone,
> We have a Kolla OpenStack site with 3 HCI (Controller+Compute) nodes and 3 Storage (Ceph OSD)
> nodes, without internet access. We shut it down a few days ago for the CNY holidays.
> Today we brought the whole cluster back up. First we hit an issue where the MariaDB containers kept
> restarting, which we fixed with the mariadb_recovery command.
> After that we checked the status of each service and found that all services shown under
> Admin > System > System Information are DOWN. Strangely, there are no MariaDB, AMQP connection,
> or other errors in the logs of the downed services.
> We tried rebooting each server but the situation stayed the same. Then we found that the RabbitMQ log
> was not being updated; the last entry was from the date we shut down. We logged in to the RabbitMQ
> container, and running "rabbitmqctl status" showed connection refused; accessing its web manager at
> <VIP>:15672 in a browser just gave us a "503 Service unavailable" message. Also, nothing is listening
> on port 5672.
> Any chance you have a NIC that didn't come up? What is in the log of the container itself? (i.e. docker logs rabbitmq)
> I searched for this issue on the internet but found very little information. One suggested solution is to
> delete some files in the mnesia folder; another is to remove the rabbitmq container and its volume and then re-deploy.
> But I'm not sure about either. Does anyone know how to solve it?
> Many thanks,
> Eddie.
> -Erik
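
For the symptom in the original report (nothing listening on port 5672, 503 on 15672), a quick reachability check from another node could look roughly like this. It is a sketch only: the function name is made up for illustration, the hosts are whatever you pass on the command line, and a successful TCP connect only shows that something accepts connections on the port, not that RabbitMQ is healthy.

```python
import socket
import sys


def port_is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Hosts come from the command line; 5672 is AMQP and 15672 is the
    # management UI, the two ports mentioned in the report.
    for host in sys.argv[1:]:
        for port in (5672, 15672):
            state = "open" if port_is_listening(host, port) else "closed"
            print(f"{host}:{port} {state}")
```

Run it with the controller hostnames or IPs as arguments. If the hostnames fail to resolve but the IPs work, that points back at /etc/hosts.
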
