[kolla] All services stats DOWN after re-launch whole cluster.

Eddie Yen missile0407 at gmail.com
Wed Feb 5 11:33:25 UTC 2020


Today I tried to recover RabbitMQ, but nothing helped, not even deleting
all of the RabbitMQ data and configs and then re-deploying (without a
destroy).

I also found that /etc/hosts on every node had been flushed: the hostname
resolution entries created by kolla-ansible were gone. After checking, I
found that MAAS had enabled the manage_etc_hosts option in
/etc/cloud/cloud.cfg.d/, which causes /etc/hosts to be reset on every boot.
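
For anyone who wants to check their own nodes, here is a rough Python sketch
that lists which files under /etc/cloud/cloud.cfg.d/ turn manage_etc_hosts
on. It is only an illustration, not something taken from MAAS or
kolla-ansible: the function names are made up, it assumes the files end in
.cfg, and it does a simple regex scan rather than a real YAML parse.

```python
import re
from pathlib import Path

# Matches an uncommented "manage_etc_hosts: <value>" line in a cloud-init config.
_SETTING = re.compile(r"^\s*manage_etc_hosts\s*:\s*(\S+)", re.MULTILINE)

# Values treated as "disabled" (assumption: YAML-style false spellings).
_DISABLED = {"false", "no", "off"}


def manage_etc_hosts_value(text):
    """Return the last manage_etc_hosts value found in text, or None."""
    values = _SETTING.findall(text)
    return values[-1].strip("'\"").lower() if values else None


def find_enabled(config_dir="/etc/cloud/cloud.cfg.d"):
    """List (file name, value) for config files that turn manage_etc_hosts on."""
    enabled = []
    for path in sorted(Path(config_dir).glob("*.cfg")):
        value = manage_etc_hosts_value(path.read_text(errors="replace"))
        if value is not None and value not in _DISABLED:
            enabled.append((path.name, value))
    return enabled


if __name__ == "__main__":
    for name, value in find_enabled():
        print(f"{name}: manage_etc_hosts = {value}")
```

If this prints anything on a node, /etc/hosts there is probably being
managed by cloud-init, and entries added by other tools may not survive a
reboot.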

I'm not sure whether that was the root cause, but unfortunately I had
already reset all of the RabbitMQ data, so the only thing I can do is
destroy and deploy again. Fortunately this cluster was just getting
started, so no VMs had been launched and no complex setup had been done yet.

I think the issue may be solved, although it still needs some time to
investigate. Based on this experience, be aware that this can happen when
using MAAS to deploy the OS.
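
A quick way to spot this after a reboot is to compare /etc/hosts against
the hostnames the cluster expects. The sketch below is only an example
under assumptions: the function name and the hostnames control01..03 are
hypothetical placeholders, so substitute the names from your own inventory.

```python
def missing_hostnames(hosts_text, expected):
    """Return the names in expected that have no entry in hosts_text."""
    known = set()
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into "IP name [alias ...]".
        fields = line.split("#", 1)[0].split()
        known.update(fields[1:])  # fields[0] is the IP address
    return [name for name in expected if name not in known]


if __name__ == "__main__":
    # Hypothetical node names; replace with the hosts from your inventory.
    expected = ["control01", "control02", "control03"]
    with open("/etc/hosts") as handle:
        missing = missing_hostnames(handle.read(), expected)
    print("missing from /etc/hosts:", missing or "none")
```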

-Eddie

On Tue, Feb 4, 2020 at 9:45 PM Eddie Yen <missile0407 at gmail.com> wrote:

> Hi Erik,
>
> I have already checked the NIC links and found no issues. Pinging between
> the nodes on each interface is OK.
> I have not checked the docker logs for rabbitmq yet, because it had been
> working normally. I'll check that out later.
>
> -Eddie
>
> On Tue, Feb 4, 2020 at 9:19 PM Erik McCormick <emccormick at cirrusseven.com> wrote:
>
>>
>> On Tue, Feb 4, 2020, 7:20 AM Eddie Yen <missile0407 at gmail.com> wrote:
>>
>>> Hi everyone,
>>>
>>> We have a Kolla OpenStack site without internet access, made up of 3 HCI
>>> (Controller+Compute) nodes and 3 Storage (Ceph OSD) nodes. We shut it
>>> down a few days ago for the CNY holidays.
>>>
>>> Today we brought the whole cluster back up. First we hit an issue where
>>> the MariaDB containers kept restarting, which we fixed with the
>>> mariadb_recovery command.
>>> After that we checked the status of each service and found that all
>>> services shown under Admin > System > System Information are DOWN. The
>>> strange thing is that no MariaDB, AMQP connection, or other errors show
>>> up in the logs of the downed services.
>>>
>>> We tried rebooting each server, but the situation stayed the same. Then
>>> we found that the RabbitMQ log was not updating; the last entry was still
>>> from the date we shut down. Logging in to the RabbitMQ container and
>>> running "rabbitmqctl status" showed connection refused, and accessing its
>>> web manager at <VIP>:15672 in a browser just gave us a "503 Service
>>> unavailable" message. Also, nothing is listening on port 5672.
>>>
>>
>>
>> Any chance you have a NIC that didn't come up? What is in the log of the
>> container itself? (i.e. docker logs rabbitmq).
>>
>>
>>> I searched for this issue on the internet but found only a little
>>> information. One solution is to delete some files in the mnesia folder;
>>> another is to remove the rabbitmq container and its volume and then
>>> re-deploy. But I'm not sure about either. Does anyone know how to solve it?
>>>
>>>
>>> Many thanks,
>>> Eddie.
>>>
>>
>> -Erik
>>
>>>