[kolla] All services stats DOWN after re-launch whole cluster.

Eddie Yen missile0407 at gmail.com
Thu Feb 6 07:57:21 UTC 2020


Hi Dincer,

I'm using Rocky, and it seems this fix was not merged into stable/rocky.
It also matches what you wrote about the host table being flushed in MAAS
deployments.
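
For anyone who wants to check their own nodes, below is a minimal sketch
(not part of kolla-ansible or MAAS) that lists which cloud-init fragments set
manage_etc_hosts. The function names and the simple line-based matching are
my own assumptions; it does not parse YAML properly and only looks at *.cfg
files in the directory mentioned in this thread.

```python
import re
from pathlib import Path

# Matches a top-level "manage_etc_hosts: <value>" line in a cloud-init fragment.
# Assumption: the key is not indented and the value sits on the same line.
_PATTERN = re.compile(r"^manage_etc_hosts\s*:\s*(\S+)", re.MULTILINE)


def find_manage_etc_hosts(config_dir="/etc/cloud/cloud.cfg.d"):
    """Return {file path: raw value} for each *.cfg file that sets the key."""
    found = {}
    for path in sorted(Path(config_dir).glob("*.cfg")):
        try:
            text = path.read_text(errors="replace")
        except OSError:
            # Unreadable file (e.g. permissions): skip it rather than fail.
            continue
        match = _PATTERN.search(text)
        if match:
            found[str(path)] = match.group(1).strip("'\"")
    return found


def is_enabled(value):
    """Assumption: any value other than an explicit false counts as enabled."""
    return value.lower() != "false"


if __name__ == "__main__":
    for path, value in find_manage_etc_hosts().items():
        state = "enabled" if is_enabled(value) else "disabled"
        print(f"{path}: manage_etc_hosts={value} ({state})")
```

Running it on each node before and after a reboot should show whether the
option is still set by the provisioning tool.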

-Eddie

On Thu, 6 Feb 2020 at 3:13 PM, Dincer Celik <hello at dincercelik.com> wrote:

> Hi Eddie,
>
> This looks like an issue [1] that has been fixed previously. Could you
> please let me know which version you are using?
>
> -osmanlicilegi
>
> [1] https://bugs.launchpad.net/kolla-ansible/+bug/1837699
>
> On 5 Feb 2020, at 14:33, Eddie Yen <missile0407 at gmail.com> wrote:
>
> Today I tried to recover RabbitMQ, but without success, even after
> deleting all of its data and configs and re-deploying (without a destroy).
>
> I also found that /etc/hosts had been flushed on every node: the hostname
> resolution entries created by kolla-ansible were gone. I checked and found
> that MAAS had enabled the manage_etc_hosts option in
> /etc/cloud/cloud.cfg.d/, which causes /etc/hosts to be reset on every boot.
>
> I'm not sure whether that was the root cause, but unfortunately I had
> already reset all of the RabbitMQ data, so all I can do now is destroy and
> deploy again. Fortunately this cluster is brand new, so no VMs have been
> launched and no complex setup has been done yet.
>
> I think the issue may be solved, although it still needs some time to
> investigate. Based on this experience, be aware that this may happen when
> MAAS is used to deploy the OS.
>
> -Eddie
>
> On Tue, 4 Feb 2020 at 9:45 PM, Eddie Yen <missile0407 at gmail.com> wrote:
>
>> Hi Erik,
>>
>> I have already checked the NIC links and found no issues. The nodes can
>> ping each other on every interface.
>> I have not checked the docker logs for rabbitmq yet because it was
>> working normally. I'll check that out later.
>>
>> -Eddie
>>
>> On Tue, 4 Feb 2020 at 9:19 PM, Erik McCormick <emccormick at cirrusseven.com> wrote:
>>
>>>
>>> On Tue, Feb 4, 2020, 7:20 AM Eddie Yen <missile0407 at gmail.com> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> We have a Kolla OpenStack site consisting of 3 HCI (Controller+Compute)
>>>> nodes and 3 Storage (Ceph OSD) nodes, without internet access. We shut
>>>> it down a few days ago for the CNY holidays.
>>>>
>>>> Today we brought the whole cluster back up. First we hit an issue where
>>>> the MariaDB containers kept restarting, which we fixed with the
>>>> mariadb_recovery command.
>>>> After that we checked the status of each service and found that all
>>>> services shown under Admin > System > System Information are DOWN.
>>>> Strangely, no MariaDB, AMQP connection, or other errors appear in the
>>>> logs of the downed services.
>>>>
>>>> We tried rebooting each server, but the situation stayed the same. Then
>>>> we found that the RabbitMQ log was not updating; the last entry was
>>>> still from the date we shut down. Running "rabbitmqctl status" inside
>>>> the RabbitMQ container returns connection refused, and accessing its
>>>> web manager at <VIP>:15672 in a browser just gives a "503 Service
>>>> unavailable" message. Also, nothing is listening on port 5672.
>>>>
>>>
>>>
>>> Any chance you have a NIC that didn't come up? What is in the log of the
>>> container itself (i.e. docker logs rabbitmq)?
>>>
>>>
>>>> I searched for this issue on the internet but found only a little
>>>> information. One suggested solution is to delete some files in the
>>>> mnesia folder; another is to remove the rabbitmq container and its
>>>> volume and then re-deploy. I'm not sure about either. Does anyone know
>>>> how to solve it?
>>>>
>>>>
>>>> Many thanks,
>>>> Eddie.
>>>>
>>>
>>> -Erik
>>>
>>>>
>