[openstack][sharing][kolla ansible] Problems when 1 of 3 controllers is down

Satish Patel satish.txt at gmail.com
Wed Apr 12 15:00:49 UTC 2023


Hi Nguyễn,

Oh!! That makes sense. In your post you did the following, which is why I got
confused :)  Let me try it in the /etc/kolla/config/global.conf file and run
deploy.

>>>>> nano /etc/kollla/global.conf
>>>>>
>>>>> [oslo_messaging_rabbit]
>>>>> kombu_reconnect_delay=0.5
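For reference, that whole step can be scripted; a minimal sketch, assuming root on the deploy host (the `multinode` inventory name and the commented `reconfigure` invocation are assumptions to adapt to your deployment):

```shell
# Create the kolla custom-config override file; kolla-ansible merges the
# [oslo_messaging_rabbit] section from global.conf into every service's
# generated oslo config.
CONF_DIR="${KOLLA_CONFIG_DIR:-/etc/kolla/config}"
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/global.conf" <<'EOF'
[oslo_messaging_rabbit]
kombu_reconnect_delay=0.5
EOF

# Then push the new config out to the services (assumed inventory name):
# kolla-ansible -i multinode reconfigure
```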

On Wed, Apr 12, 2023 at 10:45 AM Nguyễn Hữu Khôi <nguyenhuukhoinw at gmail.com>
wrote:

> Hi.
> Create global.conf in /etc/kolla/config/
>
> On Wed, Apr 12, 2023, 9:42 PM Satish Patel <satish.txt at gmail.com> wrote:
>
>> Hi Matt,
>>
>> How do I set kombu_reconnect_delay=0.5 option?
>>
>> Something like the following in global.yml?
>>
>> kombu_reconnect_delay: 0.5
>>
>> On Wed, Apr 12, 2023 at 4:23 AM Matt Crees <mattc at stackhpc.com> wrote:
>>
>>> Hi all,
>>>
>>> It seems worth noting here that there is a fix ongoing in
>>> oslo.messaging which will resolve the issues with HA failing when one
>>> node is down. See here:
>>> https://review.opendev.org/c/openstack/oslo.messaging/+/866617
>>> In the meantime, we have also found that setting kombu_reconnect_delay
>>> = 0.5 does resolve this issue.
>>>
>>> As for why om_enable_rabbitmq_high_availability is currently
>>> defaulting to false, as Michal said enabling it in stable releases
>>> will impact users. This is because it enables durable queues, and the
>>> migration from transient to durable queues is not a seamless
>>> procedure. It requires that the state of RabbitMQ is reset and that
>>> the OpenStack services which use RabbitMQ are restarted to recreate
>>> the queues.
>>>
>>> I think that there is some merit in changing this default value. But
>>> if we did this, we should either add additional support to automate
>>> the migration from transient to durable queues, or at the very least
>>> provide some decent docs on the manual procedure.
>>>
>>> However, as classic queue mirroring is deprecated in RabbitMQ (to be
>>> removed in RabbitMQ 4.0) we should maybe consider switching to quorum
>>> queues soon. Then it may be beneficial to leave the classic queue
>>> mirroring + durable queues setup as False by default. This is because
>>> the migration between queue types (durable or quorum) can take several
>>> hours on larger deployments. So it might be worth making sure the
>>> default values only require one migration to quorum queues in the
>>> future, rather than two (durable queues now and then quorum queues in
>>> the future).
>>>
>>> We will need to make this switch eventually, but right now RabbitMQ
>>> 4.0 does not even have a set release date, so it's not the most urgent
>>> change.
>>>
>>> Cheers,
>>> Matt
>>>
>>> >Hi Michal,
>>> >
>>> >Feel free to propose a change of the default in the master branch, but I
>>> don't think we can change the default in stable branches without impacting
>>> users.
>>> >
>>> >Best regards,
>>> >Michal
>>> >
>>> >> On 11 Apr 2023, at 15:18, Michal Arbet <michal.arbet at ultimum.io>
>>> wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> Btw, why do we have this option set to false?
>>> >> Michal Arbet
>>> >> Openstack Engineer
>>> >>
>>> >> Ultimum Technologies a.s.
>>> >> Na Poříčí 1047/26, 11000 Praha 1
>>> >> Czech Republic
>>> >>
>>> >> +420 604 228 897 <>
>>> >> michal.arbet at ultimum.io <mailto:michal.arbet at ultimum.io>
>>> >> https://ultimum.io <https://ultimum.io/>
>>> >>
>>> >> LinkedIn <https://www.linkedin.com/company/ultimum-technologies> |
>>> Twitter <https://twitter.com/ultimumtech> | Facebook <
>>> https://www.facebook.com/ultimumtechnologies/timeline>
>>> >>
>>> >>
>>> >> On Tue, 11 Apr 2023 at 14:48, Michał Nasiadka <mnasiadka at gmail.com> wrote:
>>> >>> Hello,
>>> >>>
>>> >>> RabbitMQ HA has been backported into stable releases, and it's
>>> documented here:
>>> >>>
>>> https://docs.openstack.org/kolla-ansible/yoga/reference/message-queues/rabbitmq.html#high-availability
>>> >>>
>>> >>> Best regards,
>>> >>> Michal
>>> >>>
>>> >>> On Tue, 11 Apr 2023 at 13:32, Nguyễn Hữu Khôi <nguyenhuukhoinw at gmail.com> wrote:
>>> >>>> Yes.
>>> >>>> But the cluster cannot work properly without it. :(
>>> >>>>
>>> >>>> On Tue, Apr 11, 2023, 6:18 PM Danny Webb <
>>> Danny.Webb at thehutgroup.com <mailto:Danny.Webb at thehutgroup.com>> wrote:
>>> >>>>> From: Satish Patel <satish.txt at gmail.com>
>>> >>>>> Sent: 09 April 2023 04:16
>>> >>>>> To: Nguyễn Hữu Khôi <nguyenhuukhoinw at gmail.com>
>>> >>>>> Cc: OpenStack Discuss <openstack-discuss at lists.openstack.org>
>>> >>>>> Subject: Re: [openstack][sharing][kolla ansible] Problems when 1 of
>>> 3 controllers is down
>>> >>>>>
>>> >>>>>
>>> >>>>> CAUTION: This email originates from outside THG
>>> >>>>>
>>> >>>>> Are you proposing a solution or just raising an issue?
>>> >>>>>
>>> >>>>> I did find it strange that kolla-ansible doesn't support HA queues
>>> by default. That is a disaster, because when one of the nodes goes down it
>>> makes the whole RabbitMQ cluster unusable. Whenever I deploy Kolla I have
>>> to add an HA policy to make the queues HA, otherwise you end up with
>>> problems.
>>> >>>>>
>>> >>>>> On Sat, Apr 8, 2023 at 6:40 AM Nguyễn Hữu Khôi <nguyenhuukhoinw at gmail.com> wrote:
>>> >>>>> Hello everyone.
>>> >>>>>
>>> >>>>> I want to summarize, for anyone who meets problems with OpenStack
>>> when deploying a cluster with 3 controllers using Kolla Ansible.
>>> >>>>>
>>> >>>>> Scenario: 1 of 3 controllers is down
>>> >>>>>
>>> >>>>> 1. Logging in to Horizon and using APIs such as Nova and Cinder
>>> will be very slow
>>> >>>>>
>>> >>>>> fix by:
>>> >>>>>
>>> >>>>> nano:
>>> >>>>> kolla-ansible/ansible/roles/heat/templates/heat.conf.j2
>>> >>>>> kolla-ansible/ansible/roles/nova/templates/nova.conf.j2
>>> >>>>> kolla-ansible/ansible/roles/keystone/templates/keystone.conf.j2
>>> >>>>> kolla-ansible/ansible/roles/neutron/templates/neutron.conf.j2
>>> >>>>> kolla-ansible/ansible/roles/cinder/templates/cinder.conf.j2
>>> >>>>>
>>> >>>>> or whichever other services need caching,
>>> >>>>>
>>> >>>>> and add the following:
>>> >>>>>
>>> >>>>> [cache]
>>> >>>>> backend = oslo_cache.memcache_pool
>>> >>>>> enabled = True
>>> >>>>> memcache_servers = {{ kolla_internal_vip_address }}:{{
>>> memcached_port }}
>>> >>>>> memcache_dead_retry = 0.25
>>> >>>>> memcache_socket_timeout = 900
>>> >>>>>
>>> >>>>> https://review.opendev.org/c/openstack/kolla-ansible/+/849487
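Rather than patching the role templates in the kolla-ansible source tree, the same `[cache]` section can be dropped into kolla's custom-config directory as a per-service override, e.g. for Nova. A sketch under that assumption; `192.0.2.10:11211` is a placeholder for your internal VIP and memcached port, which the `{{ kolla_internal_vip_address }}` form above resolves to inside templates:

```shell
# Per-service override: kolla-ansible merges /etc/kolla/config/nova.conf
# into the nova.conf it generates for the nova containers.
mkdir -p /etc/kolla/config
cat > /etc/kolla/config/nova.conf <<'EOF'
[cache]
backend = oslo_cache.memcache_pool
enabled = True
memcache_servers = 192.0.2.10:11211
memcache_dead_retry = 0.25
memcache_socket_timeout = 900
EOF
```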
>>> >>>>>
>>> >>>>> but that is not all:
>>> >>>>>
>>> >>>>> 2. Cannot launch an instance or map a block device (stuck at this
>>> step)
>>> >>>>>
>>> >>>>> nano
>>> kolla-ansible/ansible/roles/rabbitmq/templates/definitions.json.j2
>>> >>>>>
>>> >>>>> "policies":[
>>> >>>>> {"vhost": "/", "name": "ha-all", "pattern":
>>> "^(?!(amq\.)|(.*_fanout_)|(reply_)).*", "apply-to": "all", "definition":
>>> {"ha-mode":"all"}, "priority":0}{% if project_name == 'outward_rabbitmq' %},
>>> >>>>> {"vhost": "{{ murano_agent_rabbitmq_vhost }}", "name": "ha-all",
>>> "pattern": ".*", "apply-to": "all", "definition": {"ha-mode":"all"},
>>> "priority":0}
>>> >>>>> {% endif %}
>>> >>>>> ]
>>> >>>>>
>>> >>>>> nano /etc/kollla/global.conf
>>> >>>>>
>>> >>>>> [oslo_messaging_rabbit]
>>> >>>>> kombu_reconnect_delay=0.5
>>> >>>>>
>>> >>>>>
>>> >>>>> https://bugs.launchpad.net/oslo.messaging/+bug/1993149
>>> >>>>>
>>> https://docs.openstack.org/large-scale/journey/configure/rabbitmq.html
>>> >>>>>
>>> >>>>> I used Xena 13.4 and Yoga 14.8.1.
>>> >>>>>
>>> >>>>> The above bugs are critical, but I see that they have not been
>>> fixed. I am just an operator, and I want to share what I encountered for
>>> new people coming to OpenStack.
>>> >>>>>
>>> >>>>>
>>> >>>>> Nguyen Huu Khoi
>>> >>> --
>>> >>> Michał Nasiadka
>>> >>> mnasiadka at gmail.com
>>>
>>>
