[openstack][sharing][kolla ansible] Problems when 1 of 3 controllers was down

Nguyễn Hữu Khôi nguyenhuukhoinw at gmail.com
Wed Apr 12 14:44:57 UTC 2023


Hi.
Create a global.conf file in /etc/kolla/config/ and set the option there.
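For reference, a minimal version of that drop-in might look like the following (assuming a standard kolla-ansible layout, where files under /etc/kolla/config/ are merged into the oslo.config files of the deployed services):

```ini
# /etc/kolla/config/global.conf
# Sketch: merged into every service's configuration by kolla-ansible.
[oslo_messaging_rabbit]
kombu_reconnect_delay = 0.5
```

After adding the file, a `kolla-ansible reconfigure` run would push the option out to the services.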

On Wed, Apr 12, 2023, 9:42 PM Satish Patel <satish.txt at gmail.com> wrote:

> Hi Matt,
>
> How do I set kombu_reconnect_delay=0.5 option?
>
> Something like the following in globals.yml?
>
> kombu_reconnect_delay: 0.5
>
> On Wed, Apr 12, 2023 at 4:23 AM Matt Crees <mattc at stackhpc.com> wrote:
>
>> Hi all,
>>
>> It seems worth noting here that there is a fix ongoing in
>> oslo.messaging which will resolve the issues with HA failing when one
>> node is down. See here:
>> https://review.opendev.org/c/openstack/oslo.messaging/+/866617
>> In the meantime, we have also found that setting kombu_reconnect_delay
>> = 0.5 does resolve this issue.
>>
>> As for why om_enable_rabbitmq_high_availability is currently
>> defaulting to false, as Michal said enabling it in stable releases
>> will impact users. This is because it enables durable queues, and the
>> migration from transient to durable queues is not a seamless
>> procedure. It requires that the state of RabbitMQ is reset and that
>> the OpenStack services which use RabbitMQ are restarted to recreate
>> the queues.
>>
>> I think that there is some merit in changing this default value. But
>> if we did this, we should either add additional support to automate
>> the migration from transient to durable queues, or at the very least
>> provide some decent docs on the manual procedure.
>>
>> However, as classic queue mirroring is deprecated in RabbitMQ (to be
>> removed in RabbitMQ 4.0) we should maybe consider switching to quorum
>> queues soon. Then it may be beneficial to leave the classic queue
>> mirroring + durable queues setup as False by default. This is because
>> the migration between queue types (durable or quorum) can take several
>> hours on larger deployments. So it might be worth making sure the
>> default values only require one migration to quorum queues in the
>> future, rather than two (durable queues now and then quorum queues in
>> the future).
>>
>> We will need to make this switch eventually, but right now RabbitMQ
>> 4.0 does not even have a set release date, so it's not the most urgent
>> change.
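A sketch of what that future switch could look like in globals.yml. The flag names below are taken from recent kolla-ansible development branches and are an assumption here; they are not necessarily available in the stable releases discussed in this thread:

```yaml
# /etc/kolla/globals.yml -- sketch, assuming recent kolla-ansible
# Quorum queues replace classic queue mirroring, so the two HA
# mechanisms would not be enabled together.
om_enable_rabbitmq_quorum_queues: true
om_enable_rabbitmq_high_availability: false
```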
>>
>> Cheers,
>> Matt
>>
>> >Hi Michal,
>> >
>> >Feel free to propose a change of the default in the master branch, but I don't
>> think we can change the default in stable branches without impacting users.
>> >
>> >Best regards,
>> >Michal
>> >
>> >> On 11 Apr 2023, at 15:18, Michal Arbet <michal.arbet at ultimum.io>
>> wrote:
>> >>
>> >> Hi,
>> >>
>> >> Btw, why do we have such an option set to false?
>> >> Michal Arbet
>> >> Openstack Engineer
>> >>
>> >> Ultimum Technologies a.s.
>> >> Na Poříčí 1047/26, 11000 Praha 1
>> >> Czech Republic
>> >>
>> >> +420 604 228 897
>> >> michal.arbet at ultimum.io
>> >> https://ultimum.io
>> >>
>> >> LinkedIn <https://www.linkedin.com/company/ultimum-technologies> |
>> Twitter <https://twitter.com/ultimumtech> | Facebook <
>> https://www.facebook.com/ultimumtechnologies/timeline>
>> >>
>> >>
>> >> On Tue, 11 Apr 2023 at 14:48, Michał Nasiadka <mnasiadka at gmail.com> wrote:
>> >>> Hello,
>> >>>
>> >>> RabbitMQ HA has been backported into stable releases, and it's
>> documented here:
>> >>>
>> https://docs.openstack.org/kolla-ansible/yoga/reference/message-queues/rabbitmq.html#high-availability
>> >>>
>> >>> Best regards,
>> >>> Michal
>> >>>
>> >>> On Tue, 11 Apr 2023 at 13:32, Nguyễn Hữu Khôi <nguyenhuukhoinw at gmail.com> wrote:
>> >>>> Yes.
>> >>>> But the cluster cannot work properly without it. :(
>> >>>>
>> >>>> On Tue, Apr 11, 2023, 6:18 PM Danny Webb <Danny.Webb at thehutgroup.com> wrote:
>> >>>>> This commit explains why they largely removed HA queue durability:
>> >>>>>
>> >>>>>
>> https://opendev.org/openstack/kolla-ansible/commit/2764844ee2ff9393a4eebd90a9a912588af0a180
>> >>>>> From: Satish Patel <satish.txt at gmail.com>
>> >>>>> Sent: 09 April 2023 04:16
>> >>>>> To: Nguyễn Hữu Khôi <nguyenhuukhoinw at gmail.com>
>> >>>>> Cc: OpenStack Discuss <openstack-discuss at lists.openstack.org>
>> >>>>> Subject: Re: [openstack][sharing][kolla ansible] Problems when 1 of
>> 3 controllers was down
>> >>>>>
>> >>>>>
>> >>>>> Are you proposing a solution or just raising an issue?
>> >>>>>
>> >>>>> I did find it strange that kolla-ansible doesn't support HA queues
>> by default. That is a disaster, because when one of the nodes goes down it
>> makes the whole RabbitMQ cluster unusable. Whenever I deploy kolla I have
>> to add an HA policy to make the queues HA, otherwise you will end up with problems.
>> >>>>>
>> >>>>> On Sat, Apr 8, 2023 at 6:40 AM Nguyễn Hữu Khôi <nguyenhuukhoinw at gmail.com> wrote:
>> >>>>> Hello everyone.
>> >>>>>
>> >>>>> I want to write a summary for anyone who runs into problems with
>> OpenStack when deploying a cluster with 3 controllers using Kolla Ansible.
>> >>>>>
>> >>>>> Scenario: 1 of 3 controllers is down
>> >>>>>
>> >>>>> 1. Logging in to Horizon and using APIs such as nova and cinder will be very slow
>> >>>>>
>> >>>>> Fix by editing the following templates:
>> >>>>>
>> >>>>> kolla-ansible/ansible/roles/heat/templates/heat.conf.j2
>> >>>>> kolla-ansible/ansible/roles/nova/templates/nova.conf.j2
>> >>>>> kolla-ansible/ansible/roles/keystone/templates/keystone.conf.j2
>> >>>>> kolla-ansible/ansible/roles/neutron/templates/neutron.conf.j2
>> >>>>> kolla-ansible/ansible/roles/cinder/templates/cinder.conf.j2
>> >>>>>
>> >>>>> (or whichever other services need the cache section)
>> >>>>>
>> >>>>> and add the following:
>> >>>>>
>> >>>>> [cache]
>> >>>>> backend = oslo_cache.memcache_pool
>> >>>>> enabled = True
>> >>>>> memcache_servers = {{ kolla_internal_vip_address }}:{{ memcached_port }}
>> >>>>> memcache_dead_retry = 0.25
>> >>>>> memcache_socket_timeout = 900
>> >>>>>
>> >>>>> https://review.opendev.org/c/openstack/kolla-ansible/+/849487
>> >>>>>
>> >>>>> But that is not the end.
>> >>>>>
>> >>>>> 2. Cannot launch instances or map block devices (stuck at this step)
>> >>>>>
>> >>>>> Edit kolla-ansible/ansible/roles/rabbitmq/templates/definitions.json.j2:
>> >>>>>
>> >>>>> "policies":[
>> >>>>>   {"vhost": "/", "name": "ha-all",
>> >>>>>    "pattern": "^(?!(amq\.)|(.*_fanout_)|(reply_)).*",
>> >>>>>    "apply-to": "all", "definition": {"ha-mode":"all"},
>> >>>>>    "priority": 0}{% if project_name == 'outward_rabbitmq' %},
>> >>>>>   {"vhost": "{{ murano_agent_rabbitmq_vhost }}", "name": "ha-all",
>> >>>>>    "pattern": ".*", "apply-to": "all",
>> >>>>>    "definition": {"ha-mode":"all"}, "priority": 0}
>> >>>>> {% endif %}
>> >>>>> ]
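As an aside, the ha-all pattern above deliberately skips queues that do not benefit from mirroring. A quick way to check which queue names the policy would match, sketched with Python's re module (the queue names below are made-up examples, not taken from a real deployment):

```python
import re

# Negative lookahead: exclude amq.* built-ins, *_fanout_* queues, and
# reply_ queues, which are transient and not worth mirroring.
policy = re.compile(r"^(?!(amq\.)|(.*_fanout_)|(reply_)).*")

def is_mirrored(queue_name: str) -> bool:
    """Return True if the ha-all policy would apply to this queue."""
    return policy.match(queue_name) is not None

print(is_mirrored("notifications.info"))          # True: durable notification queue
print(is_mirrored("amq.gen-JzTY20BRgKO"))         # False: RabbitMQ built-in
print(is_mirrored("q-agent-notifier_fanout_12"))  # False: fanout queue
print(is_mirrored("reply_7f1a"))                  # False: RPC reply queue
```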
>> >>>>>
>> >>>>> Edit /etc/kolla/config/global.conf:
>> >>>>>
>> >>>>> [oslo_messaging_rabbit]
>> >>>>> kombu_reconnect_delay=0.5
>> >>>>>
>> >>>>>
>> >>>>> https://bugs.launchpad.net/oslo.messaging/+bug/1993149
>> >>>>>
>> https://docs.openstack.org/large-scale/journey/configure/rabbitmq.html
>> >>>>>
>> >>>>> I used Xena 13.4 and Yoga 14.8.1.
>> >>>>>
>> >>>>> The above bugs are critical, but I see that they have not been fixed.
>> I am just an operator, and I want to share what I encountered for new people
>> who come to OpenStack.
>> >>>>>
>> >>>>>
>> >>>>> Nguyen Huu Khoi
>> >>> --
>> >>> Michał Nasiadka
>> >>> mnasiadka at gmail.com
>>
>>

