<div dir="ltr">Hi Matt,<div><br></div><div>How do I set the kombu_reconnect_delay=0.5 option? </div><div><br></div><div>Something like the following in globals.yml?</div><div><br></div><div>kombu_reconnect_delay: 0.5 <br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Apr 12, 2023 at 4:23 AM Matt Crees <<a href="mailto:mattc@stackhpc.com">mattc@stackhpc.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi all,<br>
<br>
It seems worth noting here that there is a fix ongoing in<br>
oslo.messaging which will resolve the issues with HA failing when one<br>
node is down. See here:<br>
<a href="https://review.opendev.org/c/openstack/oslo.messaging/+/866617" rel="noreferrer" target="_blank">https://review.opendev.org/c/openstack/oslo.messaging/+/866617</a><br>
In the meantime, we have also found that setting kombu_reconnect_delay<br>
= 0.5 does resolve this issue.<br>
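<br>
For kolla-ansible, one way to apply this (a sketch, assuming the<br>
default node_custom_config path of /etc/kolla/config) is a global<br>
service config override, followed by a reconfigure:<br>
<br>
# /etc/kolla/config/global.conf<br>
[oslo_messaging_rabbit]<br>
kombu_reconnect_delay = 0.5<br>
<br>
kolla-ansible -i inventory reconfigure<br>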
<br>
As for why om_enable_rabbitmq_high_availability is currently<br>
defaulting to false, as Michal said enabling it in stable releases<br>
will impact users. This is because it enables durable queues, and the<br>
migration from transient to durable queues is not a seamless<br>
procedure. It requires that the state of RabbitMQ is reset and that<br>
the OpenStack services which use RabbitMQ are restarted to recreate<br>
the queues.<br>
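<br>
A hedged sketch of that manual reset (exact steps should be checked<br>
against the kolla-ansible docs for your release, and note this<br>
destroys all queued messages):<br>
<br>
# on each controller, wipe RabbitMQ state<br>
docker exec rabbitmq rabbitmqctl stop_app<br>
docker exec rabbitmq rabbitmqctl force_reset<br>
docker exec rabbitmq rabbitmqctl start_app<br>
# then redeploy so services recreate their queues as durable<br>
kolla-ansible -i inventory reconfigure<br>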
<br>
I think that there is some merit in changing this default value. But<br>
if we did this, we should either add additional support to automate<br>
the migration from transient to durable queues, or at the very least<br>
provide some decent docs on the manual procedure.<br>
<br>
However, as classic queue mirroring is deprecated in RabbitMQ (to be<br>
removed in RabbitMQ 4.0) we should maybe consider switching to quorum<br>
queues soon. Then it may be beneficial to leave the classic queue<br>
mirroring + durable queues setup as False by default. This is because<br>
the migration between queue types (durable or quorum) can take several<br>
hours on larger deployments. So it might be worth making sure the<br>
default values only require one migration to quorum queues in the<br>
future, rather than two (durable queues now and then quorum queues in<br>
the future).<br>
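<br>
For reference, newer oslo.messaging releases already expose a flag for<br>
this; a future switch might look like the following (hedged: check that<br>
your release actually carries the rabbit_quorum_queue option before<br>
relying on it):<br>
<br>
[oslo_messaging_rabbit]<br>
rabbit_quorum_queue = true<br>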
<br>
We will need to make this switch eventually, but right now RabbitMQ<br>
4.0 does not even have a set release date, so it's not the most urgent<br>
change.<br>
<br>
Cheers,<br>
Matt<br>
<br>
>Hi Michal,<br>
><br>
>Feel free to propose a change of the default in the master branch, but I don't think we can change the default in stable branches without impacting users.<br>
><br>
>Best regards,<br>
>Michal<br>
><br>
>> On 11 Apr 2023, at 15:18, Michal Arbet <<a href="mailto:michal.arbet@ultimum.io" target="_blank">michal.arbet@ultimum.io</a>> wrote:<br>
>><br>
>> Hi,<br>
>><br>
>> Btw, why do we have such an option set to false?<br>
>> Michal Arbet<br>
>> Openstack Engineer<br>
>><br>
>> Ultimum Technologies a.s.<br>
>> Na Poříčí 1047/26, 11000 Praha 1<br>
>> Czech Republic<br>
>><br>
>> +420 604 228 897<br>
>> <a href="mailto:michal.arbet@ultimum.io" target="_blank">michal.arbet@ultimum.io</a> <mailto:<a href="mailto:michal.arbet@ultimum.io" target="_blank">michal.arbet@ultimum.io</a>><br>
>> <a href="https://ultimum.io" rel="noreferrer" target="_blank">https://ultimum.io</a> <<a href="https://ultimum.io/" rel="noreferrer" target="_blank">https://ultimum.io/</a>><br>
>><br>
>> LinkedIn <<a href="https://www.linkedin.com/company/ultimum-technologies" rel="noreferrer" target="_blank">https://www.linkedin.com/company/ultimum-technologies</a>> | Twitter <<a href="https://twitter.com/ultimumtech" rel="noreferrer" target="_blank">https://twitter.com/ultimumtech</a>> | Facebook <<a href="https://www.facebook.com/ultimumtechnologies/timeline" rel="noreferrer" target="_blank">https://www.facebook.com/ultimumtechnologies/timeline</a>><br>
>><br>
>><br>
>> On Tue, 11 Apr 2023 at 14:48, Michał Nasiadka <<a href="mailto:mnasiadka@gmail.com" target="_blank">mnasiadka@gmail.com</a> <mailto:<a href="mailto:mnasiadka@gmail.com" target="_blank">mnasiadka@gmail.com</a>>> wrote:<br>
>>> Hello,<br>
>>><br>
>>> RabbitMQ HA has been backported into stable releases, and it's documented here:<br>
>>> <a href="https://docs.openstack.org/kolla-ansible/yoga/reference/message-queues/rabbitmq.html#high-availability" rel="noreferrer" target="_blank">https://docs.openstack.org/kolla-ansible/yoga/reference/message-queues/rabbitmq.html#high-availability</a><br>
>>><br>
>>> Best regards,<br>
>>> Michal<br>
>>><br>
>>> On Tue, 11.04.2023 at 13:32, Nguyen Huu Khoi <<a href="mailto:nguyenhuukhoinw@gmail.com" target="_blank">nguyenhuukhoinw@gmail.com</a> <mailto:<a href="mailto:nguyenhuukhoinw@gmail.com" target="_blank">nguyenhuukhoinw@gmail.com</a>>> wrote:<br>
>>>> Yes.<br>
>>>> But cluster cannot work properly without it. :(<br>
>>>><br>
>>>> On Tue, Apr 11, 2023, 6:18 PM Danny Webb <<a href="mailto:Danny.Webb@thehutgroup.com" target="_blank">Danny.Webb@thehutgroup.com</a> <mailto:<a href="mailto:Danny.Webb@thehutgroup.com" target="_blank">Danny.Webb@thehutgroup.com</a>>> wrote:<br>
>>>>> This commit explains why they largely removed HA queue durability:<br>
>>>>><br>
>>>>> <a href="https://opendev.org/openstack/kolla-ansible/commit/2764844ee2ff9393a4eebd90a9a912588af0a180" rel="noreferrer" target="_blank">https://opendev.org/openstack/kolla-ansible/commit/2764844ee2ff9393a4eebd90a9a912588af0a180</a><br>
>>>>> From: Satish Patel <<a href="mailto:satish.txt@gmail.com" target="_blank">satish.txt@gmail.com</a> <mailto:<a href="mailto:satish.txt@gmail.com" target="_blank">satish.txt@gmail.com</a>>><br>
>>>>> Sent: 09 April 2023 04:16<br>
>>>>> To: Nguyen Huu Khoi <<a href="mailto:nguyenhuukhoinw@gmail.com" target="_blank">nguyenhuukhoinw@gmail.com</a> <mailto:<a href="mailto:nguyenhuukhoinw@gmail.com" target="_blank">nguyenhuukhoinw@gmail.com</a>>><br>
>>>>> Cc: OpenStack Discuss <<a href="mailto:openstack-discuss@lists.openstack.org" target="_blank">openstack-discuss@lists.openstack.org</a> <mailto:<a href="mailto:openstack-discuss@lists.openstack.org" target="_blank">openstack-discuss@lists.openstack.org</a>>><br>
>>>>> Subject: Re: [openstack][sharing][kolla ansible]Problems when 1 of 3 controller was be down<br>
>>>>><br>
>>>>><br>
>>>>> CAUTION: This email originates from outside THG<br>
>>>>><br>
>>>>> Are you proposing a solution or just raising an issue?<br>
>>>>><br>
>>>>> I did find it strange that kolla-ansible doesn't enable HA queues by default. That is a disaster, because when one of the nodes goes down it makes the whole RabbitMQ cluster unusable. Whenever I deploy Kolla I have to add an HA policy to make the queues HA, otherwise you will end up with problems.<br>
>>>>><br>
>>>>> On Sat, Apr 8, 2023 at 6:40 AM Nguyen Huu Khoi <<a href="mailto:nguyenhuukhoinw@gmail.com" target="_blank">nguyenhuukhoinw@gmail.com</a> <mailto:<a href="mailto:nguyenhuukhoinw@gmail.com" target="_blank">nguyenhuukhoinw@gmail.com</a>>> wrote:<br>
>>>>> Hello everyone.<br>
>>>>><br>
>>>>> I want to summarize, for anyone who runs into problems with OpenStack when deploying a cluster with 3 controllers using Kolla Ansible.<br>
>>>>><br>
>>>>> Scenario: 1 of 3 controllers is down<br>
>>>>><br>
>>>>> 1. Logging in to Horizon and using APIs such as nova and cinder will be very slow<br>
>>>>><br>
>>>>> Fix by editing the following templates:<br>
>>>>><br>
>>>>> kolla-ansible/ansible/roles/heat/templates/heat.conf.j2<br>
>>>>> kolla-ansible/ansible/roles/nova/templates/nova.conf.j2<br>
>>>>> kolla-ansible/ansible/roles/keystone/templates/keystone.conf.j2<br>
>>>>> kolla-ansible/ansible/roles/neutron/templates/neutron.conf.j2<br>
>>>>> kolla-ansible/ansible/roles/cinder/templates/cinder.conf.j2<br>
>>>>><br>
>>>>> (or whichever other services need caching)<br>
>>>>><br>
>>>>> and add the following:<br>
>>>>><br>
>>>>> [cache]<br>
>>>>> backend = oslo_cache.memcache_pool<br>
>>>>> enabled = True<br>
>>>>> memcache_servers = {{ kolla_internal_vip_address }}:{{ memcached_port }}<br>
>>>>> memcache_dead_retry = 0.25<br>
>>>>> memcache_socket_timeout = 900<br>
>>>>><br>
>>>>> <a href="https://review.opendev.org/c/openstack/kolla-ansible/+/849487" rel="noreferrer" target="_blank">https://review.opendev.org/c/openstack/kolla-ansible/+/849487</a><br>
>>>>><br>
>>>>> but that is not the end of it<br>
>>>>><br>
>>>>> 2. Cannot launch instances or map block devices (stuck at this step)<br>
>>>>><br>
>>>>> nano kolla-ansible/ansible/roles/rabbitmq/templates/definitions.json.j2<br>
>>>>><br>
>>>>> "policies":[<br>
>>>>> {"vhost": "/", "name": "ha-all", "pattern": "^(?!(amq\.)|(.*_fanout_)|(reply_)).*", "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0}{% if project_name == 'outward_rabbitmq' %},<br>
>>>>> {"vhost": "{{ murano_agent_rabbitmq_vhost }}", "name": "ha-all", "pattern": ".*", "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0}<br>
>>>>> {% endif %}<br>
>>>>> ]<br>
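>>>>><br>
>>>>> Equivalently, the same policy can be applied at runtime with<br>
>>>>> rabbitmqctl instead of editing the template (a sketch; the container<br>
>>>>> name assumes kolla defaults):<br>
>>>>><br>
>>>>> docker exec rabbitmq rabbitmqctl set_policy -p / --apply-to all --priority 0 ha-all '^(?!(amq\.)|(.*_fanout_)|(reply_)).*' '{"ha-mode":"all"}'<br>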
>>>>><br>
>>>>> nano /etc/kolla/global.conf<br>
>>>>><br>
>>>>> [oslo_messaging_rabbit]<br>
>>>>> kombu_reconnect_delay=0.5<br>
>>>>><br>
>>>>><br>
>>>>> <a href="https://bugs.launchpad.net/oslo.messaging/+bug/1993149" rel="noreferrer" target="_blank">https://bugs.launchpad.net/oslo.messaging/+bug/1993149</a><br>
>>>>> <a href="https://docs.openstack.org/large-scale/journey/configure/rabbitmq.html" rel="noreferrer" target="_blank">https://docs.openstack.org/large-scale/journey/configure/rabbitmq.html</a><br>
>>>>><br>
>>>>> I used Xena 13.4 and Yoga 14.8.1.<br>
>>>>><br>
>>>>> The above bugs are critical, but I see that they have not been fixed. I am just an operator, and I want to share what I encountered for new people who come to OpenStack.<br>
>>>>><br>
>>>>><br>
>>>>> Nguyen Huu Khoi<br>
>>> --<br>
>>> Michał Nasiadka<br>
>>> <a href="mailto:mnasiadka@gmail.com" target="_blank">mnasiadka@gmail.com</a> <mailto:<a href="mailto:mnasiadka@gmail.com" target="_blank">mnasiadka@gmail.com</a>><br>
<br>
</blockquote></div>