[openstack][sharing][kolla ansible] Problems when 1 of 3 controllers is down

Michał Nasiadka mnasiadka at gmail.com
Tue Apr 11 15:18:01 UTC 2023


Hi Michal, 

Feel free to propose a change of the default in the master branch, but I don’t think we can change the default in stable branches without impacting users.

Best regards,
Michal

> On 11 Apr 2023, at 15:18, Michal Arbet <michal.arbet at ultimum.io> wrote:
> 
> Hi,
> 
> Btw, why do we have that option set to false? 
> Michal Arbet
> Openstack Engineer
> 
> Ultimum Technologies a.s.
> Na Poříčí 1047/26, 11000 Praha 1
> Czech Republic
> 
> +420 604 228 897
> michal.arbet at ultimum.io
> https://ultimum.io
> 
> LinkedIn <https://www.linkedin.com/company/ultimum-technologies> | Twitter <https://twitter.com/ultimumtech> | Facebook <https://www.facebook.com/ultimumtechnologies/timeline>
> 
> 
> On Tue, 11 Apr 2023 at 14:48, Michał Nasiadka <mnasiadka at gmail.com> wrote:
>> Hello,
>> 
>> RabbitMQ HA has been backported into stable releases, and it’s documented here: 
>> https://docs.openstack.org/kolla-ansible/yoga/reference/message-queues/rabbitmq.html#high-availability
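>> 
>> For reference, a minimal sketch of what that guide describes, added to /etc/kolla/globals.yml (the variable name below is the one used in that documentation; please double-check it against your release before relying on it):
>> 
>> # expected to turn on durable queues and classic queue mirroring for RabbitMQ
>> om_enable_rabbitmq_high_availability: true
>> 
>> followed by a reconfigure of RabbitMQ and the services that use it, e.g. kolla-ansible -i <inventory> reconfigure (see the linked page for the exact procedure on an existing deployment).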
>> 
>> Best regards,
>> Michal
>> 
>> On Tue, 11 Apr 2023 at 13:32, Nguyễn Hữu Khôi <nguyenhuukhoinw at gmail.com> wrote:
>>> Yes.
>>> But the cluster cannot work properly without it. :(
>>> 
>>> On Tue, Apr 11, 2023, 6:18 PM Danny Webb <Danny.Webb at thehutgroup.com> wrote:
>>>> This commit explains why they largely removed HA queue durability:
>>>> 
>>>> https://opendev.org/openstack/kolla-ansible/commit/2764844ee2ff9393a4eebd90a9a912588af0a180
>>>> From: Satish Patel <satish.txt at gmail.com>
>>>> Sent: 09 April 2023 04:16
>>>> To: Nguyễn Hữu Khôi <nguyenhuukhoinw at gmail.com>
>>>> Cc: OpenStack Discuss <openstack-discuss at lists.openstack.org>
>>>> Subject: Re: [openstack][sharing][kolla ansible] Problems when 1 of 3 controllers is down
>>>> 
>>>> Are you proposing a solution or just raising an issue? 
>>>> 
>>>> I did find it strange that kolla-ansible doesn't enable HA queues by default. That is a disaster, because when one of the nodes goes down it makes the whole RabbitMQ cluster unusable. Whenever I deploy kolla I have to add an HA policy to make the queues HA, otherwise you will end up with problems. 
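>>>> 
>>>> For anyone following along, the manual workaround I mean is roughly the following (a sketch only; the container name "rabbitmq" is assumed, as in a default kolla deployment, and the pattern matches the one in the definitions snippet quoted below):
>>>> 
>>>> docker exec rabbitmq rabbitmqctl set_policy ha-all '^(?!(amq\.)|(.*_fanout_)|(reply_)).*' '{"ha-mode":"all"}' --apply-to all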
>>>> 
>>>> On Sat, Apr 8, 2023 at 6:40 AM Nguyễn Hữu Khôi <nguyenhuukhoinw at gmail.com> wrote:
>>>> Hello everyone.
>>>> 
>>>> I want to summarize this for anyone who runs into problems with OpenStack when deploying a cluster with 3 controllers using Kolla Ansible.
>>>> 
>>>> Scenario: 1 of 3 controllers is down
>>>> 
>>>> 1. Logging into Horizon and using APIs such as Nova and Cinder becomes very slow
>>>> 
>>>> fix by:
>>>> 
>>>> nano:
>>>> kolla-ansible/ansible/roles/heat/templates/heat.conf.j2
>>>> kolla-ansible/ansible/roles/nova/templates/nova.conf.j2
>>>> kolla-ansible/ansible/roles/keystone/templates/keystone.conf.j2
>>>> kolla-ansible/ansible/roles/neutron/templates/neutron.conf.j2
>>>> kolla-ansible/ansible/roles/cinder/templates/cinder.conf.j2
>>>> 
>>>> or whichever other services need caching,
>>>>  
>>>> and add the following:
>>>> 
>>>> [cache]
>>>> backend = oslo_cache.memcache_pool
>>>> enabled = True
>>>> memcache_servers = {{ kolla_internal_vip_address }}:{{ memcached_port }}
>>>> memcache_dead_retry = 0.25
>>>> memcache_socket_timeout = 900
>>>> 
>>>> https://review.opendev.org/c/openstack/kolla-ansible/+/849487
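>>>> 
>>>> Alternatively, if you prefer not to patch the role templates in the kolla-ansible source tree, the same [cache] block can be placed in the per-service override files that kolla-ansible merges from /etc/kolla/config (a sketch; file names assumed from the standard override layout):
>>>> 
>>>> # /etc/kolla/config/nova.conf (and likewise heat.conf, keystone.conf,
>>>> # neutron.conf, cinder.conf); the override files are normally rendered
>>>> # as templates, so the Jinja variables should resolve; if not, put in
>>>> # the literal internal VIP address and memcached port instead
>>>> [cache]
>>>> backend = oslo_cache.memcache_pool
>>>> enabled = True
>>>> memcache_servers = {{ kolla_internal_vip_address }}:{{ memcached_port }}
>>>> memcache_dead_retry = 0.25
>>>> memcache_socket_timeout = 900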
>>>> 
>>>> But that is not the end of it.
>>>> 
>>>> 2. Cannot launch instances or map block devices (stuck at this step)
>>>> 
>>>> nano kolla-ansible/ansible/roles/rabbitmq/templates/definitions.json.j2
>>>>   
>>>> "policies":[
>>>>     {"vhost": "/", "name": "ha-all", "pattern": "^(?!(amq\.)|(.*_fanout_)|(reply_)).*", "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0}{% if project_name == 'outward_rabbitmq' %},
>>>>     {"vhost": "{{ murano_agent_rabbitmq_vhost }}", "name": "ha-all", "pattern": ".*", "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0}
>>>>     {% endif %}
>>>>   ]
>>>> 
>>>> nano /etc/kolla/global.conf
>>>> 
>>>> [oslo_messaging_rabbit]
>>>> kombu_reconnect_delay=0.5
>>>> 
>>>> 
>>>> https://bugs.launchpad.net/oslo.messaging/+bug/1993149
>>>> https://docs.openstack.org/large-scale/journey/configure/rabbitmq.html
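>>>> 
>>>> A quick way to check that the ha-all policy above actually took effect (again assuming the container is named rabbitmq, as in a default kolla deployment):
>>>> 
>>>> docker exec rabbitmq rabbitmqctl list_policies
>>>> docker exec rabbitmq rabbitmqctl list_queues name policy slave_pids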
>>>> 
>>>> I used Xena 13.4 and Yoga 14.8.1.
>>>> 
>>>> The above bugs are critical, but I see that they have not been fixed. I am just an operator, and I want to share what I encountered for new people coming to OpenStack.
>>>> 
>>>> 
>>>> Nguyen Huu Khoi
>> -- 
>> Michał Nasiadka
>> mnasiadka at gmail.com