[openstack][sharing][kolla ansible] Problems when 1 of 3 controllers is down
Hello everyone.

I want to summarize, for anyone who runs into problems with OpenStack when deploying a cluster with 3 controllers using Kolla Ansible.

Scenario: 1 of 3 controllers is down.

1. Logging in to Horizon and using APIs such as nova and cinder becomes very slow.

Fix: edit the following templates (or those of whichever services need caching):

  kolla-ansible/ansible/roles/heat/templates/heat.conf.j2
  kolla-ansible/ansible/roles/nova/templates/nova.conf.j2
  kolla-ansible/ansible/roles/keystone/templates/keystone.conf.j2
  kolla-ansible/ansible/roles/neutron/templates/neutron.conf.j2
  kolla-ansible/ansible/roles/cinder/templates/cinder.conf.j2

and add the section below:

  [cache]
  backend = oslo_cache.memcache_pool
  enabled = True
  memcache_servers = {{ kolla_internal_vip_address }}:{{ memcached_port }}
  memcache_dead_retry = 0.25
  memcache_socket_timeout = 900

https://review.opendev.org/c/openstack/kolla-ansible/+/849487

But that is not the end.

2. Cannot launch an instance or map a block device (it gets stuck at this step).

Edit kolla-ansible/ansible/roles/rabbitmq/templates/definitions.json.j2:

  "policies":[
    {"vhost": "/", "name": "ha-all", "pattern": "^(?!(amq\.)|(.*_fanout_)|(reply_)).*", "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0}{% if project_name == 'outward_rabbitmq' %},
    {"vhost": "{{ murano_agent_rabbitmq_vhost }}", "name": "ha-all", "pattern": ".*", "apply-to": "all", "definition": {"ha-mode":"all"}, "priority":0}
    {% endif %}
  ]

and edit /etc/kolla/global.conf:

  [oslo_messaging_rabbit]
  kombu_reconnect_delay=0.5

https://bugs.launchpad.net/oslo.messaging/+bug/1993149
https://docs.openstack.org/large-scale/journey/configure/rabbitmq.html

I used Xena 13.4 and Yoga 14.8.1. The bugs above are critical, but as far as I can see they have not been fixed. I am just an operator, and I want to share what I ran into for people who are new to OpenStack.

Nguyen Huu Khoi
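(For anyone who would rather not patch the Kolla Ansible source tree: Kolla Ansible also merges operator-supplied overrides from /etc/kolla/config/ into the rendered service configs. A minimal sketch, assuming the standard global.conf merge behaviour and substituting literal example values for the Jinja variables above:)

  # /etc/kolla/config/global.conf -- merged into every service's main config
  [cache]
  backend = oslo_cache.memcache_pool
  enabled = True
  # replace with your internal VIP address and memcached port
  memcache_servers = 192.168.0.10:11211
  memcache_dead_retry = 0.25
  memcache_socket_timeout = 900

Running kolla-ansible reconfigure then re-renders the merged configuration into the containers.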

From: Satish Patel <satish.txt@gmail.com>
Sent: 09 April 2023 04:16

Are you proposing a solution or just raising an issue?

I did find it strange that kolla-ansible doesn't support HA queues by default. That is a disaster, because when one of the nodes goes down it makes the whole RabbitMQ cluster unusable. Whenever I deploy Kolla I have to add an HA policy to make the queues highly available, otherwise you will end up with problems.
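(A sketch of what adding such a policy by hand can look like at runtime, assuming Kolla's default container name of rabbitmq and reusing the queue pattern quoted elsewhere in this thread:)

  docker exec rabbitmq rabbitmqctl set_policy \
      -p / --priority 0 --apply-to all \
      ha-all '^(?!(amq\.)|(.*_fanout_)|(reply_)).*' \
      '{"ha-mode":"all"}'

Baking the policy into definitions.json.j2, as above, makes it part of the deployment instead of a manual step on every cluster.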

From: Nguyễn Hữu Khôi <nguyenhuukhoinw@gmail.com>

I just summarized what I did after googling, and I hope we will get some of these patches fixed, especially:

  [oslo_messaging_rabbit]
  kombu_reconnect_delay=0.5

Nguyen Huu Khoi

From: Danny Webb <Danny.Webb@thehutgroup.com>
Sent: 11 April 2023

This commit explains why they largely removed HA queue durability:

https://opendev.org/openstack/kolla-ansible/commit/2764844ee2ff9393a4eebd90a...

From: Nguyễn Hữu Khôi <nguyenhuukhoinw@gmail.com>

Yes. But the cluster cannot work properly without it. :(

From: Michał Nasiadka <mnasiadka@gmail.com>

Hello,

RabbitMQ HA has been backported into the stable releases, and it's documented here:

https://docs.openstack.org/kolla-ansible/yoga/reference/message-queues/rabbi...

Best regards,
Michal
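(The documented switch corresponds to a single flag in globals.yml; a minimal sketch, with the flag name taken from later in this thread:)

  # /etc/kolla/globals.yml
  om_enable_rabbitmq_high_availability: "true"

followed by a kolla-ansible reconfigure so the RabbitMQ definitions and the oslo.messaging options are re-rendered.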

From: Michal Arbet <michal.arbet@ultimum.io>
Openstack Engineer, Ultimum Technologies a.s.

Hi,

By the way, why do we have such an option set to false?

From: Sean Mooney <smooney@redhat.com>

On Tue, 2023-04-11 at 15:18 +0200, Michal Arbet wrote:
> Btw, why do we have such an option set to false?

It has a pretty big performance penalty if combined with durable queues, and in general it is questionable whether it should be used going forward.

There is an argument to be made that HA/mirrored queues and durable queues should be replaced with https://www.rabbitmq.com/quorum-queues.html

The other thing to consider is that this needs to be set per vhost, so if two services share a vhost it needs to be set to the same value.

In general, for notifications both HA and durable queues should be disabled, since notifications are intended to be fire and forget. For RPC calls or casts, reliable delivery is important, but how you achieve that is architecture dependent, meaning that using HA queues is not always the correct default. If you need to scale to many requests per second, you are better off using durable queues with storage on something like Ceph/NFS and an active/backup deployment with one RabbitMQ per OpenStack service. You might choose to run such a RabbitMQ cluster in a k8s environment, for example, using persistent volumes. In other cases a shared RabbitMQ with simple HA queues is fine for small-scale deployments; quorum queues may also make more sense.

This is why RabbitMQ is called out in the production architecture guide
https://docs.openstack.org/kolla-ansible/latest/admin/production-architectur...
and why there is an option to opt into HA/durable queues, since that is often enough for small-scale deployments:
https://docs.openstack.org/kolla-ansible/latest/reference/message-queues/rab...
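(If you go the quorum-queue route Sean mentions, recent oslo.messaging releases expose a toggle for it; a hedged sketch, since the option's availability depends on your release:)

  # service config override, e.g. via /etc/kolla/config/global.conf
  [oslo_messaging_rabbit]
  rabbit_quorum_queue = true

Queue type is fixed when a queue is declared, so converting an existing deployment generally means deleting and recreating the queues rather than just flipping a policy.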

From: Satish Patel <satish.txt@gmail.com>

This is what I am doing in my deployment. I am not sure whether it is right or not, but it works for me and survives a full reboot of my cluster.

# cat /etc/kolla/config/rabbitmq/definitions.json
{
  "vhosts": [{ "name": "/" }],
  "users": [
    { "name": "openstack", "password": "Password123", "tags": "administrator" },
    { "name": "monitoring", "password": "Password321", "tags": "monitoring" }
  ],
  "permissions": [
    { "user": "openstack", "vhost": "/", "configure": ".*", "write": ".*", "read": ".*" },
    { "user": "monitoring", "vhost": "/", "configure": "^$", "write": "^$", "read": ".*" }
  ],
  "policies": [{
    "vhost": "/",
    "name": "ha-all",
    "pattern": "^(?!(amq\\.)|(.*_fanout_)|(reply_)).*",
    "apply-to": "all",
    "definition": { "ha-mode": "all" },
    "priority": 0
  }]
}
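(A quick way to check that a definitions file like this was actually loaded by the broker, using standard rabbitmqctl commands against Kolla's default container name:)

  docker exec rabbitmq rabbitmqctl list_users
  docker exec rabbitmq rabbitmqctl list_policies -p /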

From: Michał Nasiadka <mnasiadka@gmail.com>

Hi Michal,

Feel free to propose a change of the default in the master branch, but I don't think we can change the default in the stable branches without impacting users.

Best regards,
Michal

From: Nguyễn Hữu Khôi <nguyenhuukhoinw@gmail.com>

Thank you for your response. I set this in globals.yml:

  om_enable_rabbitmq_high_availability: "true"

But it won't work on its own, so I need to add policies for RabbitMQ.
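(One way to confirm whether queues really are mirrored after setting that flag: with classic mirrored queues, a mirrored queue reports a matching policy and, on most RabbitMQ versions, non-empty slave_pids:)

  docker exec rabbitmq rabbitmqctl list_queues -p / name policy slave_pids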

From: Nguyễn Hữu Khôi <nguyenhuukhoinw@gmail.com>

The most important point I am making is that without the configuration above, our cluster will not work properly if 1 of 3 controllers is down. I tested this on Xena and Yoga deployed with kolla-ansible.

participants (6)

- Danny Webb
- Michal Arbet
- Michał Nasiadka
- Nguyễn Hữu Khôi
- Satish Patel
- Sean Mooney