[openstack][kolla-ansible][neutron] Router HA goes split-brain when restarting the l3 agent Docker container
Hello,
I deployed a multi-node cluster with two separate network nodes, with l3_ha=true and max_l3_agents_per_router=2. It works, but when I restart neutron_l3_agent on the host that holds the active router, the two router instances end up standby-standby or active-active.
With only two network nodes, split brain seems to happen very easily. Could I have some ideas on this problem? Thank you.
openstack 2025.1-master, kolla-ansible 2025.1 master, Ubuntu 22.04
Nguyen Huu Khoi
I see this bug: https://review.opendev.org/c/openstack/kolla-ansible/+/481969
but with /usr/local/bin/neutron-l3-agent-wrapper.sh I then get:
This program is using eventlet and has been monkey_patched
Nguyen Huu Khoi
(In reply to Nguyễn Hữu Khôi <nguyenhuukhoinw@gmail.com>, Sat, Apr 12, 2025 at 9:02 PM)
It is strange: when I log in to the neutron_l3_agent container, /var/lib/neutron/kolla/ha_confs/431e2992-d4c5-42b9-96b3-1c932b6b750f/state shows the right state on both sides, but Neutron considers the router active-active or standby-standby.
Nguyen Huu Khoi
(In reply to Nguyễn Hữu Khôi <nguyenhuukhoinw@gmail.com>, Sat, Apr 12, 2025 at 9:29 PM)
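The mismatch described above (state files look right, Neutron reports something else) can be made explicit with a small comparison. What follows is only a minimal sketch under stated assumptions, not part of Neutron or kolla-ansible: the function names, the host names, and the state vocabularies ("primary"/"master"/"backup" in the state file, "active"/"standby" from the API) are assumptions for illustration. The inputs would be collected by hand, for example from the state file on each network node and from `openstack network agent list --router <router-id> --long`.

```python
# Hypothetical helper for comparing two views of an HA router's state.
# All names and state vocabularies here are assumptions for illustration.

ACTIVE_WORDS = {"active", "primary", "master"}
STANDBY_WORDS = {"standby", "backup"}


def normalize(state: str) -> str:
    """Map a raw state string onto 'active', 'standby' or 'unknown'."""
    value = state.strip().lower()
    if value in ACTIVE_WORDS:
        return "active"
    if value in STANDBY_WORDS:
        return "standby"
    return "unknown"


def diagnose(state_files: dict, neutron: dict) -> list:
    """Return human-readable findings for one router.

    state_files: host -> content of the per-router 'state' file
    neutron:     host -> HA state as reported by the Neutron API
    """
    findings = []
    # A healthy HA router has exactly one active instance in each view.
    for label, view in (("state files", state_files), ("neutron", neutron)):
        actives = [normalize(s) for s in view.values()].count("active")
        if actives == 0:
            findings.append(f"{label}: no active instance")
        elif actives > 1:
            findings.append(f"{label}: {actives} active instances")
    # Report hosts where the two views disagree.
    for host in sorted(set(state_files) & set(neutron)):
        on_disk = normalize(state_files[host])
        reported = normalize(neutron[host])
        if on_disk != reported:
            findings.append(
                f"{host}: state file says {on_disk}, neutron reports {reported}"
            )
    return findings


if __name__ == "__main__":
    # Made-up example values, not taken from a real deployment.
    files = {"network1": "primary\n", "network2": "backup\n"}
    api = {"network1": "active", "network2": "active"}
    for line in diagnose(files, api):
        print(line)
```

If the state files show exactly one active instance while the Neutron view does not, that points at state reporting rather than at keepalived itself, which is useful detail to include in a bug report.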
Hello,
Do I need to report this as a bug on Launchpad? I ask because it looks like all of this was fixed, yet the problem still exists in 2025.1.
(In reply to Nguyễn Hữu Khôi <nguyenhuukhoinw@gmail.com>, Sun, Apr 13, 2025, 8:59 AM)
Hi,
Yes, please open a bug report. If you believe that this is a Neutron bug, please provide details about your configuration to make it easy to reproduce with some other deployment tool, such as devstack.
Best wishes,
Lajos Katona (lajoskatona)
(In reply to Nguyễn Hữu Khôi <nguyenhuukhoinw@gmail.com>, Sun, Apr 13, 2025, 18:16)
Thank you, I will do that.
Nguyen Huu Khoi
(In reply to Lajos Katona <katonalala@gmail.com>, Mon, Apr 14, 2025 at 2:16 PM)
participants (2)
- Lajos Katona
- Nguyễn Hữu Khôi