[neutron]Some floating IPs inaccessible after restart of L3 agent
Hello Everyone,
We have had OpenStack Victoria deployed since the beginning of the year with kolla-ansible in Docker containers. Everything was running fine, but a few weeks ago we noticed issues with networking. Our installation uses Open vSwitch networking with DVR, non-HA routers.
Everything runs smoothly until we restart the L3 agent. After that, some floating IPs of VMs running on the node where the L3 agent is running become inaccessible. The workaround is to reassign the floating IP to the affected VM. Every restart affects the same floating IPs and VMs.
No errors or exceptions were found in the logs.
I found that after the restart the routes for those particular floating IPs are missing in the fip- namespace, so proxy ARP responses do not work. After the floating IP address is reassigned, the L3 agent adds the routes and the floating IP works again.
It looks like some sort of race condition in the L3 agent, but I was not able to identify an existing bug.
The L3 agent is at version 17.0.1.dev44.
Is anyone aware of an existing bug that could explain this behavior, or does anyone have an idea how to solve the issue?
Kamil Madáč
Slovensko IT a.s.
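A minimal sketch of how the missing routes described above could be spotted, assuming the routing table of the fip- namespace has been captured as text (for example with `ip netns exec <fip-namespace> ip route show`, where the namespace name is a placeholder). The function name `missing_fip_routes`, the sample table, and the addresses are hypothetical and only illustrate the idea: every working floating IP is expected to have a host route, and any floating IP without one is reported.

```python
import ipaddress


def missing_fip_routes(route_output, floating_ips):
    """Return the floating IPs that have no host route in the given text.

    route_output: text in the style of `ip route show` (assumed format).
    floating_ips: iterable of floating IP address strings to check.
    """
    routed = set()
    for line in route_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        try:
            # A bare address (or an explicit /32) is treated as a host route.
            net = ipaddress.ip_network(fields[0], strict=False)
        except ValueError:
            continue  # e.g. "default" or other non-address keywords
        if net.prefixlen == net.max_prefixlen:
            routed.add(net.network_address)
    return [ip for ip in floating_ips
            if ipaddress.ip_address(ip) not in routed]


# Hypothetical sample table: one floating IP has a route, the other does not.
SAMPLE = """\
default via 192.0.2.1 dev fg-0a1b2c3d-4e
169.254.106.114/31 dev fpr-11111111-2 proto kernel scope link src 169.254.106.115
192.0.2.10 via 169.254.106.114 dev fpr-11111111-2
"""

if __name__ == "__main__":
    print(missing_fip_routes(SAMPLE, ["192.0.2.10", "192.0.2.11"]))
```

Running the same check before and after an L3 agent restart would show whether the same floating IPs lose their routes each time.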
Hi Kamil,
I've just read the email quickly on my phone, and I remember fixing something similar in the Debian Victoria packages. It may be your issue, but I can't check right now.
Could you check it? It's fixed in newer versions of neutron.
https://bugs.launchpad.net/ubuntu/+source/neutron/+bug/1927868
Thanks,
Michal Arbet (kevko)
On Fri, 26 Nov 2021 at 10:53, Kamil Madáč <kamil.madac@slovenskoit.sk> wrote:
> Is anyone aware of an existing bug that could explain this behavior, or does anyone have an idea how to solve the issue?
Hi Michal,
Thanks for responding and for the suggestion. Over the weekend I upgraded the neutron L3 agent to the most recent Victoria version of the kolla container (17.2.2.dev56), and it seems to have helped: no more disappearing routes in the fip namespace after restart 🙂
I found a change set that fixes a race condition in the L3 agent, https://review.opendev.org/c/openstack/neutron/+/803576, from September this year, and I think that could be the one that fixes it.
On Monday, November 29, 2021 at 10:20 AM, Michal Arbet <michal.arbet@ultimum.io> wrote:
> Could you check it? It's fixed in newer versions of neutron.
> https://bugs.launchpad.net/ubuntu/+source/neutron/+bug/1927868
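A small sketch, under stated assumptions, for checking whether an installed L3 agent version is at least the one reported as working in this thread (17.2.2.dev56). The parser is a simplification that only handles `X.Y.Z` and `X.Y.Z.devN` strings, not the full Python versioning rules, and the helper names `parse_version` and `at_least` are hypothetical. The thread does not state which exact version first contains the fix, so the reference version is only the one observed to work.

```python
import re


def parse_version(version):
    """Parse 'X.Y.Z' or 'X.Y.Z.devN' into a comparable tuple.

    A version without a .devN suffix is treated as newer than any
    .devN build of the same X.Y.Z (simplified assumption).
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:\.dev(\d+))?", version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    major, minor, patch = (int(match.group(i)) for i in (1, 2, 3))
    dev = int(match.group(4)) if match.group(4) is not None else float("inf")
    return (major, minor, patch, dev)


def at_least(installed, reference):
    """Return True if `installed` is the same as or newer than `reference`."""
    return parse_version(installed) >= parse_version(reference)


if __name__ == "__main__":
    # Versions taken from the thread: the affected one and the working one.
    print(at_least("17.0.1.dev44", "17.2.2.dev56"))
    print(at_least("17.2.2.dev56", "17.2.2.dev56"))
```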
I am glad that it works for you now :)
Michal
Michal Arbet
Openstack Engineer
Ultimum Technologies a.s.
michal.arbet@ultimum.io
On Mon, 29 Nov 2021 at 10:35, Kamil Madáč <kamil.madac@slovenskoit.sk> wrote:
> Over the weekend I upgraded the neutron L3 agent to the most recent Victoria version of the kolla container (17.2.2.dev56), and it seems to have helped: no more disappearing routes in the fip namespace after restart 🙂
participants (2)
- Kamil Madáč
- Michal Arbet