[neutron][octavia] vip port missing virtual-parent in ovn
Greetings,
I'm currently debugging a strange issue related to Octavia load balancer failovers and was wondering if anyone has encountered something similar or has any insights.
We're running Neutron with ML2/OVN and the ovn-bgp-agent. When we create Octavia load balancers using the Amphora driver, they come up fine. After attaching a floating IP, the ovn-bgp-agent correctly announces the FIP. However, during load balancer failovers, we sometimes observe that the FIP is no longer being announced. After some debugging, we found that the ovn-bgp-agent stops announcing the FIP because the VIP port in OVN is down and no longer has neutron:host_id set in external_ids. Digging deeper, we noticed that the VIP port's options:virtual-parents only includes a single port ID, and it's not the one corresponding to the current VRRP master. As a result, OVN doesn't associate the VIP with the active amphora, causing the port to be up=false and leaving the external_ids:neutron:host_id field empty. Because of this, no ovn-bgp-agent instance sees itself as responsible for announcing the FIP.
Looking at the Octavia amphora ports in Neutron, all of them have the correct allowed address pairs set. This should normally result in the correct virtual-parents being reflected in OVN.
Interestingly, I was able to reproduce this behavior even with the SINGLE topology in Octavia: a failover sometimes removes the options fields related to virtual-parents and virtual-ip from the VIP port entirely. Manually unsetting and re-setting the allowed address pair on the affected port always restores the correct virtual-parents, and the announcement resumes. Now the question is: how could this happen? It seems completely random, but it's very reproducible in my environment.
We're running Neutron 2025.1 (26.0.0) with OVN 24.03.1-5.el9. I haven't found any errors in the Neutron server logs indicating issues writing to OVN, or any relevant tracebacks.
Thanks in advance for any input or ideas!
Best regards, Max
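One way to spot the mismatch described above is to compare what the allowed address pairs in Neutron imply with what OVN actually holds in options:virtual-parents. The following is only a minimal Python sketch, not code from Neutron or the ovn-bgp-agent. The function names and the plain-dict input shape are assumptions for illustration, and fetching the data from the Neutron API and the OVN northbound DB is left out. It also assumes that the address pair is stored as the bare VIP address and that the OVN logical switch port names match the Neutron port IDs.

```python
from typing import Iterable, Mapping, Set


def expected_virtual_parents(vip: str, neutron_ports: Iterable[Mapping]) -> Set[str]:
    """IDs of the Neutron ports that list `vip` in allowed_address_pairs."""
    parents = set()
    for port in neutron_ports:
        pairs = port.get("allowed_address_pairs") or []
        if any(pair.get("ip_address") == vip for pair in pairs):
            parents.add(port["id"])
    return parents


def actual_virtual_parents(vip_lsp: Mapping) -> Set[str]:
    """Parse the comma-separated options:virtual-parents of the VIP port row."""
    raw = (vip_lsp.get("options") or {}).get("virtual-parents", "")
    return {name.strip() for name in raw.split(",") if name.strip()}


def missing_virtual_parents(vip: str, vip_lsp: Mapping,
                            neutron_ports: Iterable[Mapping]) -> Set[str]:
    """Ports that should be virtual parents of the VIP but are absent in OVN."""
    return expected_virtual_parents(vip, neutron_ports) - actual_virtual_parents(vip_lsp)


# Made-up sample data (hypothetical port names and addresses).
sample_ports = [
    {"id": "vrrp-a", "allowed_address_pairs": [{"ip_address": "10.0.0.5"}]},
    {"id": "vrrp-b", "allowed_address_pairs": [{"ip_address": "10.0.0.5"}]},
]
sample_vip_lsp = {"options": {"virtual-ip": "10.0.0.5", "virtual-parents": "vrrp-a"}}
missing = missing_virtual_parents("10.0.0.5", sample_vip_lsp, sample_ports)
```

A non-empty result right after a failover would point at the situation from this report: the address pair exists in Neutron, but the corresponding parent never reached the VIP port in OVN.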
We have quite a similar setup, though pre-2025.1.
From what I can recall, options:virtual-parents should contain both Octavia ports at all times. Virtual parents are identified through the allowed address pair mappings, as the VIP needs to be allowed on the MAC addresses of the VRRP interfaces. There is also a strict requirement that port security is enabled on such ports. So one thing to check is that your Octavia network has port security enabled by default and that you configure security groups accordingly on the amphora instances instead of disabling it. Also check that allowed address pairs are defined for the amphora management ports.
With that, I would expect the external_ids behavior and the port state in Neutron to be not that relevant.
(In reply to the message from Maximilian Stinsky-Damke <Maximilian.Stinsky-Damke@wiit.cloud>, Thu, 3 Jul 2025, 12:24.)
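The two preconditions mentioned in this reply (port security enabled, VIP present in the allowed address pairs) can be checked per port. This is a rough Python sketch under stated assumptions: `amphora_port_problems` is a hypothetical helper name, and the input is assumed to be a port dict as returned by the Neutron API, with the `port_security_enabled` and `allowed_address_pairs` fields.

```python
from typing import List, Mapping


def amphora_port_problems(port: Mapping, vip: str) -> List[str]:
    """Return human-readable findings for one amphora port (empty if fine)."""
    problems = []
    # A missing field is treated as "disabled" here, which is an assumption.
    if not port.get("port_security_enabled", False):
        problems.append("port security is disabled")
    pairs = port.get("allowed_address_pairs") or []
    if not any(pair.get("ip_address") == vip for pair in pairs):
        problems.append(f"VIP {vip} is not in allowed_address_pairs")
    return problems


# Made-up example ports (hypothetical IDs and addresses).
good_port = {
    "id": "vrrp-a",
    "port_security_enabled": True,
    "allowed_address_pairs": [{"ip_address": "10.0.0.5"}],
}
bad_port = {"id": "vrrp-b", "port_security_enabled": False, "allowed_address_pairs": []}
```

An empty list only means the Neutron side looks as expected. It says nothing about whether OVN has picked the port up as a virtual parent.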
Yes, options:virtual-parents should contain both ports. The normal flow during a failover should be that a third amphora VM is created, resulting in three virtual-parents for a short time. After the third amphora is allocated, one of the amphorae is removed, so that OVN has two virtual-parents again. But it seems that Neutron sometimes does not add the third amphora's VRRP port to the virtual-parents, and when the old amphora is removed I end up with a single virtual-parent on my VIP port.
Like I said, when I unset the allowed address pair on the new amphora's VRRP port and then reattach it, that always fixes the load balancer. Therefore it seems to me that Neutron, for whatever reason, sometimes fails to add the virtual-parent when the amphora VM is rotated.
Regarding your statement that external_ids is not that relevant, that is not entirely true when using the ovn-bgp-agent. Because the agent needs to know whether it is responsible for the VIP port, it looks in OVN for VIP ports whose external_ids:neutron:host_id field is set to its host, and only then announces the IP address. But because in some cases the virtual-parent for what may be the VRRP master is missing, OVN brings the port down and Neutron removes the host_id field. As a result, no ovn-bgp-agent considers itself responsible for the VIP and the load balancer is down.
(In reply to the message from Dmitriy Rabotyagov <noonedeadpunk@gmail.com> to openstack-discuss <openstack-discuss@lists.openstack.org>, sent 03 July 2025 12:35.)
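The manual workaround mentioned in this thread (unset the allowed address pair on the affected port, then set it again) could be scripted roughly as follows. This is a sketch under assumptions, not a recommended fix: `reapply_allowed_address_pairs` is a hypothetical helper, and `network` is assumed to be any object offering `get_port(port_id)` and `update_port(port_id, **attrs)`, which is the shape of the openstacksdk network proxy. The `InMemoryNetwork` class is only a stand-in so the helper can be dry-run without a cloud.

```python
from types import SimpleNamespace


def reapply_allowed_address_pairs(network, port_id):
    """Unset and re-set allowed_address_pairs on one port; return the pairs."""
    port = network.get_port(port_id)
    pairs = [dict(pair) for pair in (port.allowed_address_pairs or [])]
    if not pairs:
        return []  # nothing to re-apply
    # First update clears the pairs, second update restores them unchanged.
    network.update_port(port_id, allowed_address_pairs=[])
    network.update_port(port_id, allowed_address_pairs=pairs)
    return pairs


class InMemoryNetwork:
    """Stand-in for a network client, used only for a dry run of the helper."""

    def __init__(self, ports):
        self.ports = ports
        self.calls = []

    def get_port(self, port_id):
        return SimpleNamespace(**self.ports[port_id])

    def update_port(self, port_id, **attrs):
        self.calls.append((port_id, attrs))
        self.ports[port_id].update(attrs)


# Dry run with a made-up port ID and address.
net = InMemoryNetwork({"vrrp-new": {"allowed_address_pairs": [{"ip_address": "10.0.0.5"}]}})
restored = reapply_allowed_address_pairs(net, "vrrp-new")
```

Against a real deployment, `network` would be something like the `network` attribute of an openstacksdk connection. The port has no allowed address pair between the two updates, so this briefly affects traffic for the VIP on that port.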
> Regarding your statement that external_ids is not that relevant, that is not entirely true when using the ovn-bgp-agent.
Oh, well, we have a setup with standalone gateway nodes, so FIPs are not distributed and ovn-bgp-agent is not deployed on each compute node. So for us, port bindings to hosts are far less relevant, as ovn-bgp-agent only needs to find the router binding rather than host bindings.
(In reply to the message from Maximilian Stinsky-Damke <Maximilian.Stinsky-Damke@wiit.cloud>, Thu, 3 Jul 2025, 12:55.)
Participants (2): Dmitriy Rabotyagov, Maximilian Stinsky-Damke