Floating IP randomly becomes unreachable from external network (OSA AIO 30.0.2, Ubuntu 24.04)
Hello,

I’m currently running an OpenStack-Ansible All-In-One deployment (version 30.0.2) on Ubuntu 24.04. The environment hosts several virtual machines across multiple tenant networks. Each VM is assigned a floating IP from a public 172.x.x.x range, which allows access from the controller host and external clients.

The issue

From time to time, a VM becomes unreachable via its floating IP. The instance is still:

- Running normally.
- Reachable from other VMs on the same internal tenant network using its private IP (10.x.x.x).
- Associated with the same security groups and port configuration as before.

This happens without any apparent trigger or configuration change. I can still ping and SSH to other VMs’ floating IPs, so the problem is isolated to one instance at a time.

Workaround

The only way I’ve found to restore external access is:

1. Disassociate and delete the current floating IP.
2. Allocate a new floating IP from the same pool.
3. Associate it to the same VM port.

Once the new floating IP is set, the VM becomes immediately reachable again.

This has now occurred several times across different VMs. I’m concerned it will keep happening randomly.

Question

Has anyone encountered similar behavior with floating IPs in an OpenStack-Ansible AIO setup? Is there a known root cause, or any specific logs or areas I should investigate?

If this mailing list is not the right place for this type of question, I’d also appreciate guidance on where to report it or who to contact.

Thanks in advance for your help,
Christophe.
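For readers who want to script the workaround described above, here is a minimal sketch that only builds the corresponding `openstack` CLI invocations and prints them; it does not execute anything. It is an assumption that the steps were done with the CLI at all (they could equally have been done in Horizon). The function name, the port ID, the address, and the network name `public` are hypothetical placeholders, and `<NEW_FIP>` stands for the address returned by the create step.

```python
import shlex


def fip_recreate_commands(port, old_fip, external_net):
    """Return the workaround steps as argument lists (nothing is executed).

    port, old_fip and external_net are placeholders supplied by the caller;
    "<NEW_FIP>" marks the address that only exists after the create step.
    """
    return [
        # 1. Disassociate the current floating IP from its port, then delete it.
        ["openstack", "floating", "ip", "unset", "--port", old_fip],
        ["openstack", "floating", "ip", "delete", old_fip],
        # 2. Allocate a new floating IP from the same external network (pool).
        ["openstack", "floating", "ip", "create", external_net],
        # 3. Associate the new floating IP with the same VM port.
        ["openstack", "floating", "ip", "set", "--port", port, "<NEW_FIP>"],
    ]


def render(commands):
    """Join each argument list into a shell-quoted line for review."""
    return [shlex.join(cmd) for cmd in commands]


if __name__ == "__main__":
    for line in render(fip_recreate_commands("PORT_ID", "172.29.0.10", "public")):
        print(line)
```

Printing the commands rather than running them keeps the sketch safe to try; before deleting the address it may be worth capturing the output of `openstack floating ip show` for the broken floating IP, since that state disappears once the workaround is applied.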
Hi Christophe,

This is the right mailing list, welcome :-)

Could you please share some more details of your deployment: is it an OVN-based deployment, or do you have services like neutron-l3-agent and neutron-l2-agent?

Best wishes,
Lajos Katona (lajoskatona)

(In reply to <christophel@silival.com>, Sun, 22 Jun 2025, 8:45)
One thing that strikes me, and that has been an issue before in AIO setups, is network ranges provisioned by the AIO intersecting with the subnets used for the VMs themselves. In that case you can get an IP conflict quite easily. Other than that, I don't have anything specific in mind. I use AIO setups as small sandbox/showcase environments from time to time and haven't experienced such issues on the exact same setup (Ubuntu 24.04, Dalmatian).

Also, feel free to join our #openstack-ansible IRC channel on the OFTC network. Folks are usually around there during UTC working hours (I am personally away today, though) and will gladly assist you with further debugging in a slightly more synchronous way :)

(In reply to Lajos Katona <katonalala@gmail.com>, Mon, 23 Jun 2025, 10:52)
Hi,

Thanks a lot for your feedback and for the suggestion.

Regarding the network overlap: I double-checked, and the IP ranges used by the host (controller) and the AIO provisioning network don't intersect with any tenant networks or floating IP pools. Tenant subnets are all in 10.x.x.x, and floating IPs are in a dedicated 172.29.x.x block not used elsewhere.

Appreciate your help again.

Best regards,
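The overlap check discussed above can also be done mechanically. The following is a minimal sketch using Python's standard `ipaddress` module; every name and CIDR in it is an illustrative placeholder, not a value from this thread. In practice the inputs would come from the `cidr_networks` section of openstack_user_config.yml and from the output of `openstack subnet list`.

```python
import ipaddress
from itertools import combinations


def find_overlaps(named_cidrs):
    """Return every pair of network names whose CIDR ranges intersect."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in named_cidrs.items()}
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(nets.items(), 2)
        if net_a.overlaps(net_b)
    ]


if __name__ == "__main__":
    # Hypothetical ranges; replace them with the ones from your deployment.
    ranges = {
        "host_mgmt": "172.29.236.0/22",
        "floating_pool": "172.29.248.0/22",
        "tenant_a": "10.0.1.0/24",
        "tenant_b": "10.0.0.0/16",
    }
    for pair in find_overlaps(ranges):
        print("overlap:", pair)
```

With these placeholder values only the two tenant ranges are reported, because 10.0.1.0/24 lies inside 10.0.0.0/16. An empty result rules out overlapping ranges, but not a duplicate address inside a single range.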
Sorry, all your replies ended up in my spam folder :(

I am actually very confused by your confirmation that this is a Linux Bridge setup on 30.0.2. While there are Linux bridges for container communication, Neutron itself should already be using OVN as its driver. If that is an AIO, it should define the `neutron_plugin_type` variable explicitly. Can you run either

    grep -r neutron_plugin_type /etc/openstack_deploy/

or

    ansible -m debug -a var=neutron_plugin_type neutron_all

to confirm which driver is used?

(In reply to <christophel@silival.com>, Mon, 23 Jun 2025, 12:23)
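As a rough Python equivalent of the grep command suggested above, the sketch below walks a directory and reports where `neutron_plugin_type` is set. It is an approximation under stated assumptions: unlike `grep -r` it only reads `*.yml` files, it skips commented-out lines, and it does not resolve Ansible defaults or group variables the way the `ansible -m debug` command would. The function name is hypothetical.

```python
from pathlib import Path


def find_plugin_type(root="/etc/openstack_deploy"):
    """Return (file, line number, value) for each neutron_plugin_type setting."""
    root = Path(root)
    if not root.is_dir():
        return []
    hits = []
    for path in sorted(root.rglob("*.yml")):
        text = path.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), 1):
            stripped = line.strip()
            # Ignore comments and lines that are not "key: value" pairs.
            if stripped.startswith("#") or ":" not in stripped:
                continue
            key, value = stripped.split(":", 1)
            # Exact key match, so similarly named variables are not reported.
            if key.strip() == "neutron_plugin_type":
                hits.append((str(path), lineno, value.strip().strip("'\"")))
    return hits


if __name__ == "__main__":
    for filename, lineno, value in find_plugin_type():
        print(f"{filename}:{lineno}: {value}")
```

A reported value such as `ml2.ovn` or `ml2.lxb` would indicate which driver the deployment configuration requests; no hits at all would mean the variable is not overridden in that directory and the role default applies.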
Hi,

Thank you for the warm welcome and your reply.

This is not an OVN-based deployment. It's a standard OpenStack-Ansible All-In-One (AIO) installation, using neutron-linuxbridge with the traditional neutron-l3-agent and neutron-l2-agent services. No OVN components are installed.

Everything else (including metadata, DHCP, and L3 agents) seems to behave normally. The issue appears limited to floating IP reachability, and affects VMs sporadically.

If you have suggestions for what I could check (specific logs, agent status, or perhaps a known bug or race condition), I’d be happy to dig further.

Thanks again for your time and support.

Best regards,
Hi,

Just a side note on Linuxbridge: the linuxbridge driver was deprecated in the Antelope (2023.1) cycle (see [1]), and the code itself was removed in the Epoxy (2025.1) cycle (see [2]).

There was/is also a discussion, with an etherpad, about migrating away from the linuxbridge driver: https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack....

[1]: https://docs.openstack.org/releasenotes/neutron/2023.1.html#relnotes-21-0-0-...
[2]: https://docs.openstack.org/releasenotes/neutron/2025.1.html#upgrade-notes

Lajos Katona (lajoskatona)

(In reply to <christophel@silival.com>, Mon, 23 Jun 2025, 12:21)
OSA has defaulted to OVN since 2023.1, so I would expect this to be an OVN-based deployment, unless some extra effort was made to do something else.

(In reply to Lajos Katona <katonalala@gmail.com>, Wed, 25 Jun 2025, 09:56)
participants (3): christophel@silival.com, Dmitriy Rabotyagov, Lajos Katona