[neutron] East-West networking issue on DVR after failed attempt at starting a new instance
Hi,

We use Openstack Wallaby installed through Kolla-ansible on this setup. Here's a quick rundown of the issue we just noticed:

- We try popping an instance which fails because of a storage issue.
- Nova tries to create the instance on 3 different nodes before failing.
- We notice that instances on these 3 nodes and only those instances cannot connect to each other anymore.
- Doing Tcpdump tests, we realize that pings are received by each instance, but never replied to.
- Restarting the neutron-openvswitch-agent container fixes this issue.

I suspect l2population might have something to do with this. Is the ARP table rebuilt when the openvswitch-agent is restarted?

--
Jean-Philippe Méthot
Senior Openstack system administrator
Administrateur système Openstack sénior
PlanetHoster inc.
Hi,

On Thursday, 7 October 2021 23:37:37 CEST J-P Methot wrote:
> Hi,
>
> We use Openstack Wallaby installed through Kolla-ansible on this setup. Here's a quick rundown of the issue we just noticed:
>
> - We try popping an instance which fails because of a storage issue.
> - Nova tries to create the instance on 3 different nodes before failing.
> - We notice that instances on these 3 nodes and only those instances cannot connect to each other anymore.
> - Doing Tcpdump tests, we realize that pings are received by each instance, but never replied to.
> - Restarting the neutron-openvswitch-agent container fixes this issue.
>
> I suspect l2population might have something to do with this. Is the ARP table rebuilt when the openvswitch-agent is restarted?

If you are using DVR and l2population, you have arp_responder enabled, so ARP replies for tunnel networks are generated locally in the OVS bridges. When you restart neutron-openvswitch-agent, it regenerates all OpenFlow rules, so yes, if some rules were missing, a restart should add them again.

--
Slawek Kaplonski
Principal Software Engineer
Red Hat
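
One way to check whether rules really were missing before the restart is to save the output of `ovs-ofctl dump-flows` on an affected node before and after restarting the agent, and compare the two. The sketch below is only an illustration of that comparison, not something taken from this thread: the function names are made up, and it assumes the usual `ovs-ofctl dump-flows` text layout, where each flow line carries per-flow counters and a cookie that have to be ignored because they change across an agent restart.

```python
import re

# Fields that differ between two dumps even when the flow itself is the same
# (assumption: standard "ovs-ofctl dump-flows" text output).
_VOLATILE = re.compile(
    r"\b(?:cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^,\s]+,?\s*"
)


def normalize(line):
    """Drop counters, ages and the cookie so identical flows compare equal."""
    return _VOLATILE.sub("", line).strip()


def flow_set(dump):
    """Return the set of normalized flow lines found in one dump."""
    return {normalize(l) for l in dump.splitlines() if "table=" in l}


def missing_flows(before, after):
    """Flows present after the agent restart but absent before it."""
    return sorted(flow_set(after) - flow_set(before))
```

Usage would be along the lines of `missing_flows(open("before.txt").read(), open("after.txt").read())`, where the two files (hypothetical names) hold the dumps of the same bridge. An empty result would suggest the flows were not the problem; ARP-matching entries in the result would point toward the ARP responder rules.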