Hi everyone!

I’m facing a weird situation on a tenant of one of our OpenStack clusters, which runs Victoria.

On this tenant, the network topology is as follows:

One DMZ network (192.168.0.0/24), linked to our public network through a Neutron router. A VM on this network acts as a bastion/router for the MGMT network.

One MGMT network (172.16.31.0/24), to which all VMs are attached.

On the DMZ network, there is a Debian 11 Linux VM, let’s call it VM-A, with a floating IP from the public pool. This VM is attached to both the DMZ network (ens3 / 192.168.0.12) and the MGMT network (ens4 / 172.16.31.23).

All other VMs, let’s call them VM-X, are attached exclusively to the MGMT network (ens4).

I’ve set up VM-A with IP forwarding enabled in the kernel (the net.ipv4.ip_forward sysctl) and the following iptables rule:

# iptables -t nat -A POSTROUTING -o ens3 -j SNAT --to-source 192.168.0.12
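
For reference, here is a minimal sketch of how the forwarding state on VM-A can be checked. The helper name forwarding_enabled and its optional path argument are only illustrative, they are not part of the setup described here:

```shell
# Return success if the given ip_forward file contains "1".
# The path defaults to the real /proc entry; the argument exists only
# so the check can be pointed at another file (illustrative helper).
forwarding_enabled() {
    [ "$(cat "${1:-/proc/sys/net/ipv4/ip_forward}" 2>/dev/null)" = "1" ]
}

# Forwarding can be switched on at runtime (as root) with:
#   sysctl -w net.ipv4.ip_forward=1
if forwarding_enabled; then
    echo "IPv4 forwarding is on"
else
    echo "IPv4 forwarding is off"
fi
```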

The VM-X machines are each configured with a default route via VM-A:

# ip route add default via 172.16.31.23

The setup seems to be working: without the iptables rule and kernel forwarding, I don’t see any packets from VM-X on VM-A’s DMZ interface (ens3).

OK, now that you have the whole picture, let’s dive into the issue.

When the rule, forwarding and gateway are all set, I can see the VM-X traffic (an ICMP ping to a DNS server) going from VM-X (ens4) to VM-A (ens4), then being forwarded to VM-A (ens3), and finally reaching the targeted public service.

What’s not working, however, is the response: it never gets back to VM-X.

I’ve run tcpdump on the traffic between VM-X and VM-A at each point of the platform:

from inside the VM-X NIC, on the tap device, on the qbr bridge, on the qvb veth, on the qvo end of the veth, through the OVS bridges, and vice versa.

However, the response packets don’t get any further than the VM-A qvo veth.
Once the traffic exits VM-A, it never reaches VM-X.
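
For anyone wanting to look at the same capture points, here is a rough sketch of how the device names can be derived from a Neutron port ID. It assumes the usual naming (prefix plus the first 11 characters of the port UUID); the port ID below is a made-up placeholder and port_devices is just an illustrative helper:

```shell
# Hypothetical port ID, for illustration only; use the real UUID
# reported by "openstack port list".
PORT_ID="${PORT_ID:-01234567-89ab-cdef-0123-456789abcdef}"

# Print the tap/qbr/qvb/qvo device names for a Neutron port,
# assuming the naming scheme: prefix + first 11 characters of the UUID.
port_devices() {
    short=$(printf '%s' "$1" | cut -c1-11)
    for prefix in tap qbr qvb qvo; do
        printf '%s%s\n' "$prefix" "$short"
    done
}

# List the capture points; on the compute node each one can then be
# sniffed with something like:
#   tcpdump -ni <device> icmp
port_devices "$PORT_ID"
```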

What’s really suspicious here is that a direct ping from VM-X (172.16.31.54) to VM-A (172.16.31.23) comes back correctly, so it looks as if OVS decides that the response in the SNAT case isn’t legitimate, or something similar.

Has anyone managed to get such a setup working?

Here is some additional information:
The hosts run CentOS 8.5 with the latest updates.
Our platform is OpenStack Victoria, deployed using kolla-ansible.
We are using an OVS-based deployment.
Our tunnels are VXLAN.
All VMs have a fully open security group applied, and all ports have it (I checked twice, including the iptables rules on the hosts).
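
In case it helps whoever looks at this: Neutron's per-port port security (anti-spoofing on source MAC/IP) is a separate setting from the security group rules, so it may be worth dumping it for VM-A's port on the MGMT network. This is only something to check, not a confirmed cause. A sketch, where the port ID is a placeholder and port_check_cmd is an illustrative helper that just prints the command to run:

```shell
# Placeholder; substitute the UUID of VM-A's port on the MGMT network.
PORT_ID="${PORT_ID:-VM_A_MGMT_PORT_UUID}"

# Build the command as a string so it can be reviewed before running it
# against the cloud (requires the openstack CLI and credentials).
port_check_cmd() {
    printf 'openstack port show %s -c fixed_ips -c port_security_enabled -c allowed_address_pairs\n' "$1"
}

port_check_cmd "$PORT_ID"
```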

If you need any additional information, feel free to let me know!