On Mon, Jun 10, 2024 at 8:31 AM Florian Haas <florian@cleura.com> wrote:
Hi everyone,
here's a question I've been chewing on for some time. Please bear with me while I illustrate the background.
1. Suppose you have a regular (internal) Neutron network, subnet, and router. You now plug the subnet into the router and set an external gateway. You then create a port on the subnet. You associate a floating IP with that port. All is well.
2. Suppose instead you do plug the subnet into the router, but you don't set an external gateway on it. You try to associate a floating IP with that port. This fails with a BadRequest: "External network <uuid> is not reachable from subnet <uuid>".
3. Suppose instead you do set an external gateway on the router, but you don't connect the subnet to it. Again, you try to associate a floating IP with that port. This also fails with a BadRequest: "External network <uuid> is not reachable from subnet <uuid>".
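To make the three cases concrete, here is a toy model of the reachability test as described above. It is only a sketch: the Router class, the function name, and the identifiers "ext-net" and "subnet-a" are made up for illustration and are not Neutron code.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Router:
    # Hypothetical stand-in for a Neutron router (not a Neutron class).
    external_gateway_network: Optional[str] = None
    interface_subnets: Set[str] = field(default_factory=set)


def external_network_reachable(router: Router, subnet_id: str, external_network_id: str) -> bool:
    # Reachable only if the router has its gateway on the external network
    # AND has an interface on the subnet that holds the port.
    return (
        router.external_gateway_network == external_network_id
        and subnet_id in router.interface_subnets
    )


case1 = Router("ext-net", {"subnet-a"})  # gateway set, subnet plugged in
case2 = Router(None, {"subnet-a"})       # subnet plugged in, no gateway
case3 = Router("ext-net", set())         # gateway set, subnet not plugged in

results = [
    external_network_reachable(r, "subnet-a", "ext-net")
    for r in (case1, case2, case3)
]
print(results)
```

Only the first case passes; the other two correspond to the BadRequest responses in cases 2 and 3.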
So far, so good. Up to this point, everything is behaving according to intuitive expectations.
Now suppose we have
* the external network,
* an "outer" internal network,
* a Neutron router connecting the two (that is, the router has the external network as its gateway, and has an interface to the "outer" subnet),
* an "inner" internal network,
* a Nova server (let's call that the "gateway"), which connects the "inner" and "outer" networks, and
* another Nova server (the "worker") that sits on the "inner" network.
(Suppose, for the sake of discussion, that the "gateway" runs some kind of web application firewall or other virtual security appliance, and the "worker" is a general-purpose Linux instance.)
Now I can give the Neutron router a static route so that traffic to the "inner" network uses the "gateway" as its nexthop. Provided the "gateway" does its forwarding correctly, this means that the "worker" has internet connectivity via the "gateway" (which, in turn, achieves this via the router).
This, too, works fine for outbound connections originating with the "worker".
If I now want to assign a floating IP to the "worker", then routing-wise there really should be no problem: the router takes care of SNAT/DNAT, so it would translate the floating IP into the worker's private IP on the "inner" network, consult its routing table, and forward packets for that "inner" network to the "gateway", per the static route that's been set.
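The expected inbound path can be sketched as a DNAT lookup followed by a longest-prefix-match route lookup. This is a simplified illustration, not how Neutron's L3 agent is implemented; every address, the FLOATING_TO_FIXED table, and the ROUTES list are invented example values.

```python
import ipaddress

# Assumed example values: floating IP -> worker's fixed IP on the "inner" network.
FLOATING_TO_FIXED = {"203.0.113.10": "10.0.2.5"}

# Assumed router routing table: (destination CIDR, nexthop).
# A nexthop of None means the network is directly connected.
ROUTES = [
    ("10.0.1.0/24", None),         # "outer" subnet, router interface
    ("10.0.2.0/24", "10.0.1.20"),  # static route: "inner" via the gateway's "outer" port
]


def dnat(destination: str) -> str:
    # Rewrite a floating IP to its fixed IP; leave other addresses alone.
    return FLOATING_TO_FIXED.get(destination, destination)


def next_hop(destination: str) -> str:
    # Longest-prefix match over ROUTES; returns the address the packet
    # is handed to next (the destination itself if directly connected).
    addr = ipaddress.ip_address(destination)
    candidates = []
    for cidr, via in ROUTES:
        net = ipaddress.ip_network(cidr)
        if addr in net:
            candidates.append((net.prefixlen, via))
    if not candidates:
        raise LookupError(f"no route to {destination}")
    _, via = max(candidates, key=lambda c: c[0])
    return via if via is not None else destination


fixed_ip = dnat("203.0.113.10")
print(fixed_ip, "via", next_hop(fixed_ip))
```

With these example values, traffic to the floating IP is rewritten to the worker's fixed IP and handed to the gateway's address on the "outer" subnet.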
But when I try to associate a floating IP with the "worker" in this scenario, I *still* get the "External network not reachable from subnet" error, because the associated internal check[1] does not take the existing static route into account.
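The difference between a check that looks only at router interfaces and one that also consults static routes can be sketched as follows. Both functions are hypothetical illustrations and not Neutron code; the route-aware variant is just one possible reading of "take static routes into account", and all CIDRs and addresses are example values.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Router:
    # Hypothetical stand-in for a Neutron router (not a Neutron class).
    external_gateway_network: Optional[str]
    interface_cidrs: List[str]
    static_routes: List[Tuple[str, str]]  # (destination CIDR, nexthop)


def interface_only_check(router: Router, ext_net: str, subnet_cidr: str) -> bool:
    # Passes only if the subnet is directly attached to the router.
    return (
        router.external_gateway_network == ext_net
        and subnet_cidr in router.interface_cidrs
    )


def route_aware_check(router: Router, ext_net: str, subnet_cidr: str) -> bool:
    # Additionally accepts a subnet covered by a static route whose
    # nexthop lies on one of the router's directly attached subnets.
    if router.external_gateway_network != ext_net:
        return False
    if subnet_cidr in router.interface_cidrs:
        return True
    subnet = ipaddress.ip_network(subnet_cidr)
    attached = [ipaddress.ip_network(c) for c in router.interface_cidrs]
    for destination, nexthop in router.static_routes:
        covers = subnet.subnet_of(ipaddress.ip_network(destination))
        nexthop_attached = any(ipaddress.ip_address(nexthop) in net for net in attached)
        if covers and nexthop_attached:
            return True
    return False


router = Router("ext-net", ["10.0.1.0/24"], [("10.0.2.0/24", "10.0.1.20")])
inner = "10.0.2.0/24"
print(interface_only_check(router, "ext-net", inner))
print(route_aware_check(router, "ext-net", inner))
```

Note that even the route-aware variant only trusts the routing table; it cannot verify that the nexthop actually forwards traffic.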
How would Neutron determine that the "inner" network is indeed accessible through the "gateway" VM? For what it's worth, from Neutron's perspective, it's just a VM with one leg in each of the two networks. Forwarding inside the "gateway" VM is opaque.
This appears to have been an issue in other contexts as well: the defunct Tricircle project appears to have worked around this limitation via a service plugin that just omitted that check.[2]
Now my questions are these:
1. Shouldn't this be fixed in core, instead? It would seem to me like the default check is too simplistic, but I might be missing something.
2. Is there a way to make this check take static routes into account, other than providing a service plugin like Tricircle used to do?
Since your VM is already aware of its forwarding role, can you make its "outer" port own the FIP (and a separate fixed_ip dedicated to traffic from and to the worker), then DNAT it further?
Thanks!
Cheers, Florian
References:
[1] get_router_for_floatingip(): https://opendev.org/openstack/neutron/src/commit/ebed620aafdff75a05fb5d42add...
[2] get_router_for_floatingip(): https://opendev.org/openstack/tricircle/src/commit/8db8fb30f5757c46a953962d8...