On Mon, Jun 10, 2024 at 8:31 AM Florian Haas <florian@cleura.com> wrote:
Hi everyone,

here's a question I've been chewing on for some time.
Please bear with me while I illustrate the background.


1. Suppose you have a regular (internal) Neutron network, subnet, and
router. You now plug the subnet into the router and set an external
gateway. You then create a port on the subnet. You associate a floating
IP with that port. All is well.

2. Suppose instead you do plug the subnet into the router, but you don't
set an external gateway on it. You try to associate a floating IP with
that port. This creates a BadRequest because: "External network <uuid>
is not reachable from subnet <uuid>".

3. Suppose instead you do set an external gateway on the router, but you
don't connect the subnet to it. Again, you try to associate a floating
IP with that port. This also creates a BadRequest because: "External
network <uuid> is not reachable from subnet <uuid>".

So far, so good. Up to this point, everything is behaving according to
intuitive expectations.
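
The three cases above can be modelled in a few lines of Python. This is only a simplified sketch of the observed behaviour, not the Neutron code itself: the `Router` class, the `can_associate` function, and the IDs `ext` and `subnet-a` are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Router:
    # Hypothetical stand-in for a Neutron router; not a Neutron class.
    name: str
    # ID of the external network used as gateway, or None if no gateway is set.
    gateway_network: Optional[str] = None
    # IDs of the subnets that are plugged into the router.
    interface_subnets: Set[str] = field(default_factory=set)


def can_associate(router: Router, external_network: str, subnet: str) -> bool:
    """Return True only if the router has a gateway on the external network
    AND an interface on the subnet that the port lives on."""
    has_gateway = router.gateway_network == external_network
    has_interface = subnet in router.interface_subnets
    return has_gateway and has_interface


# Case 1: gateway set, subnet plugged in.
case1 = Router("r1", gateway_network="ext", interface_subnets={"subnet-a"})
# Case 2: subnet plugged in, no gateway.
case2 = Router("r2", gateway_network=None, interface_subnets={"subnet-a"})
# Case 3: gateway set, subnet not plugged in.
case3 = Router("r3", gateway_network="ext", interface_subnets=set())

for router in (case1, case2, case3):
    print(router.name, can_associate(router, "ext", "subnet-a"))
```

In this model only case 1 passes; cases 2 and 3 fail, which corresponds to the BadRequest described above.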


Now suppose we have

* the external network,
* an "outer" internal network,
* a Neutron router connecting the two (that is, the router has the
external network as its gateway, and has an interface to the "outer"
subnet),
* an "inner" internal network,
* a Nova server (let's call that the "gateway"), which connects the
"inner" and "outer" networks, and
* another Nova server (the "worker") that sits on the "inner" network.

(Suppose, for the sake of discussion, that the "gateway" runs some kind
of web application firewall or other virtual security appliance, and the
"worker" is a general-purpose Linux instance.)

Now I can give the Neutron router a static route so that traffic to the
"inner" network uses the "gateway" as its nexthop. Provided the
"gateway" does its forwarding correctly, this means that the "worker"
has internet connectivity via the "gateway" (which, in turn, achieves
this via the router).

This, too, works fine for outbound connections originating with the
"worker".


If I now want to assign a floating IP to the "worker", then routing-wise
there really should be no problem. The router should take care of
SNAT/DNAT: it would translate the floating IP into a private IP on the
"inner" network, consult its routing table, and forward packets for
that "inner" network to the "gateway" to handle, per the static route
that's been set.

But when I try to associate a floating IP with the "worker" in this
scenario, I *still* get the "External network not reachable from subnet"
error, because the associated internal check[1] does not take the
existing static route into account.
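
To make the gap concrete, here is a rough Python sketch of what a route-aware variant of such a check could look like. It is purely hypothetical and not how Neutron behaves today: the `Router` class, both functions, and all IDs and addresses are invented for illustration. Note that it only establishes that the router has a matching static route via a nexthop on an attached subnet; it says nothing about whether the "gateway" instance actually forwards traffic.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Router:
    # Hypothetical stand-in for a Neutron router; not a Neutron class.
    gateway_network: Optional[str] = None
    # Attached subnets: subnet ID -> CIDR.
    interfaces: Dict[str, str] = field(default_factory=dict)
    # Static routes as (destination CIDR, nexthop IP) pairs.
    routes: List[Tuple[str, str]] = field(default_factory=list)


def reachable_directly(router: Router, external_network: str, subnet_id: str) -> bool:
    """Simplified model of the current behaviour: gateway plus direct interface."""
    return router.gateway_network == external_network and subnet_id in router.interfaces


def reachable_with_routes(router: Router, external_network: str,
                          subnet_id: str, subnet_cidr: str) -> bool:
    """Hypothetical variant that also accepts a static route covering the subnet,
    provided the nexthop sits on a subnet attached to the router."""
    if router.gateway_network != external_network:
        return False
    if subnet_id in router.interfaces:
        return True
    target = ipaddress.ip_network(subnet_cidr)
    attached = [ipaddress.ip_network(cidr) for cidr in router.interfaces.values()]
    for destination, nexthop in router.routes:
        dest = ipaddress.ip_network(destination)
        hop = ipaddress.ip_address(nexthop)
        # subnet_of() requires matching IP versions, so check the version first.
        covers = target.version == dest.version and target.subnet_of(dest)
        hop_attached = any(hop in net for net in attached)
        if covers and hop_attached:
            return True
    return False


# The "outer" subnet is attached; the "inner" subnet is reached via the "gateway".
router = Router(
    gateway_network="ext",
    interfaces={"outer": "10.0.1.0/24"},
    routes=[("10.0.2.0/24", "10.0.1.10")],
)

print(reachable_directly(router, "ext", "inner"))
print(reachable_with_routes(router, "ext", "inner", "10.0.2.0/24"))
```

With these made-up values, the direct check rejects the "inner" subnet while the route-aware variant accepts it.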

How would Neutron determine that the "inner" network is indeed reachable through the "gateway" VM? For what it's worth, from Neutron's perspective, it's just a VM with one leg in each of the two networks. Forwarding inside the "gateway" VM is opaque.

This appears to have been an issue in other contexts as well: the
defunct Tricircle project appears to have worked around this limitation
via a service plugin that just omitted that check.[2]


Now my questions are these:

1. Shouldn't this be fixed in core, instead? It would seem to me like
the default check is too simplistic, but I might be missing something.
2. Is there a way to make this check take static routes into account,
other than providing a service plugin like Tricircle used to do?


Since your VM is already aware of its forwarding role, can you make its "outer" port own the FIP (and a separate fixed_ip dedicated to traffic from and to the worker), then DNAT it further?

Thanks!

Cheers,
Florian


References:
[1] get_router_for_floatingip():
https://opendev.org/openstack/neutron/src/commit/ebed620aafdff75a05fb5d42add6eee571663528/neutron/db/l3_db.py#L1270

[2] get_router_for_floatingip():
https://opendev.org/openstack/tricircle/src/commit/8db8fb30f5757c46a953962d8c2133743cfffdb3/tricircle/network/local_l3_plugin.py#L30