On Mon, Jun 10, 2024 at 11:05 AM Florian Haas <florian@cleura.com> wrote:
On 10/06/2024 16:36, Ihar Hrachyshka wrote:
>     If I now want to assign a floating IP to the "worker", then routing-wise
>     there really should be no problem: the router should take care of
>     SNAT/DNAT, it would translate the floating IP into a private IP on the
>     "inner" network, would consult its routing table, and throw packets for
>     that "inner" network to the "gateway" to handle, per the static route
>     that's been set.
>
>     But when I try to associate a floating IP to the "worker" in this
>     scenario, I *still* get the "External network not reachable from subnet"
>     error, because the associated internal check[1] does not take the
>     existing static route into account.
>
>
> How would Neutron determine that the "inner" network is indeed
> accessible through the "gateway" VM? For what it's worth, from Neutron's
> perspective, it's just a VM with one leg in each of the two networks.
> Forwarding inside the "gateway" VM is opaque.

It would not know *that*, but I wonder: from the router's perspective,
does that really matter?

Currently, the router checks its own interfaces_info map to determine if
it is connected to a specific subnet. It then checks whether that
subnet's netmask matches the private IP of the floating/fixed IP
association. (I am, obviously, paraphrasing and grossly simplifying, but
I think that I got the logic right, at least in principle.)
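
As a rough illustration of the check as paraphrased above (a sketch only,
not Neutron's actual code: the function name, its parameters, and the
example addresses are all hypothetical):

```python
import ipaddress


def directly_connected(router_subnet_cidrs, fixed_ip):
    """Return True if fixed_ip falls inside any subnet the router has
    an interface on. Hypothetical helper, not a Neutron function."""
    ip = ipaddress.ip_address(fixed_ip)
    return any(ip in ipaddress.ip_network(cidr)
               for cidr in router_subnet_cidrs)


# Assumed addressing: "outer" network is 192.168.0.0/24,
# "inner" network is 10.0.0.0/24.
print(directly_connected(["192.168.0.0/24"], "192.168.0.10"))
print(directly_connected(["192.168.0.0/24"], "10.0.0.5"))
```

With only the "outer" subnet attached to the router, the "worker" address
on the "inner" network fails this test, which corresponds to the error
described above.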

Neutron can't really check if that private IP address is actually
"accessible": I could shell into that server and disable the interface,
and the router would be none the wiser. It would just keep sending
packets my server's way, and they would time out.

Now, if the router, when determining whether it can host a floating IP,
were to also check its own routes map to see if

* the private IP (on the "inner" network) matched a route destination, and
* the route nexthop matched a subnet that the router was directly
  connected to (namely, the "outer" network),

wouldn't that be just as sufficient a sanity check for routing purposes?
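
The two conditions above could be sketched roughly as follows. This is
an illustration under assumptions, not a patch against Neutron: the
function name and the example addresses are made up, and the only thing
borrowed from the API is the destination/nexthop shape of a router's
routes.

```python
import ipaddress


def reachable_via_router(subnet_cidrs, routes, fixed_ip):
    """Hypothetical relaxed check: accept fixed_ip if it is on a directly
    connected subnet, or if a static route covers it and that route's
    nexthop lies on a directly connected subnet."""
    ip = ipaddress.ip_address(fixed_ip)
    connected = [ipaddress.ip_network(cidr) for cidr in subnet_cidrs]

    # Existing behaviour: the private IP is on a connected subnet.
    if any(ip in net for net in connected):
        return True

    # Proposed addition: consult the router's static routes.
    for route in routes:
        destination = ipaddress.ip_network(route["destination"])
        nexthop = ipaddress.ip_address(route["nexthop"])
        if ip in destination and any(nexthop in net for net in connected):
            return True
    return False


# Assumed addressing: "outer" is 192.168.0.0/24, "inner" is 10.0.0.0/24,
# and the "gateway" has 192.168.0.254 on the "outer" network.
routes = [{"destination": "10.0.0.0/24", "nexthop": "192.168.0.254"}]
print(reachable_via_router(["192.168.0.0/24"], routes, "10.0.0.5"))
```

Like the existing check, this says nothing about whether the nexthop
actually forwards traffic; it only verifies that the router has a
plausible path configured.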

All that we expect of the router is to create DNAT/SNAT rules for the
floating IP addresses, and with said information from routes this should
be possible, with as much plausibility checking as for directly
connected subnets — or am I missing something crucially important?

(Took me a while to think it through.) I think you are right. There may be assumptions in the code that would not make it easy to relax the checks as you suggest, but as a feature request, I think it would make sense to consider. Could you please file an RFE in LP so that the Neutron team can take a closer look?
 

>     This appears to have been an issue in other contexts as well: the
>     defunct Tricircle project appears to have worked around this limitation
>     via a service plugin that just omitted that check.[2]
>
>
>     Now my questions are these:
>
>     1. Shouldn't this be fixed in core, instead? It would seem to me like
>     the default check is too simplistic, but I might be missing something.
>     2. Is there a way to make this check take static routes into account,
>     other than providing a service plugin like Tricircle used to do?
>
>
> Since your VM is already aware of its forwarding role, can you make its
> "outer" port own the FIP (and a separate fixed_ip dedicated to traffic
> from and to the worker), then DNAT it further?

Yes, I can, but doesn't that only work for one floating IP per port?

A single port may have multiple fixed IPs. You can then tie each unique FIP to a particular "worker" through the `fixed_ip_address` attribute. This should allow your gateway VM to distinguish between them, based on the destination address. Would that work?
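
To illustrate the idea, here is a toy model of the lookup the gateway VM
would perform once each FIP is tied to its own fixed IP on the "outer"
port. It is only a sketch: the table, the function name, and all
addresses are hypothetical, and a real gateway would do this with DNAT
rules rather than Python.

```python
import ipaddress

# Hypothetical mapping: fixed IP on the gateway's "outer" port
# -> worker address on the "inner" network.
DNAT_TABLE = {
    "192.168.0.11": "10.0.0.11",
    "192.168.0.12": "10.0.0.12",
}


def worker_for(destination, table=DNAT_TABLE):
    """Return the worker address for a packet's destination address,
    or None if the destination is not one of the dedicated fixed IPs."""
    key = str(ipaddress.ip_address(destination))
    return table.get(key)


print(worker_for("192.168.0.11"))
print(worker_for("192.168.0.99"))
```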
 
That means that if I want to run multiple "workers" behind the "gateway"
machine, I need to assign the "gateway" a bunch of ports on the
"outside" network. That seems somewhat needlessly complicated, and it
also doesn't play well with separation of concerns: it would appear more
logical to me if the router was the one doing NAT and routing and only
that, and firewalling and traffic inspection happened only on the
"gateway" appliance.

(And yes, I do understand that another option would be to just not
bother with NAT or floating IPs at all, and just go IPv6 all the way.)

Cheers,
Florian