On 05/07/2024 20:00, Ihar Hrachyshka wrote:
On Mon, Jun 10, 2024 at 11:05 AM Florian Haas <florian@cleura.com> wrote:
On 10/06/2024 16:36, Ihar Hrachyshka wrote:
> If I now want to assign a floating IP to the "worker", then routing-wise
> there really should be no problem: the router should take care of
> SNAT/DNAT, it would translate the floating IP into a private IP on the
> "inner" network, would consult its routing table, and throw packets for
> that "inner" network to the "gateway" to handle, per the static route
> that's been set.
>
> But when I try to associate a floating IP to the "worker" in this
> scenario, I *still* get the "External network not reachable from subnet"
> error, because the associated internal check[1] does not take the
> existing static route into account.
>
>
> How would Neutron determine that the "inner" network is indeed
> accessible through the "gateway" VM? For what it's worth, from Neutron
> perspective, it's just a VM with two legs in each of the two networks.
> Forwarding inside the "gateway" VM is opaque.
It would not know *that*, but I wonder: from the router's perspective, does that really matter?
Currently, the router checks its own interfaces_info map to determine if it is connected to a specific subnet. It then checks whether that subnet's netmask matches the private IP of the floating/fixed IP association. (I am, obviously, paraphrasing and grossly simplifying, but I think that I got the logic right, at least in principle.)
Neutron can't really check if that private IP address is actually "accessible": I could shell into that server and disable the interface, and the router would be none the wiser. It would just keep sending packets my server's way, and they would time out.
Now, if the router, when determining whether it can host a floating IP, were to also check its own routes map to see if
* the private IP (on the "inner" network) matched a route destination, and
* the route nexthop matched a subnet that the router was directly connected to (namely, the "outer" network),
wouldn't that be just as sufficient a sanity check for routing purposes?
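The two conditions above can be sketched in a few lines of Python. This is only an illustration of the proposed logic, not Neutron's actual code: the function names are hypothetical, the router's connected subnets are assumed to be a plain list of CIDR strings, and the routes are assumed to be dicts with `destination` and `nexthop` keys, as in the router's `routes` attribute.

```python
import ipaddress


def directly_connected(router_subnets, ip):
    """Simplified form of the existing check: is `ip` inside a subnet
    the router has an interface on?"""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in router_subnets)


def reachable_via_static_route(router_subnets, routes, ip):
    """Proposed extra check: `ip` matches a route destination, and that
    route's nexthop sits on a directly connected subnet."""
    addr = ipaddress.ip_address(ip)
    return any(
        addr in ipaddress.ip_network(route["destination"])
        and directly_connected(router_subnets, route["nexthop"])
        for route in routes
    )


def can_host_floating_ip(router_subnets, routes, fixed_ip):
    """Hypothetical combined check: accept the association if either holds."""
    return directly_connected(router_subnets, fixed_ip) or \
        reachable_via_static_route(router_subnets, routes, fixed_ip)


# Example addresses are made up: "outer" network, "inner" network,
# and the "gateway" VM's address on the "outer" network.
outer = ["192.168.0.0/24"]
routes = [{"destination": "10.0.0.0/24", "nexthop": "192.168.0.10"}]
worker_ip = "10.0.0.5"
print(can_host_floating_ip(outer, [], worker_ip))
print(can_host_floating_ip(outer, routes, worker_ip))
```

Without the static route the worker's address is rejected, which is today's behaviour; with the route it is accepted, and a route whose nexthop is not on a connected subnet is still rejected.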
All that we expect of the router is to create DNAT/SNAT rules for the floating IP addresses, and with said information from routes this should be possible, with as much plausibility checking as for directly connected subnets — or am I missing something crucially important?
(Took me a while to think it through.) I think you are right. There may be assumptions in the code that would not make it easy to relax the checks as you suggest, but as a feature request, I think it would make sense to consider. Could you please report an RFE in LP so that the Neutron team can take a closer look?
Certainly: https://bugs.launchpad.net/neutron/+bug/2072505

And thanks for thinking this through! :)
> This appears to have been an issue in other contexts as well: the
> defunct Tricircle project appears to have worked around this limitation
> via a service plugin that just omitted that check.[2]
>
>
> Now my questions are these:
>
> 1. Shouldn't this be fixed in core, instead? It would seem to me like
>    the default check is too simplistic, but I might be missing something.
> 2. Is there a way to make this check take static routes into account,
>    other than providing a service plugin like Tricircle used to do?
>
>
> Since your VM is already aware of its forwarding role, can you make its
> "outer" port own the FIP (and a separate fixed_ip dedicated to traffic
> from and to the worker), then DNAT it further?
Yes, I can, but doesn't that only work for one floating IP per port?
A single port may have multiple fixed IPs. You can then tie each unique FIP to a particular "worker" through the `fixed_ip_address` attribute. This should allow your gateway VM to distinguish between them, based on the destination address. Would that work?
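As a rough sketch of that arrangement: each floating IP is bound to the gateway's "outer" port with a distinct `fixed_ip_address`, and the gateway VM keeps a matching table to DNAT each fixed IP onward to one worker. The helper names, IDs and addresses below are all made up for illustration; the dict layout assumes the `floatingip` update body with `port_id` and `fixed_ip_address`.

```python
def floating_ip_update_bodies(port_id, assignments):
    """Build one floating IP update body per FIP.

    `assignments` is a list of (floating_ip_id, fixed_ip_on_gateway_port,
    worker_ip) tuples; all values are hypothetical placeholders.
    """
    return {
        fip_id: {"floatingip": {"port_id": port_id,
                                "fixed_ip_address": fixed_ip}}
        for fip_id, fixed_ip, _worker_ip in assignments
    }


def dnat_table(assignments):
    """Mapping the gateway VM would need: destination fixed IP on the
    "outer" port -> worker address on the "inner" network."""
    return {fixed_ip: worker_ip for _fip_id, fixed_ip, worker_ip in assignments}


assignments = [
    ("fip-a", "192.168.0.11", "10.0.0.5"),
    ("fip-b", "192.168.0.12", "10.0.0.6"),
]
bodies = floating_ip_update_bodies("gateway-outer-port", assignments)
table = dnat_table(assignments)
print(bodies["fip-a"])
print(table["192.168.0.12"])
```

The sketch only covers the Neutron-side association and the lookup table; how the gateway VM gets its additional addresses configured is a separate question.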
It might, but not universally so. As I understand it, when multiple fixed IP addresses are associated with a port, and the "gateway" VM is configured as a DHCP client, *which* IP address the DHCP server hands out for that port is essentially random, and the rest of the IP addresses are then for cloud-init to configure. So if I'm not mistaken, this may create a situation where (a) the allocation of the primary IP address to that interface is non-deterministic (which is undesirable), and (b) the particular cloud-init implementation in use in the VM may or may not support the additional network configuration.

Cheers,
Florian