[Neutron] Is there a way to work around "External network not reachable from subnet" by setting a static route?
Hi everyone,

here's a question I've been chewing on for some time. Please bear with me while I illustrate the background.

1. Suppose you have a regular (internal) Neutron network, subnet, and router. You now plug the subnet into the router and set an external gateway. You then create a port on the subnet. You associate a floating IP with that port. All is well.
2. Suppose instead you do plug the subnet into the router, but you don't set an external gateway on it. You try to associate a floating IP with that port. This creates a BadRequest because: "External network <uuid> is not reachable from subnet <uuid>".
3. Suppose instead you do set an external gateway on the router, but you don't connect the subnet to it. Again, you try to associate a floating IP with that port. This also creates a BadRequest because: "External network <uuid> is not reachable from subnet <uuid>".

So far, so good. Up to this point, everything is behaving according to intuitive expectations.

Now suppose we have

* the external network,
* an "outer" internal network,
* a Neutron router connecting the two (that is, the router has the external network as its gateway, and has an interface to the "outer" subnet),
* an "inner" internal network,
* a Nova server (let's call that the "gateway"), which connects the "inner" and "outer" networks, and
* another Nova server (the "worker") that sits on the "inner" network.

(Suppose, for the sake of discussion, that the "gateway" runs some kind of web application firewall or other virtual security appliance, and the "worker" is a general-purpose Linux instance.)

Now I can give the Neutron router a static route so that traffic to the "inner" network uses the "gateway" as its nexthop. Provided the "gateway" does its forwarding correctly, this means that the "worker" has internet connectivity via the "gateway" (which, in turn, achieves this via the router).

This, too, works fine for outbound connections originating with the "worker".

If I now want to assign a floating IP to the "worker", then routing-wise there really should be no problem: the router should take care of SNAT/DNAT, it would translate the floating IP into a private IP on the "inner" network, would consult its routing table, and throw packets for that "inner" network to the "gateway" to handle, per the static route that's been set.

But when I try to associate a floating IP to the "worker" in this scenario, I *still* get the "External network not reachable from subnet" error, because the associated internal check[1] does not take the existing static route into account.

This appears to have been an issue in other contexts as well: the defunct Tricircle project appears to have worked around this limitation via a service plugin that just omitted that check.[2]

Now my questions are these:

1. Shouldn't this be fixed in core, instead? It would seem to me like the default check is too simplistic, but I might be missing something.
2. Is there a way to make this check take static routes into account, other than providing a service plugin like Tricircle used to do?

Thanks!

Cheers,
Florian

References:
[1] get_router_for_floatingip(): https://opendev.org/openstack/neutron/src/commit/ebed620aafdff75a05fb5d42add...
[2] get_router_for_floatingip(): https://opendev.org/openstack/tricircle/src/commit/8db8fb30f5757c46a953962d8...
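For readers who want the static route from the scenario above in concrete form, here is a minimal sketch of the router update body that would carry it. The body shape follows the `routes` attribute (`destination`/`nexthop` pairs) of the Networking API router resource; the CIDRs, the gateway address, and the helper name `static_route_update_body` are assumptions made up for illustration, not values from this thread.

```python
import ipaddress


def static_route_update_body(inner_cidr, gateway_outer_ip, outer_cidr):
    """Build a router update body that routes the inner network via the gateway.

    Hypothetical helper: it only builds the request body and does not talk
    to Neutron.
    """
    inner = ipaddress.ip_network(inner_cidr)
    nexthop = ipaddress.ip_address(gateway_outer_ip)
    # The nexthop must sit on a subnet the router is directly attached to,
    # which in this scenario is the "outer" subnet.
    if nexthop not in ipaddress.ip_network(outer_cidr):
        raise ValueError(f"nexthop {nexthop} is not on outer subnet {outer_cidr}")
    return {
        "router": {
            "routes": [{"destination": str(inner), "nexthop": str(nexthop)}]
        }
    }


# Assumed example addressing: outer 10.0.1.0/24, inner 10.0.2.0/24,
# gateway's outer-side address 10.0.1.10.
body = static_route_update_body("10.0.2.0/24", "10.0.1.10", "10.0.1.0/24")
```

With this route in place the router knows where to send packets for the "inner" network, yet the floating IP association check described above still rejects ports on that network.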
On Mon, Jun 10, 2024 at 8:31 AM Florian Haas <florian@cleura.com> wrote:
> If I now want to assign a floating IP to the "worker", then routing-wise there really should be no problem: the router should take care of SNAT/DNAT, it would translate the floating IP into a private IP on the "inner" network, would consult its routing table, and throw packets for that "inner" network to the "gateway" to handle, per the static route that's been set.
>
> But when I try to associate a floating IP to the "worker" in this scenario, I *still* get the "External network not reachable from subnet" error, because the associated internal check[1] does not take the existing static route into account.
How would Neutron determine that the "inner" network is indeed accessible through the "gateway" VM? For what it's worth, from Neutron's perspective, it's just a VM with one leg in each of the two networks. Forwarding inside the "gateway" VM is opaque.
> Now my questions are these:
>
> 1. Shouldn't this be fixed in core, instead? It would seem to me like the default check is too simplistic, but I might be missing something.
> 2. Is there a way to make this check take static routes into account, other than providing a service plugin like Tricircle used to do?
Since your VM is already aware of its forwarding role, can you make its "outer" port own the FIP (and a separate fixed_ip dedicated to traffic from and to the worker), then DNAT it further?
On 10/06/2024 16:36, Ihar Hrachyshka wrote:
> How would Neutron determine that the "inner" network is indeed accessible through the "gateway" VM? For what it's worth, from Neutron's perspective, it's just a VM with one leg in each of the two networks. Forwarding inside the "gateway" VM is opaque.
It would not know *that*, but I wonder: from the router's perspective, does that really matter?

Currently, the router checks its own interfaces_info map to determine if it is connected to a specific subnet. It then checks whether that subnet's netmask matches the private IP of the floating/fixed IP association. (I am, obviously, paraphrasing and grossly simplifying, but I think that I got the logic right, at least in principle.)

Neutron can't really check if that private IP address is actually "accessible": I could shell into that server and disable the interface, and the router would be none the wiser. It would just keep sending packets my server's way, and they would time out.

Now, if the router, when determining whether it can host a floating IP, were to also check its own routes map to see if

* the private IP (on the "inner" network) matched a route destination, and
* the route nexthop matched a subnet that the router was directly connected to (namely, the "outer" network),

wouldn't that be just as sufficient a sanity check for routing purposes?

All that we expect of the router is to create DNAT/SNAT rules for the floating IP addresses, and with said information from routes this should be possible, with as much plausibility checking as for directly connected subnets. Or am I missing something crucially important?
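The two conditions proposed above can be sketched in a few lines of Python. This is only an illustration of the idea under simplifying assumptions: the `Router` class, its field names, the `consider_routes` flag, and the example addresses are all hypothetical, and none of this is Neutron's actual `get_router_for_floatingip()` code.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class Router:
    """Toy model of a router; not a Neutron object."""
    connected_subnets: list                      # CIDRs the router has an interface on
    routes: list = field(default_factory=list)   # (destination CIDR, nexthop IP) pairs
    has_external_gateway: bool = True


def directly_connected(router, ip):
    """True if ip falls inside a subnet the router is attached to."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in router.connected_subnets)


def reachable_via_static_route(router, ip):
    """True if ip matches a route destination whose nexthop is directly connected."""
    addr = ipaddress.ip_address(ip)
    return any(
        addr in ipaddress.ip_network(destination) and directly_connected(router, nexthop)
        for destination, nexthop in router.routes
    )


def can_host_floating_ip(router, fixed_ip, consider_routes=False):
    """Simplified existing check, optionally extended with the static-route check."""
    if not router.has_external_gateway:
        return False
    if directly_connected(router, fixed_ip):
        return True
    return consider_routes and reachable_via_static_route(router, fixed_ip)


# Assumed addressing: outer 10.0.1.0/24, inner 10.0.2.0/24 behind gateway 10.0.1.10.
router = Router(["10.0.1.0/24"], [("10.0.2.0/24", "10.0.1.10")])
worker_ip = "10.0.2.20"
strict = can_host_floating_ip(router, worker_ip)
relaxed = can_host_floating_ip(router, worker_ip, consider_routes=True)
```

In this toy model the "worker" address fails the strict check and passes the relaxed one. Like the existing check, neither variant proves that the nexthop actually forwards traffic.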
> Since your VM is already aware of its forwarding role, can you make its "outer" port own the FIP (and a separate fixed_ip dedicated to traffic from and to the worker), then DNAT it further?
Yes, I can, but doesn't that only work for one floating IP per port?

That means that if I want to run multiple "workers" behind the "gateway" machine, I need to assign the "gateway" a bunch of ports on the "outside" network. That seems somewhat needlessly complicated, and it also doesn't play well with separation of concerns: it would appear more logical to me if the router was the one doing NAT and routing and only that, and firewalling and traffic inspection happened only on the "gateway" appliance.

(And yes, I do understand that another option would be to just not bother with NAT or floating IPs at all, and go just IPv6 all the way.)

Cheers,
Florian
On Mon, Jun 10, 2024 at 11:05 AM Florian Haas <florian@cleura.com> wrote:
> All that we expect of the router is to create DNAT/SNAT rules for the floating IP addresses, and with said information from routes this should be possible, with as much plausibility checking as for directly connected subnets. Or am I missing something crucially important?
(Took me a while to think it through.) I think you are right. There may be assumptions in the code that would not make it easy to relax the checks as you suggest, but as a feature request, I think it would make sense to consider. Could you please report an RFE in LP so that the Neutron team can take a closer look?
> Yes, I can, but doesn't that only work for one floating IP per port?
A single port may have multiple fixed IPs. You can then tie each unique FIP to a particular "worker" through the `fixed_ip_address` attribute. This should allow your gateway VM to distinguish between them, based on the destination address. Would that work?
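A rough sketch of that arrangement, assuming one fixed IP on the gateway's "outer" port per worker. The request body uses the `port_id` and `fixed_ip_address` attributes of the floating IP resource in the Networking API; the IDs, the addresses, the helper names, and the choice of iptables for the second DNAT step inside the gateway VM are all assumptions for illustration.

```python
def fip_association_bodies(port_id, fip_to_fixed_ip):
    """Build one floating IP update body per (floating IP ID, fixed IP) pair.

    Every floating IP points at the same port but a different fixed IP on it.
    """
    return {
        fip_id: {"floatingip": {"port_id": port_id, "fixed_ip_address": fixed_ip}}
        for fip_id, fixed_ip in fip_to_fixed_ip.items()
    }


def gateway_dnat_rules(fixed_ip_to_worker_ip):
    """Render one DNAT rule per fixed IP, to be applied inside the gateway VM."""
    return [
        f"iptables -t nat -A PREROUTING -d {fixed_ip} -j DNAT "
        f"--to-destination {worker_ip}"
        for fixed_ip, worker_ip in sorted(fixed_ip_to_worker_ip.items())
    ]


# Hypothetical IDs and addresses, for illustration only.
OUTER_PORT_ID = "gateway-outer-port"
bodies = fip_association_bodies(
    OUTER_PORT_ID,
    {"fip-a": "10.0.1.11", "fip-b": "10.0.1.12"},
)
rules = gateway_dnat_rules({"10.0.1.11": "10.0.2.21", "10.0.1.12": "10.0.2.22"})
```

The router translates each floating IP to its fixed IP on the "outer" port, and the gateway then tells workers apart by destination address. The cost is that NAT state now lives in two places.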
On 05/07/2024 20:00, Ihar Hrachyshka wrote:
> (Took me a while to think it through.) I think you are right. There may be assumptions in the code that would not make it easy to relax the checks as you suggest, but as a feature request, I think it would make sense to consider. Could you please report an RFE in LP so that the Neutron team can take a closer look?
Certainly: https://bugs.launchpad.net/neutron/+bug/2072505

And thanks for thinking this through! :)
> A single port may have multiple fixed IPs. You can then tie each unique FIP to a particular "worker" through the `fixed_ip_address` attribute. This should allow your gateway VM to distinguish between them, based on the destination address. Would that work?
It might, but not universally so. As I understand it, when multiple fixed IP addresses are associated with a port, and the "gateway" VM is configured as a DHCP client, *which* IP address the DHCP server hands out for that port is essentially random, and the rest of the IP addresses are then for cloud-init to configure.

So if I'm not mistaken, this may create a situation where (a) the allocation of the primary IP address to that interface is non-deterministic (which is undesirable), and (b) the particular cloud-init implementation in use in the VM may or may not support the additional network configuration.

Cheers,
Florian
Participants (2): Florian Haas, Ihar Hrachyshka