[ops][octavia][neutron] Distributed Virtual Routing: Floating IPs attached to virtual IP addresses are assigned on network nodes
Hello!

When we initially deployed OpenStack we thought that using distributed virtual routing with ml2/ovs-dvr would give us the ability to scale our network capacity automatically with the number of hypervisors we use. Our main workload is Kubernetes clusters, which receive ingress traffic via Octavia load balancers (configured to use the amphora driver). The idea was that we could increase the number of load balancers to spread the traffic over more and more compute nodes. This would imply that any volume-based (distributed) denial of service attack on a single load balancer would only saturate a single compute node and leave the rest of the system functional.

We have recently learned that, no matter the load balancer topology, Octavia creates a virtual IP for it. This VIP, like probably all virtual IPs in OpenStack, is reserved by an unbound, disabled port and then set as an allowed address pair on the port of any server which might hold it. Up to this point our initial assumption should still hold, as the server actually holding the virtual IP replies to ARP requests, so traffic should be routed to the node running the Octavia amphora's virtual machine. However, we are using our main provider network as a floating IP pool and do not allow direct port creation. When a floating IP is attached to the virtual IP, it is assigned to the SNAT router namespace on a network node. Naturally, in high-traffic or (distributed) denial of service situations the network node might become a bottleneck, which is exactly the situation we thought we could avoid by using distributed virtual routing in the first place.

This leads me to a rabbit hole of questions I hope someone might be able to help with:

Is the assessment above correct, or am I missing something?

If it is correct, do we have any options other than vertically scaling our network nodes to handle the traffic? Do other ml2 drivers (e.g. OVN) handle this scenario differently?

If our network nodes need to handle most of the traffic anyway, is there still any advantage to using distributed virtual routing, especially considering the increased complexity compared to a non-distributed setup?

Has anyone explored non-virtual-IP-based high availability options, e.g. BGP multipathing, in a distributed virtual routing scenario?

Any input is highly appreciated.

Regards,
Jan
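A quick way to see this layout in a running cloud is to query Neutron for the ports involved. The following is only a minimal sketch using openstacksdk, assuming admin credentials; the cloud name and VIP address are placeholders, and it merely illustrates the relationships described above (VIP port, allowed address pairs, floating IP), not Octavia's internals.

```python
# Minimal illustrative sketch: given a load balancer VIP address, show which
# Neutron port holds it, which ports carry it as an allowed address pair, and
# whether a floating IP is attached to it. "mycloud" and the VIP are placeholders.
import openstack

conn = openstack.connect(cloud="mycloud")  # assumes a matching clouds.yaml entry
VIP = "192.0.2.10"                         # the Octavia VIP address to inspect

ports = list(conn.network.ports())

# The VIP itself lives on an unbound, disabled port created by Octavia.
for port in ports:
    if any(ip["ip_address"] == VIP for ip in (port.fixed_ips or [])):
        print("VIP port:", port.id, "bound to:", port.binding_host_id or "<unbound>")

# The amphora instance ports list the VIP in allowed_address_pairs; their
# binding host is the compute node that actually answers ARP for the VIP.
for port in ports:
    for pair in (port.allowed_address_pairs or []):
        if pair.get("ip_address", "").split("/")[0] == VIP:
            print("carrier port:", port.id, "host:", port.binding_host_id)

# A floating IP attached to the VIP; with ml2/ovs DVR its NAT happens in the
# router's SNAT namespace on a network node rather than on the compute node.
for fip in conn.network.ips():
    if fip.fixed_ip_address == VIP:
        print("floating IP:", fip.floating_ip_address, "router:", fip.router_id)
```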
Hi Jan,

If I understand correctly, the issue you are facing is that with ovs-dvr the floating IPs are implemented in the SNAT namespace on the network node, creating a congestion point under high traffic. You are looking for a way to implement floating IPs that are distributed across your deployment rather than concentrated on the network nodes. Is that correct?

If so, I think what you are looking for is distributed floating IPs with OVN [1]. I will let the OVN experts confirm this.

Michael

[1] https://docs.openstack.org/networking-ovn/latest/admin/refarch/refarch.html#...
Distributed floating IPs with OVN will bypass the bottleneck imposed by centralized NAT, but by itself that won't allow you to scale beyond a single Amphora instance for any given floating IP.

I have been working with a team to develop BGP exporting of floating IP addresses with OVN, using FRR running in a container. Our current implementation exports all floating IPs and provider VLAN IPs into BGP from each compute node in a DVR setup, which allows migration of floating IPs between compute nodes in a routed environment even if they do not share any layer 2 networks.

This will allow you to route traffic to multiple VMs (which can be Amphora load balancers) using the same floating IP with IP anycast, and the network will route traffic to the nearest instance, or load-balance with ECMP if there are multiple instances with the same number of hops in the path. This should work with allowed-address-pairs.

I will be presenting this solution at the OpenInfra Summit in Berlin along with Luis Tomás, and you can try out the ovn-bgp-agent with the code here:

https://github.com/luis5tb/bgp-agent

He documents the design and testing environment in his blog:

https://ltomasbo.wordpress.com/2021/02/04/openstack-networking-with-bgp/

Currently ingress and egress traffic is routed via the kernel, so this setup doesn't yet work with DPDK; however, you can scale using multiple load balancers to compensate for that limitation. I am hopeful we will overcome that limitation before too long. The BGP daemon can also receive routes from BGP peers, but there is no need to receive all routes; one or more default routes would be sufficient to use ECMP and BFD for northbound traffic.

On the Kubernetes side, I worked with the OpenShift engineers to add BGP support to MetalLB using FRR, so native load balancers in Kubernetes can now export endpoint IPs into the BGP routing fabric as well, using a similar approach.

I know there has been a massive amount of interest in this approach in the last year, so I expect this to become a very popular architecture in the near future.

-Dan Sneddon
--
Dan Sneddon | Senior Principal Software Engineer
dsneddon@redhat.com | redhat.com/cloud
dsneddon:irc | @dxs:twitter
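To make the anycast idea concrete: with distributed floating IPs plus BGP, each compute node effectively announces a /32 route for every floating IP whose fixed port is bound to it, and anycast/ECMP falls out of several nodes announcing the same prefix. The sketch below only approximates that bookkeeping at the Neutron API level; it is not the ovn-bgp-agent itself (which works from the OVN southbound database), and the cloud and host names are placeholders.

```python
# Rough approximation (not the actual ovn-bgp-agent): list, via the Neutron
# API, the floating IPs a given compute node would be able to announce as
# /32 routes when distributed floating IPs are in use. Names are placeholders.
import openstack

conn = openstack.connect(cloud="mycloud")
HOST = "compute-01.example.net"  # hypothetical compute node name

# All ports currently bound to this hypervisor.
local_ports = {
    port.id for port in conn.network.ports() if port.binding_host_id == HOST
}

# A floating IP whose fixed port is bound here is reachable on this node with
# distributed FIPs, so this node would export it into BGP as a host route.
for fip in conn.network.ips():
    if fip.port_id and fip.port_id in local_ports:
        print(f"{fip.floating_ip_address}/32 via {HOST}")
```

Several Amphorae on different compute nodes holding the same floating IP would then show up as the same prefix with multiple next hops, which is exactly what lets the fabric do ECMP across them.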
Thanks, Michael and Dan, for confirming that this would not be an issue with OVN.

On Fri, 2022-05-06 at 18:09 -0700, Dan Sneddon wrote:
Distributed floating IPs with OVN will bypass the bottleneck imposed by centralized NAT, but by itself that won’t allow you to scale beyond a single Amphora instance for any given floating IP.
For our use case we can probably shard our traffic over multiple load balancers (a rough sketch of this idea follows below). So we can either move to a centralized setup and invest in some capable network nodes, or move to OVN and keep a distributed setup. With OVN we could probably also switch to ovn-octavia-provider; I still have to investigate this option to understand the traffic flow in that case.

The work you outlined sounds really interesting and will resolve a lot of scalability problems. I am still working through the information you provided and will definitely keep a close eye on this.
--
Jan Horstmann
Systems Developer | Infrastructure
_____
Mittwald CM Service GmbH & Co. KG
Königsberger Straße 4-6
32339 Espelkamp

Tel.: 05772 / 293-900
Fax: 05772 / 293-333
j.horstmann@mittwald.de
https://www.mittwald.de

Managing directors: Robert Meyer, Florian Jürgens
VAT ID: DE814773217, HRA 6640, AG Bad Oeynhausen
General partner: Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad Oeynhausen

Information on data processing in the course of our business activities pursuant to Art. 13-14 GDPR is available at www.mittwald.de/ds.
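The sharding idea mentioned above amounts to creating several Octavia load balancers for the same service and spreading clients across their addresses (for example via multiple DNS A records). The sketch below is a hypothetical illustration of that with openstacksdk; the cloud, subnet, and network names are placeholders, and provisioning waits and error handling are deliberately simplified.

```python
# Hypothetical sketch of sharding one ingress service over several Octavia
# load balancers, each with its own floating IP. "mycloud", the subnet and
# the network names are placeholders; timeouts and error handling are omitted.
import time
import openstack

conn = openstack.connect(cloud="mycloud")

vip_subnet = conn.network.find_subnet("k8s-ingress-subnet")      # placeholder
fip_network = conn.network.find_network("provider-fip-network")  # placeholder

shard_fips = []
for i in range(3):  # three shards as an example
    lb = conn.load_balancer.create_load_balancer(
        name=f"ingress-shard-{i}",
        vip_subnet_id=vip_subnet.id,
    )

    # Octavia provisions asynchronously; poll until the LB is ACTIVE.
    while conn.load_balancer.get_load_balancer(lb.id).provisioning_status != "ACTIVE":
        time.sleep(5)

    # Attach a floating IP directly to the load balancer's VIP port.
    fip = conn.network.create_ip(
        floating_network_id=fip_network.id,
        port_id=lb.vip_port_id,
    )
    shard_fips.append(fip.floating_ip_address)

# These addresses could then be published as multiple A records for the same
# ingress hostname so client traffic is spread across the shards.
print(shard_fips)
```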