[ops][octavia][neutron] Distributed Virtual Routing: Floating IPs attached to virtual IP addresses are assigned on network nodes
Hello!

When we initially deployed OpenStack we thought that using distributed virtual routing with ml2/ovs-dvr would give us the ability to scale our network capacity automatically with the number of hypervisors we use. Our main workload is Kubernetes clusters, which receive ingress traffic via Octavia load balancers (configured to use the amphora driver). The idea was that we could increase the number of load balancers to spread the traffic over more and more compute nodes. This would imply that any volume-based (distributed) denial of service attack on a single load balancer would only saturate a single compute node and leave the rest of the system functional.

We have recently learned that, no matter the load balancer topology, Octavia creates a virtual IP for it. This VIP, like probably all virtual IPs in OpenStack, is reserved by an unbound, disabled port and then set as an allowed address pair on the port of any server which might hold it. Up to this point our initial assumption should still hold, as the server actually holding the virtual IP replies to ARP requests, so traffic should be routed to the node running the Octavia amphora's virtual machine. However, we are using our main provider network as a floating IP pool and do not allow direct port creation. When a floating IP is attached to the virtual IP, it is assigned to the SNAT router namespace on a network node. Naturally, in high-traffic or (distributed) denial of service situations the network node might become a bottleneck, which is exactly the situation we thought we could avoid by using distributed virtual routing in the first place.

This leads me to a rabbit hole of questions I hope someone might be able to help with:

Is the assessment above correct, or am I missing something?

If it is correct, do we have any options other than vertically scaling our network nodes to handle the traffic? Do other ml2 drivers (e.g. OVN) handle this scenario differently?

If our network nodes need to handle most of the traffic anyway, is there still any advantage to using distributed virtual routing, especially considering the increased complexity compared to a non-distributed setup?

Has anyone explored non-virtual-IP-based high availability options, e.g. BGP multipathing, in a distributed virtual routing scenario?

Any input is highly appreciated.

Regards,
Jan
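A quick way to see this layout in a running cloud is to query Neutron for the ports involved. The following is only a minimal sketch using openstacksdk, assuming admin credentials; the cloud name and VIP address are placeholders, and it merely illustrates the relationships described above (VIP port, allowed address pairs, floating IP), not Octavia's internals.

```python
# Minimal illustrative sketch: given a load balancer VIP address, show which
# Neutron port holds it, which ports carry it as an allowed address pair, and
# whether a floating IP is attached to it. "mycloud" and the VIP are placeholders.
import openstack

conn = openstack.connect(cloud="mycloud")  # assumes a matching clouds.yaml entry
VIP = "192.0.2.10"                         # the Octavia VIP address to inspect

ports = list(conn.network.ports())

# The VIP itself lives on an unbound, disabled port created by Octavia.
for port in ports:
    if any(ip["ip_address"] == VIP for ip in (port.fixed_ips or [])):
        print("VIP port:", port.id, "bound to:", port.binding_host_id or "<unbound>")

# The amphora instance ports list the VIP in allowed_address_pairs; their
# binding host is the compute node that actually answers ARP for the VIP.
for port in ports:
    for pair in (port.allowed_address_pairs or []):
        if pair.get("ip_address", "").split("/")[0] == VIP:
            print("carrier port:", port.id, "host:", port.binding_host_id)

# A floating IP attached to the VIP; with ml2/ovs DVR its NAT happens in the
# router's SNAT namespace on a network node rather than on the compute node.
for fip in conn.network.ips():
    if fip.fixed_ip_address == VIP:
        print("floating IP:", fip.floating_ip_address, "router:", fip.router_id)
```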
Hi Jan,

If I understand correctly, the issue you are facing is that with ovs-dvr the floating IPs are implemented in the SNAT namespace on the network node, creating a congestion point under high traffic. You are looking for a way to implement floating IPs that are distributed across your deployment rather than concentrated on the network nodes. Is that correct?

If so, I think what you are looking for is distributed floating IPs with OVN [1]. I will let the OVN experts confirm this.

Michael

[1] https://docs.openstack.org/networking-ovn/latest/admin/refarch/refarch.html#...
Distributed floating IPs with OVN will bypass the bottleneck imposed by centralized NAT, but by itself that won't allow you to scale beyond a single Amphora instance for any given floating IP.

I have been working with a team to develop BGP exporting of floating IP addresses with OVN, using FRR running in a container. Our current implementation exports all floating IPs and provider VLAN IPs into BGP from each compute node in a DVR setup, which allows migration of floating IPs between compute nodes in a routed environment even if they do not share any layer 2 networks.

This will allow you to route traffic to multiple VMs (which can be Amphora load balancers) using the same floating IP with IP anycast, and the network will route traffic to the nearest instance, or load-balance with ECMP if there are multiple instances with the same number of hops in the path. This should work with allowed-address-pairs.

I will be presenting this solution at the OpenInfra Summit in Berlin along with Luis Tomás, and you can try out the ovn-bgp-agent with the code here:

https://github.com/luis5tb/bgp-agent

He documents the design and testing environment in his blog:

https://ltomasbo.wordpress.com/2021/02/04/openstack-networking-with-bgp/

Currently ingress and egress traffic is routed via the kernel, so this setup doesn't yet work with DPDK; however, you can scale using multiple load balancers to compensate for that limitation. I am hopeful we will overcome that limitation before too long. The BGP daemon can also receive routes from BGP peers, but there is no need to receive all routes; one or more default routes would be sufficient to use ECMP and BFD for northbound traffic.

On the Kubernetes side, I worked with the OpenShift engineers to add BGP support to MetalLB using FRR, so native load balancers in Kubernetes can now export endpoint IPs into the BGP routing fabric as well, using a similar approach.

I know there has been a massive amount of interest in this approach in the last year, so I expect this to become a very popular architecture in the near future.

-Dan Sneddon
--
Dan Sneddon | Senior Principal Software Engineer
dsneddon@redhat.com | redhat.com/cloud
dsneddon:irc | @dxs:twitter
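To make the anycast idea concrete: with distributed floating IPs plus BGP, each compute node effectively announces a /32 route for every floating IP whose fixed port is bound to it, and anycast/ECMP falls out of several nodes announcing the same prefix. The sketch below only approximates that bookkeeping at the Neutron API level; it is not the ovn-bgp-agent itself (which works from the OVN southbound database), and the cloud and host names are placeholders.

```python
# Rough approximation (not the actual ovn-bgp-agent): list, via the Neutron
# API, the floating IPs a given compute node would be able to announce as
# /32 routes when distributed floating IPs are in use. Names are placeholders.
import openstack

conn = openstack.connect(cloud="mycloud")
HOST = "compute-01.example.net"  # hypothetical compute node name

# All ports currently bound to this hypervisor.
local_ports = {
    port.id for port in conn.network.ports() if port.binding_host_id == HOST
}

# A floating IP whose fixed port is bound here is reachable on this node with
# distributed FIPs, so this node would export it into BGP as a host route.
for fip in conn.network.ips():
    if fip.port_id and fip.port_id in local_ports:
        print(f"{fip.floating_ip_address}/32 via {HOST}")
```

Several Amphorae on different compute nodes holding the same floating IP would then show up as the same prefix with multiple next hops, which is exactly what lets the fabric do ECMP across them.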
Thanks, Michael and Dan, for confirming that this would not be an issue with OVN.

On Fri, 2022-05-06 at 18:09 -0700, Dan Sneddon wrote:
Distributed floating IPs with OVN will bypass the bottleneck imposed by centralized NAT, but by itself that won’t allow you to scale beyond a single Amphora instance for any given floating IP.
For our use case we can probably shard our traffic over multiple load balancers (a rough sketch of this idea follows below). So we can either move to a centralized setup and invest in some capable network nodes, or move to OVN and keep a distributed setup. With OVN we could probably also switch to ovn-octavia-provider; I still have to investigate this option to understand the traffic flow in that case.

The work you outlined sounds really interesting and will resolve a lot of scalability problems. I am still working through the information you provided and will definitely keep a close eye on this.
--
Jan Horstmann
Systems Developer | Infrastructure
_____
Mittwald CM Service GmbH & Co. KG
Königsberger Straße 4-6
32339 Espelkamp

Tel.: 05772 / 293-900
Fax: 05772 / 293-333
j.horstmann@mittwald.de
https://www.mittwald.de

Managing directors: Robert Meyer, Florian Jürgens
VAT ID: DE814773217, HRA 6640, AG Bad Oeynhausen
General partner: Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad Oeynhausen

Information on data processing in the course of our business activities pursuant to Art. 13-14 GDPR is available at www.mittwald.de/ds.
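The sharding idea mentioned above amounts to creating several Octavia load balancers for the same service and spreading clients across their addresses (for example via multiple DNS A records). The sketch below is a hypothetical illustration of that with openstacksdk; the cloud, subnet, and network names are placeholders, and provisioning waits and error handling are deliberately simplified.

```python
# Hypothetical sketch of sharding one ingress service over several Octavia
# load balancers, each with its own floating IP. "mycloud", the subnet and
# the network names are placeholders; timeouts and error handling are omitted.
import time
import openstack

conn = openstack.connect(cloud="mycloud")

vip_subnet = conn.network.find_subnet("k8s-ingress-subnet")      # placeholder
fip_network = conn.network.find_network("provider-fip-network")  # placeholder

shard_fips = []
for i in range(3):  # three shards as an example
    lb = conn.load_balancer.create_load_balancer(
        name=f"ingress-shard-{i}",
        vip_subnet_id=vip_subnet.id,
    )

    # Octavia provisions asynchronously; poll until the LB is ACTIVE.
    while conn.load_balancer.get_load_balancer(lb.id).provisioning_status != "ACTIVE":
        time.sleep(5)

    # Attach a floating IP directly to the load balancer's VIP port.
    fip = conn.network.create_ip(
        floating_network_id=fip_network.id,
        port_id=lb.vip_port_id,
    )
    shard_fips.append(fip.floating_ip_address)

# These addresses could then be published as multiple A records for the same
# ingress hostname so client traffic is spread across the shards.
print(shard_fips)
```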