[openstack-dev] [Neutron] [RFC] Floating IP idea solicitation and collaboration

Thomas Morin thomas.morin at orange.com
Tue Dec 16 13:40:08 UTC 2014

Hi Keshava,

2014-12-15 11:52, A, Keshava :
> 	I have been thinking of "Starting MPLS right from CN" for L2VPN/EVPN scenario also.
> 	Below are my queries w.r.t supporting MPLS from OVS :
> 		1. MPLS will be used even for VM-VM traffic across CNs generated by OVS  ?

If E-VPN is used only to interconnect outside of a Neutron domain, then 
MPLS does not have to be used for traffic between VMs.

If E-VPN is used inside one DC for VM-VM traffic, then MPLS is only 
*one* of the possible encapsulations: the E-VPN specs have also been 
defined to use VXLAN (handy because there is native kernel support), 
and MPLS/GRE or MPLS/UDP are other possibilities.

> 		2. MPLS will be originated right from OVS and will be mapped at Gateway (it may be NN/Hardware router ) to SP network ?
> 			So MPLS will carry 2 Labels ? (one for hop-by-hop, and other one for end to identify network ?)

On "will carry 2 Labels ?" : this would be one possibility, but not the 
one we target.
We would actually favor MPLS/GRE (GRE used instead of what you call the 
MPLS "hop-by-hop" label) inside the DC -- this requires only one label.
At the DC edge gateway, depending on the interconnection technique used 
to connect to the WAN, different options can be used (RFC4364 section 
10): option A with back-to-back VRFs (no MPLS label, but typically 
VLANs), or option B (with one MPLS label). A mix of A and B is also 
possible and is sometimes called option D (one label). Option C also 
exists, but is not a good fit here.

Inside one DC, if vswitches see each other across an Ethernet segment, 
we can also use MPLS with just one label (the VPN label) without a GRE 
encapsulation.

In a way, you can say that in option B the labels are "mapped" at the 
DC/WAN gateway(s), but this is really just MPLS label swapping, not to 
be misunderstood as mapping a DC label space to a WAN label space (see 
below, the label space is local to each device).

> 		3. MPLS will go over even the "network physical infrastructure"  also ?

The use of MPLS/GRE means we are doing an overlay, just like your 
typical VXLAN-based solution, and the network physical infrastructure 
does not need to be MPLS-aware (it just needs to be able to carry IP 
traffic).

> 		4. How the Labels will be mapped a/c virtual and physical world ?

(I don't get the question, I'm not sure what you mean by "mapping labels")

> 		5. Who manages the label space  ? Virtual world or physical world or both ? (OpenStack +  ODL ?)

In MPLS*, the label space is local to each device: a label is 
"downstream-assigned", i.e. allocated by the receiving device for a 
specific purpose (e.g. forwarding in a VRF). It is then (typically) 
advertised in a routing protocol; the sender device will use this label 
to send traffic to the receiving device for this specific purpose.  As a 
result a sender device may then use label 42 to forward traffic in the 
context of VPN X to a receiving device A, and the same label 42 to 
forward traffic in the context of another VPN Y to another receiving 
device B, and locally use label 42 to receive traffic for VPN Z.  There 
is no global label space to manage.
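
To make this concrete, here is a minimal sketch of downstream label 
assignment. The Device class, its method names and the choice to start 
every allocator at 42 are made up for illustration (this is not BaGPipe 
code), and learn() is only a stand-in for a real BGP advertisement.

```python
import itertools


class Device:
    """A forwarding device with its own, purely local MPLS label space."""

    def __init__(self, name, first_label=16):
        self.name = name
        # Label values 0-15 are reserved in MPLS, so real allocators start at 16.
        self._next = itertools.count(first_label)
        self.local_labels = {}   # purpose -> label this device allocated
        self.remote_labels = {}  # (remote name, purpose) -> label the remote advertised

    def allocate(self, purpose):
        """Downstream assignment: the receiving device picks the label."""
        if purpose not in self.local_labels:
            self.local_labels[purpose] = next(self._next)
        return self.local_labels[purpose]

    def learn(self, remote, purpose):
        """Hypothetical stand-in for receiving a BGP route from `remote`."""
        self.remote_labels[(remote.name, purpose)] = remote.allocate(purpose)

    def label_for(self, remote, purpose):
        """Label to push when sending to `remote` for `purpose`."""
        return self.remote_labels[(remote.name, purpose)]


# Every allocator starts at 42 here only to reproduce the example above.
sender = Device("S", first_label=42)
dev_a = Device("A", first_label=42)
dev_b = Device("B", first_label=42)

sender.learn(dev_a, "VPN X")
sender.learn(dev_b, "VPN Y")
sender.allocate("VPN Z")

print(sender.label_for(dev_a, "VPN X"))  # label used towards A for VPN X
print(sender.label_for(dev_b, "VPN Y"))  # label used towards B for VPN Y
print(sender.local_labels["VPN Z"])      # label on which S receives VPN Z
```

The three values coincide without any conflict, because each one is 
only ever interpreted by the device that allocated it.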

So, while you can design a solution where the label space is managed in 
a centralized fashion, this is not required.

You could design an SDN controller solution where the controller would 
manage one label space common to all nodes, or all the label spaces of 
all forwarding devices, but I think it's hard to derive any interesting 
property from such a design choice.

In our BaGPipe distributed design (and this is also true in OpenContrail 
for instance) the label space is managed locally on each compute node 
(or network node if the BGP speaker is on a network node). More 
precisely, it is managed within the VPN implementation itself.

If you take a step back, the only naming space that has to be "managed" 
in BGP VPNs is the Route Target space. This is only in the control 
plane. It is a very large space (48 bits), and it is structured (each AS 
has its own 32-bit space, and there are private AS numbers). The mapping 
to MPLS labels in the dataplane is per-device and purely local.
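
As an illustration of that 48-bit structure, here is a small sketch that 
packs a Route Target of the two-octet-AS form into the 8-byte BGP 
extended community format (type 0x00, sub-type 0x02, then a 16-bit AS 
number and a 32-bit locally administered value). The function names and 
the example values (private AS 64512, value 100) are arbitrary choices 
for the example.

```python
import struct

RT_TYPE_TWO_OCTET_AS = 0x00  # extended community type: two-octet AS specific
RT_SUBTYPE = 0x02            # sub-type: Route Target


def encode_route_target(asn, local_value):
    """Pack an <AS>:<value> Route Target into an 8-byte extended community."""
    if not 0 <= asn <= 0xFFFF:
        raise ValueError("AS number must fit in 16 bits for this RT form")
    if not 0 <= local_value <= 0xFFFFFFFF:
        raise ValueError("local value must fit in 32 bits")
    # Network byte order: 1-byte type, 1-byte sub-type, 2-byte AS, 4-byte value.
    return struct.pack("!BBHI", RT_TYPE_TWO_OCTET_AS, RT_SUBTYPE, asn, local_value)


def decode_route_target(data):
    """Inverse of encode_route_target; returns (asn, local_value)."""
    type_, subtype, asn, local_value = struct.unpack("!BBHI", data)
    if (type_, subtype) != (RT_TYPE_TWO_OCTET_AS, RT_SUBTYPE):
        raise ValueError("not a two-octet-AS Route Target")
    return asn, local_value


rt = encode_route_target(64512, 100)
print(rt.hex())                 # 0002fc0000000064
print(decode_route_target(rt))  # (64512, 100)
```

The 16 + 32 bits after the two type bytes are the 48-bit space that 
actually has to be coordinated between the parties exchanging VPN routes.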

(*: MPLS also allows "upstream-assigned" labels; this is more recent and 
only used in specific cases where downstream assignment does not work 
well)

> 		6. The labels are nested (i.e. Like L3 VPN end to end MPLS connectivity ) will be established ?

In solutions where MPLS/GRE is used the label stack typically has only 
one label (the VPN label).
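
For reference, a rough sketch of what such a single-label MPLS/GRE 
header looks like on the wire: a basic 4-byte GRE header (no optional 
fields) whose protocol type is the MPLS unicast Ethertype 0x8847, 
followed by one 4-byte MPLS label stack entry with the bottom-of-stack 
bit set. The helper names, the label value 42 and the TTL of 64 are 
example choices; the outer IP header and the payload are left out.

```python
import struct

ETHERTYPE_MPLS_UNICAST = 0x8847


def gre_header(protocol=ETHERTYPE_MPLS_UNICAST):
    """Basic GRE header: 16 bits of flags/version (all zero) + protocol type."""
    return struct.pack("!HH", 0, protocol)


def mpls_entry(label, tc=0, bottom_of_stack=True, ttl=64):
    """One MPLS label stack entry: 20-bit label, 3-bit TC, 1-bit S, 8-bit TTL."""
    if not 0 <= label < 2 ** 20:
        raise ValueError("label must fit in 20 bits")
    if not 0 <= tc < 8 or not 0 <= ttl < 256:
        raise ValueError("tc must fit in 3 bits and ttl in 8 bits")
    word = (label << 12) | (tc << 9) | (int(bottom_of_stack) << 8) | ttl
    return struct.pack("!I", word)


# Single-label stack: the VPN label is also the bottom of the stack.
encap = gre_header() + mpls_entry(42)
print(encap.hex())  # 000088470002a140
```

With a two-label stack, the outer entry would be built with 
bottom_of_stack=False and only the inner VPN label would carry S=1.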

> 		7. Or it will be label stitching between Virtual-Physical network ?
> 	How the end-to-end path will be setup ?
> Let me know your opinion for the same.

How the end-to-end path is set up may depend on the interconnection choice.
With an inter-AS option B or A+B, you would have the following:
- ingress DC overlay: one MPLS-over-GRE hop from vswitch to DC edge
- ingress DC edge to WAN: one MPLS label (VPN label advertised by eBGP)
- inside the WAN: (typically) two labels (e.g. LDP label to reach remote 
edge, and VPN label advertised via iBGP)
- WAN to egress DC edge: one MPLS label (VPN label advertised by eBGP)
- egress DC overlay: one MPLS-over-GRE hop from DC edge to vswitch
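
The same path can be written down as data to show how the label stack 
depth varies per segment. This is only a sketch: the tuple layout and 
the names "VPN" and "LDP" are placeholders, and the actual VPN label 
value differs on each segment, since it is swapped at every gateway and 
allocated by whichever device advertised the route.

```python
# (segment, what carries the packet, MPLS labels on it, outermost first)
PATH = [
    ("ingress DC overlay: vswitch -> DC edge", "GRE tunnel", ["VPN"]),
    ("ingress DC edge -> WAN", "direct link", ["VPN"]),
    ("inside the WAN", "LDP LSP", ["LDP", "VPN"]),
    ("WAN -> egress DC edge", "direct link", ["VPN"]),
    ("egress DC overlay: DC edge -> vswitch", "GRE tunnel", ["VPN"]),
]


def stack_depths(path):
    """Return the number of MPLS labels carried on each segment, in order."""
    return [len(labels) for _, _, labels in path]


for segment, carrier, labels in PATH:
    print(f"{segment:40s} {carrier:12s} {'/'.join(labels)}")

print(stack_depths(PATH))  # [1, 1, 2, 1, 1]
```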

Not sure how the above answers your questions; please keep asking if it 
does not !  ;)


> -----Original Message-----
> From: Mathieu Rohon [mailto:mathieu.rohon at gmail.com]
> Sent: Monday, December 15, 2014 3:46 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [Neutron] [RFC] Floating IP idea solicitation and collaboration
> Hi Ryan,
> We have been working on similar Use cases to announce /32 with the Bagpipe BGPSpeaker that supports EVPN.
> Please have a look at use case B in [1][2].
> Note also that the L2population Mechanism driver for ML2, that is compatible with OVS, Linuxbridge and ryu ofagent, is inspired by EVPN, and I'm sure it could help in your use case
> [1]http://fr.slideshare.net/ThomasMorin1/neutron-and-bgp-vpns-with-bagpipe
> [2]https://www.youtube.com/watch?v=q5z0aPrUZYc&sns
> [3]https://blueprints.launchpad.net/neutron/+spec/l2-population
> Mathieu
> On Thu, Dec 4, 2014 at 12:02 AM, Ryan Clevenger <ryan.clevenger at rackspace.com> wrote:
>> Hi,
>> At Rackspace, we have a need to create a higher level networking
>> service primarily for the purpose of creating a Floating IP solution
>> in our environment. The current solutions for Floating IPs, being tied
>> to plugin implementations, does not meet our needs at scale for the following reasons:
>> 1. Limited endpoint H/A mainly targeting failover only and not
>> multi-active endpoints, 2. Lack of noisy neighbor and DDOS mitigation,
>> 3. IP fragmentation (with cells, public connectivity is terminated
>> inside each cell leading to fragmentation and IP stranding when cell
>> CPU/Memory use doesn't line up with allocated IP blocks. Abstracting
>> public connectivity away from nova installations allows for much more
>> efficient use of those precious IPv4 blocks).
>> 4. Diversity in transit (multiple encapsulation and transit types on a
>> per floating ip basis).
>> We realize that network infrastructures are often unique and such a
>> solution would likely diverge from provider to provider. However, we
>> would love to collaborate with the community to see if such a project
>> could be built that would meet the needs of providers at scale. We
>> believe that, at its core, this solution would boil down to
>> terminating north<->south traffic temporarily at a massively
>> horizontally scalable centralized core and then encapsulating traffic
>> east<->west to a specific host based on the association setup via the current L3 router's extension's 'floatingips'
>> resource.
>> Our current idea, involves using Open vSwitch for header rewriting and
>> tunnel encapsulation combined with a set of Ryu applications for management:
>> https://i.imgur.com/bivSdcC.png
>> The Ryu application uses Ryu's BGP support to announce up to the
>> Public Routing layer individual floating ips (/32's or /128's) which
>> are then summarized and announced to the rest of the datacenter. If a
>> particular floating ip is experiencing unusually large traffic (DDOS,
>> slashdot effect, etc.), the Ryu application could change the
>> announcements up to the Public layer to shift that traffic to
>> dedicated hosts setup for that purpose. It also announces a single /32
>> "Tunnel Endpoint" ip downstream to the TunnelNet Routing system which
>> provides transit to and from the cells and their hypervisors. Since
>> traffic from either direction can then end up on any of the FLIP
>> hosts, a simple flow table to modify the MAC and IP in either the SRC
>> or DST fields (depending on traffic direction) allows the system to be
>> completely stateless. We have proven this out (with static routing and
>> flows) to work reliably in a small lab setup.
>> On the hypervisor side, we currently plumb networks into separate OVS
>> bridges. Another Ryu application would control the bridge that handles
>> overlay networking to selectively divert traffic destined for the
>> default gateway up to the FLIP NAT systems, taking into account any
>> configured logical routing and local L2 traffic to pass out into the
>> existing overlay fabric undisturbed.
>> Adding in support for L2VPN EVPN
>> (https://tools.ietf.org/html/draft-ietf-l2vpn-evpn-11) and L2VPN EVPN
>> Overlay (https://tools.ietf.org/html/draft-sd-l2vpn-evpn-overlay-03)
>> to the Ryu BGP speaker will allow the hypervisor side Ryu application
>> to advertise up to the FLIP system reachability information to take
>> into account VM failover, live-migrate, and supported encapsulation
>> types. We believe that decoupling the tunnel endpoint discovery from
>> the control plane
>> (Nova/Neutron) will provide for a more robust solution as well as
>> allow for use outside of openstack if desired.
