[openstack-dev] [Neutron] [RFC] Floating IP idea solicitation and collaboration

A, Keshava keshava.a at hp.com
Thu Dec 18 09:31:26 UTC 2014

Hi  Thomas,

Basically, as per your thinking, the 'vpn-label' is extended to OVS itself,
so that when an MPLS-over-GRE packet arrives from OVS, the incoming label is used to index the respective VPN table on the DC-Edge side?

	1. Who tells OVS which label to use?
	Are you thinking of having a BGP-VPN session between the DC-Edge and the Compute Node (OVS),
	so that the Compute Node itself looks up the BGP-VPN table and, based on the destination, adds the VPN label as the MPLS label in OVS?
	Or will the ODL or OpenStack controller dictate which VPN label to use to both the DC-Edge and the CN (OVS)?

	2. How much is gained by generating MPLS from OVS (compared with terminating VXLAN on the DC-Edge and then originating MPLS from there)?


-----Original Message-----
From: Thomas Morin [mailto:thomas.morin at orange.com] 
Sent: Tuesday, December 16, 2014 7:10 PM
To: openstack-dev at lists.openstack.org
Subject: Re: [openstack-dev] [Neutron] [RFC] Floating IP idea solicitation and collaboration

Hi Keshava,

2014-12-15 11:52, A, Keshava :
> 	I have been thinking of "Starting MPLS right from CN" for L2VPN/EVPN scenario also.
> 	Below are my queries w.r.t supporting MPLS from OVS :
> 		1. MPLS will be used even for VM-VM traffic across CNs generated by OVS  ?

If E-VPN is used only to interconnect outside of a Neutron domain, then MPLS does not have to be used for traffic between VMs.

If E-VPN is used inside one DC for VM-VM traffic, then MPLS is only *one* of the possible encapsulations: E-VPN specs have been defined to use VXLAN (handy because there is native kernel support); MPLS/GRE and MPLS/UDP are other possibilities.

> 		2. MPLS will be originated right from OVS and will be mapped at Gateway (it may be NN/Hardware router ) to SP network ?
> 			So MPLS will carry 2 Labels ? (one for hop-by-hop, and other one 
> for end to identify network ?)

On "will carry 2 Labels ?" : this would be one possibility, but not the one we target.
We would actually favor MPLS/GRE (GRE used instead of what you call the MPLS "hop-by-hop" label) inside the DC -- this requires only one label.
At the DC edge gateway, depending on the interconnection techniques to connect the WAN, different options can be used (RFC4364 section 10): 
Option A with back-to-back VRFs (no MPLS label, but typically VLANs), or option B (with one MPLS label); a mix of A/B is also possible and is sometimes called option D (one label); option C also exists, but is not a good fit here.

Inside one DC, if vswitches see each other across an Ethernet segment, we can also use MPLS with just one label (the VPN label) without a GRE encap.

In a way, you can say that in Option B the labels are "mapped" at the DC/WAN gateway(s), but this is really just MPLS label swapping, not to be misunderstood as mapping a DC label space to a WAN label space (see below, the label space is local to each device).
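
To illustrate the label swapping described above, here is a minimal sketch of what an Option B gateway does. The class names, device names and label values are hypothetical and chosen only for illustration; a real gateway learns and re-advertises labels via BGP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwapEntry:
    out_label: int  # label allocated by the next-hop device (learned from it)
    next_hop: str   # the device that allocated out_label


class Gateway:
    """Hypothetical DC/WAN gateway doing plain MPLS label swapping."""

    def __init__(self, first_label=16):
        # Label values 0-15 are reserved, so local allocation starts at 16.
        self._next_label = first_label
        self.lfib = {}  # local incoming label -> SwapEntry

    def readvertise(self, learned_label, learned_from):
        """Allocate a local label for a route learned from a peer.

        The returned label is what the gateway would advertise to its
        other peers; it comes from the gateway's own local label space.
        """
        in_label = self._next_label
        self._next_label += 1
        self.lfib[in_label] = SwapEntry(learned_label, learned_from)
        return in_label

    def forward(self, in_label):
        """Swap the incoming label for the one the next hop expects."""
        entry = self.lfib[in_label]
        return entry.out_label, entry.next_hop


gw = Gateway()
# A vswitch advertised label 42 for some VPN; the gateway re-advertises
# the route towards the WAN with a label of its own.
wan_facing_label = gw.readvertise(42, "vswitch-1")
```

Traffic arriving from the WAN with `wan_facing_label` leaves the gateway with label 42 towards `vswitch-1`. No shared label space is involved; each label only has meaning to the device that allocated it.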

> 		3. MPLS will go over even the "network physical infrastructure"  also ?

The use of MPLS/GRE means we are doing an overlay, just like your typical VXLAN-based solution, and the network physical infrastructure does not need to be MPLS-aware (it just needs to be able to carry IP).

> 		4. How the Labels will be mapped a/c virtual and physical world ?

(I don't get the question, I'm not sure what you mean by "mapping labels")

> 		5. Who manages the label space  ? Virtual world or physical world or 
> both ? (OpenStack +  ODL ?)

In MPLS*, the label space is local to each device: a label is "downstream-assigned", i.e. allocated by the receiving device for a specific purpose (e.g. forwarding in a VRF). It is then (typically) advertised in a routing protocol; the sender device will use this label to send traffic to the receiving device for this specific purpose. As a result, a sender device may use label 42 to forward traffic in the context of VPN X to a receiving device A, and the same label 42 to forward traffic in the context of another VPN Y to another receiving device B, and locally use label 42 to receive traffic for VPN Z. There is no global label space to manage.
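
The "label 42" example can be sketched in a few lines. This is only an illustration under assumptions: the `Device` class, the device names and the starting label value are hypothetical, and the advertisement step (normally done by a routing protocol) is reduced to a plain dictionary on the sender.

```python
class Device:
    """Hypothetical forwarding device with its own local label space."""

    def __init__(self, name, first_label=42):
        self.name = name
        self._next_label = first_label
        self.label_to_vrf = {}  # local label -> VRF it was allocated for

    def allocate_label(self, vrf):
        """Downstream assignment: the *receiver* picks the label."""
        label = self._next_label
        self._next_label += 1
        self.label_to_vrf[label] = vrf
        return label  # this value would be advertised to senders

    def receive(self, label):
        """Use the incoming label to select the VRF."""
        return self.label_to_vrf[label]


sender = Device("S")
device_a = Device("A")
device_b = Device("B")

# Each receiver allocates from its own space, so values can collide freely.
label_x = device_a.allocate_label("VPN X")
label_y = device_b.allocate_label("VPN Y")
label_z = sender.allocate_label("VPN Z")

# What the sender learned from advertisements: (receiver, VPN) -> label.
send_table = {("A", "VPN X"): label_x, ("B", "VPN Y"): label_y}
```

All three labels have the same value here, yet there is no ambiguity: the sender picks the label based on which receiver it is sending to, and each receiver interprets the label in its own table.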

So, while you can design a solution where the label space is managed in a centralized fashion, this is not required.

You could design an SDN controller solution where the controller would manage one label space common to all nodes, or all the label spaces of all forwarding devices, but I think it's hard to derive any interesting property from such a design choice.

In our BaGPipe distributed design (and this is also true in OpenContrail for instance) the label space is managed locally on each compute node (or network node if the BGP speaker is on a network node); more precisely, it is managed in the VPN implementation.

If you take a step back, the only naming space that has to be "managed" in BGP VPNs is the Route Target space. This is only in the control plane. It is a very large space (48 bits), and it is structured (each AS has its own 32 bit space, and there are private AS numbers). The mapping to MPLS labels in the dataplane is per-device and purely local.
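
As a rough illustration of how the 48 bits are structured, the sketch below encodes a Route Target in the two-octet-AS form of the BGP extended community (RFC 4360): a 2-byte type/sub-type prefix followed by a 16-bit AS number and a 32-bit number assigned within that AS. The function names and the example values are assumptions made for this sketch.

```python
import struct

# Two-octet-AS-specific extended community, sub-type Route Target (RFC 4360).
EXT_COMM_TYPE_2OCTET_AS = 0x00
EXT_COMM_SUBTYPE_ROUTE_TARGET = 0x02


def encode_route_target(asn, number):
    """Encode 'asn:number' as an 8-byte extended community."""
    if not 0 <= asn <= 0xFFFF:
        raise ValueError("this form only supports 2-byte AS numbers")
    if not 0 <= number <= 0xFFFFFFFF:
        raise ValueError("the per-AS number must fit in 32 bits")
    return struct.pack(
        "!BBHI",
        EXT_COMM_TYPE_2OCTET_AS,
        EXT_COMM_SUBTYPE_ROUTE_TARGET,
        asn,
        number,
    )


def decode_route_target(data):
    """Return (asn, number) from an 8-byte Route Target community."""
    type_high, sub_type, asn, number = struct.unpack("!BBHI", data)
    if (type_high, sub_type) != (
        EXT_COMM_TYPE_2OCTET_AS,
        EXT_COMM_SUBTYPE_ROUTE_TARGET,
    ):
        raise ValueError("not a two-octet-AS Route Target")
    return asn, number


# 64512 is the first AS number of the 16-bit private range.
rt = encode_route_target(64512, 100)
```

The 6 bytes after the type/sub-type prefix are the 48-bit space mentioned above: 16 bits identify the AS and 32 bits are free for that AS to assign.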

(*: MPLS also allows "upstream-assigned" labels; this is more recent and only used in specific cases where downstream-assigned labels do not work well)

> 		6. The labels are nested (i.e. Like L3 VPN end to end MPLS connectivity ) will be established ?

In solutions where MPLS/GRE is used the label stack typically has only one label (the VPN label).
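
For reference, a single-label MPLS-over-GRE encapsulation can be sketched as follows. This is a simplified illustration: it builds only the basic GRE header (no optional fields) and one MPLS label stack entry, it leaves out the outer IP header, and the function names, default TTL and label value are assumptions.

```python
import struct

GRE_PROTO_MPLS_UNICAST = 0x8847  # EtherType for MPLS unicast


def gre_header():
    """Basic 4-byte GRE header: no flags, version 0, protocol = MPLS."""
    return struct.pack("!HH", 0x0000, GRE_PROTO_MPLS_UNICAST)


def mpls_entry(label, ttl=64, traffic_class=0, bottom_of_stack=True):
    """One 32-bit label stack entry: label(20) | TC(3) | S(1) | TTL(8)."""
    if not 0 <= label < 2 ** 20:
        raise ValueError("MPLS labels are 20 bits wide")
    if not 0 <= traffic_class < 8 or not 0 <= ttl < 256:
        raise ValueError("traffic class is 3 bits, TTL is 8 bits")
    word = (label << 12) | (traffic_class << 9) | (int(bottom_of_stack) << 8) | ttl
    return struct.pack("!I", word)


def encapsulate(vpn_label, payload):
    """GRE header + a single MPLS entry (the VPN label) + payload."""
    return gre_header() + mpls_entry(vpn_label) + payload


packet = encapsulate(42, b"inner packet")
```

The bottom-of-stack bit is set on the only entry because the VPN label is the whole stack; GRE plays the role that a transport label would otherwise play.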

> 		7. Or it will be label stitching between Virtual-Physical network ?
> 	How the end-to-end path will be setup ?
> Let me know your opinion for the same.

How the end-to-end path is set up may depend on the interconnection choice.
With an inter-AS option B or A+B, you would have the following:
- ingress DC overlay: one MPLS-over-GRE hop from vswitch to DC edge

	The label coming from the vSwitch is used to select the respective VPN instance.
	But someone has to tell the OVS side which label to use for which VPN instance, right?

- ingress DC edge to WAN: one MPLS label (VPN label advertised by eBGP)
- inside the WAN: (typically) two labels (e.g. LDP label to reach remote edge, and VPN label advertised via iBGP)
- WAN to egress DC edge: one MPLS label (VPN label advertised by eBGP)
- egress DC overlay: one MPLS-over-GRE hop from DC edge to vswitch
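
The segments listed above can be summarized as data, which makes the label stack depth per segment easy to check. The structure and names below are only an illustrative sketch of that list, not a description of any implementation.

```python
# (segment, outer transport, label stack from top to bottom)
END_TO_END_PATH = [
    ("ingress DC overlay: vswitch -> DC edge", "GRE", ["VPN label"]),
    ("ingress DC edge -> WAN", "MPLS", ["VPN label (eBGP)"]),
    ("inside the WAN", "MPLS", ["LDP label", "VPN label (iBGP)"]),
    ("WAN -> egress DC edge", "MPLS", ["VPN label (eBGP)"]),
    ("egress DC overlay: DC edge -> vswitch", "GRE", ["VPN label"]),
]


def stack_depths(path):
    """Number of MPLS labels carried on each segment."""
    return [len(labels) for _segment, _transport, labels in path]


for segment, transport, labels in END_TO_END_PATH:
    print(f"{segment:42} {transport:5} {' / '.join(labels)}")
```

Only the WAN core carries two labels; everywhere else the VPN label is the only one, and its value may change at each device along the way.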

Not sure how the above answers your questions; please keep asking if it 
does not !  ;)


> -----Original Message-----
> From: Mathieu Rohon [mailto:mathieu.rohon at gmail.com]
> Sent: Monday, December 15, 2014 3:46 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [Neutron] [RFC] Floating IP idea solicitation and collaboration
> Hi Ryan,
> We have been working on similar Use cases to announce /32 with the Bagpipe BGPSpeaker that supports EVPN.
> Please have a look at use case B in [1][2].
> Note also that the L2population Mechanism driver for ML2, that is compatible with OVS, Linuxbridge and ryu ofagent, is inspired by EVPN, and I'm sure it could help in your use case
> [1]http://fr.slideshare.net/ThomasMorin1/neutron-and-bgp-vpns-with-bagpipe
> [2]https://www.youtube.com/watch?v=q5z0aPrUZYc&sns
> [3]https://blueprints.launchpad.net/neutron/+spec/l2-population
> Mathieu
> On Thu, Dec 4, 2014 at 12:02 AM, Ryan Clevenger <ryan.clevenger at rackspace.com> wrote:
>> Hi,
>> At Rackspace, we have a need to create a higher level networking
>> service primarily for the purpose of creating a Floating IP solution
>> in our environment. The current solutions for Floating IPs, being tied
>> to plugin implementations, do not meet our needs at scale for the following reasons:
>> 1. Limited endpoint H/A mainly targeting failover only and not
>> multi-active endpoints,
>> 2. Lack of noisy neighbor and DDOS mitigation,
>> 3. IP fragmentation (with cells, public connectivity is terminated
>> inside each cell leading to fragmentation and IP stranding when cell
>> CPU/Memory use doesn't line up with allocated IP blocks. Abstracting
>> public connectivity away from nova installations allows for much more
>> efficient use of those precious IPv4 blocks).
>> 4. Diversity in transit (multiple encapsulation and transit types on a
>> per floating ip basis).
>> We realize that network infrastructures are often unique and such a
>> solution would likely diverge from provider to provider. However, we
>> would love to collaborate with the community to see if such a project
>> could be built that would meet the needs of providers at scale. We
>> believe that, at its core, this solution would boil down to
>> terminating north<->south traffic temporarily at a massively
>> horizontally scalable centralized core and then encapsulating traffic
>> east<->west to a specific host based on the association setup via the current L3 router's extension's 'floatingips'
>> resource.
>> Our current idea involves using Open vSwitch for header rewriting and
>> tunnel encapsulation combined with a set of Ryu applications for management:
>> https://i.imgur.com/bivSdcC.png
>> The Ryu application uses Ryu's BGP support to announce up to the
>> Public Routing layer individual floating ips (/32's or /128's) which
>> are then summarized and announced to the rest of the datacenter. If a
>> particular floating ip is experiencing unusually large traffic (DDOS,
>> slashdot effect, etc.), the Ryu application could change the
>> announcements up to the Public layer to shift that traffic to
>> dedicated hosts setup for that purpose. It also announces a single /32
>> "Tunnel Endpoint" ip downstream to the TunnelNet Routing system which
>> provides transit to and from the cells and their hypervisors. Since
>> traffic from either direction can then end up on any of the FLIP
>> hosts, a simple flow table to modify the MAC and IP in either the SRC
>> or DST fields (depending on traffic direction) allows the system to be
>> completely stateless. We have proven this out (with static routing and
>> flows) to work reliably in a small lab setup.
>> On the hypervisor side, we currently plumb networks into separate OVS
>> bridges. Another Ryu application would control the bridge that handles
>> overlay networking to selectively divert traffic destined for the
>> default gateway up to the FLIP NAT systems, taking into account any
>> configured logical routing and local L2 traffic to pass out into the
>> existing overlay fabric undisturbed.
>> Adding in support for L2VPN EVPN
>> (https://tools.ietf.org/html/draft-ietf-l2vpn-evpn-11) and L2VPN EVPN
>> Overlay (https://tools.ietf.org/html/draft-sd-l2vpn-evpn-overlay-03)
>> to the Ryu BGP speaker will allow the hypervisor side Ryu application
>> to advertise up to the FLIP system reachability information to take
>> into account VM failover, live-migrate, and supported encapsulation
>> types. We believe that decoupling the tunnel endpoint discovery from
>> the control plane
>> (Nova/Neutron) will provide for a more robust solution as well as
>> allow for use outside of openstack if desired.
