<div dir="ltr">A few comments inline.<div><br></div><div>Generally speaking, the only remark I'd like to make is that this use case makes sense regardless of whether you are using an overlay or any other "SDN" solution (whatever SDN means to you).</div><div><br></div><div>Also, please note that this thread is now split in two: there's a new branch starting with Ian's post. So perhaps let's make two threads.</div><div><div class="gmail_extra"><br><div class="gmail_quote">On 21 July 2015 at 14:21, Neil Jerram <span dir="ltr">&lt;<a href="mailto:Neil.Jerram@metaswitch.com" target="_blank">Neil.Jerram@metaswitch.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 20/07/15 18:36, Carl Baldwin wrote:<br>

> I'm looking for feedback from anyone interested but, in particular, I'd<br>

> like feedback from the following people for varying perspectives:<br>

> Mark McClain (proposed alternate), John Belamaric (IPAM), Ryan Tidwell<br>

> (BGP), Neil Jerram (L3 networks), Aaron Rosen (help understand<br>

> multi-provider networks) and you if you're reading this list of names<br>

> and thinking "he forgot me!"<br>

><br>

> We have been struggling to develop a way to model a network which is<br>

> composed of disjoint L2 networks connected by routers.  The intent of<br>

> this email is to describe the two proposals and request input on the<br>

> two in an attempt to choose a direction forward.  But, first:<br>

> requirements.<br>

><br>

> Requirements:<br>

><br>

> The network should appear to end users as a single network choice.<br>

> They should not be burdened with choosing between segments.  It might<br>

> interest them that L2 communications may not work between instances on<br>

> this network but that is all.  </span></blockquote><div><br></div><div>It is, however, important to ensure that services like DHCP keep working as usual.</div><div>Treating segments as logical networks in their own right is the simplest way to achieve this, imho.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">This has been requested by numerous<br>

> operators [1][4].  It can be useful for external networks and provider<br>

> networks.<br>

><br>

> The model needs to be flexible enough to support two distinct types of<br>

> addresses:  1) address blocks which are statically bound to a single<br>

> segment and 2) address blocks which are mobile across segments using<br>

> some sort of dynamic routing capability like BGP or programmatically<br>

> injecting routes in to the infrastructure's routers with a plugin.<br>

<br>

</span>FWIW, I hadn't previously realized (2) here.<br></blockquote><div><br></div><div>A "mobile address block" translates to a subnet whose network association might change.</div><div>Achieving mobile address blocks does not seem simple to me at all. Route injection (boring) and BGP might solve the networking aspect of the problem, but we'd also need coordination with the compute service to ensure that all the workloads using addresses from the mobile block migrate too. Unless I've misunderstood the way these mobile address blocks work, I struggle to see this as a requirement.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div><div class="h5"><br>

><br>

> Overlay networks are not the answer to this.  The goal of this effort<br>

> is to scale very large networks with many connected ports by doing L3<br>

> routing (e.g. to the top of rack) instead of using a large continuous<br>

> L2 fabric.  </div></div></blockquote><div><br></div><div>As a side note, I find it interesting that overlays were indeed proposed as a solution to avoid hybrid L2/L3 networks or having to span VLANs across the core and aggregation layers.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5">Also, the operators interested in this work do not want<br>

> the complexity of overlay networks [4].<br>

><br>

> Proposal 1:<br>

><br>

> We refined this model [2] at the Neutron mid-cycle a couple of weeks<br>

> ago.  This proposal has already resonated reasonably with operators,<br>

> especially those from GoDaddy who attended the Neutron sprint.  Some<br>

> key parts of this proposal are:<br>

><br>

> 1.  The routed super network is called a front network.  The segments<br>

> are called back(ing) networks.<br>

> 2.  Backing networks are modeled as admin-owned private provider<br>

> networks but otherwise are full-blown Neutron networks.<br>

> 3.  The front network is marked with a new provider type.<br>

> 4.  A Neutron router is created to link the backing networks with<br>

> internal ports.  It represents the collective routing ability of the<br>

> underlying infrastructure.<br>

> 5.  Backing networks are associated with a subset of hosts.<br>

> 6.  Ports created on the front network must have a host binding and<br>

> are actually created on a backing network when all is said and done.<br>

> They carry the ID of the backing network in the DB.<br></div></div></blockquote><div><br></div><div>While the logical model and workflow you describe here make sense, I have the impression that:</div><div>1) The front network is not a neutron logical network, because it does not really behave like a network; the only exception is that you can pass its id to the nova API. To reinforce this, consider that the front network basically has no ports.</div><div>2) From a topological perspective the front network "kind of" behaves like an external network, but it isn't one. The front network is not really a common gateway for all backing networks; it is more like a label attached to the router which interconnects all the backing networks.</div><div>3) More on topology. How can we know that all these segments will always be connected by a single logical router? Using static routes (or BGP, if one day it becomes a thing), it is already possible to implement multi-segment networks with L3 connectivity using multiple logical routers, isn't it?</div><div>4) Point #5 makes assumptions about network-aware scheduling. I am not sure we already have the ability to tell the nova scheduler to deploy an instance on a host where a given network is available.</div><div>5) I think that I would treat the "front" network as a "network group" or "cluster". I noticed the term "subnet cluster" is used in the etherpad. I find this term appropriate because it seems to me that in this scenario the end user does not care at all about the network in the sense of an L2 segment.</div><div>6) It seems one of the purposes of using backing networks is to identify an address block for the ports being created. But then how would that play with mobile address blocks? From an instance workflow perspective, should instances be associated with one or more address blocks at boot time?</div><div>7) What happens if a user attaches a router to a backing network and connects that router to an external network? 
Does that become a gateway for all backing networks or just for that network? And what would the workflow be for uplinking a front network to an external network?</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5">

><br>

> Using Neutron networks to model the segments allows us to fully<br>

> specify the details of each network using the regular Neutron model.<br>

> They could be heterogeneous or homogeneous, it doesn't matter.<br>

<br>

</div></div>You've probably seen Robert Kukura's comment on the related bug at<br>

<a href="https://bugs.launchpad.net/neutron/+bug/1458890/comments/30" rel="noreferrer" target="_blank">https://bugs.launchpad.net/neutron/+bug/1458890/comments/30</a>, and there<br>

is a useful detailed description of how the multiprovider extension<br>

works at<br>

<a href="https://bugs.launchpad.net/openstack-api-site/+bug/1242019/comments/3" rel="noreferrer" target="_blank">https://bugs.launchpad.net/openstack-api-site/+bug/1242019/comments/3</a>.<br>

I believe it is correct to say that using multiprovider would be an<br>

effective substitute for using multiple backing networks with different<br>

{network_type, physical_network, segmentation_id}, and that logically<br>

multiprovider is aiming to describe the same thing as this email thread<br>

is, i.e. non-overlay mapping onto a physical network composed of<br>

multiple segments.<br></blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>

However, I believe multiprovider does not (per se) address the IP<br>

addressing requirement(s) of the multi-segment scenario.<br></blockquote><div><br></div><div>Indeed it does not. The multiprovider extension simply indicates that a network can be built using different L2 segments.</div><div>It is then up to the operator to ensure that these segments are correct, and it's up to whatever is running in the backend to ensure that instances on the various segments can communicate with each other.</div><div><br></div><div>I believe the ask here is for Neutron to provide this capability (the neutron reference control plane currently doesn't). It is not yet entirely clear to me whether there's a real need to change the logical model, but IP addressing implications might be a reason, as pointed out by Neil.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
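</blockquote><div>To make the point about multiprovider concrete, here is a rough sketch of what a multi-segment network body looks like. The network name, physical network names and VLAN IDs are invented for illustration, and the two helper functions are not part of Neutron. Each segment carries only the L2 tuple, with nothing about subnets or about which hosts can reach which segment.</div>
<pre>
```python
# Illustrative only: the name, physical networks and VLAN IDs are made up.
multiprovider_network = {
    "network": {
        "name": "routed-net",
        "segments": [
            {
                "provider:network_type": "vlan",
                "provider:physical_network": "rack1",
                "provider:segmentation_id": 101,
            },
            {
                "provider:network_type": "vlan",
                "provider:physical_network": "rack2",
                "provider:segmentation_id": 102,
            },
        ],
    }
}

# Every segment is described purely by the L2 tuple discussed in this thread.
L2_KEYS = {
    "provider:network_type",
    "provider:physical_network",
    "provider:segmentation_id",
}


def segment_tuples(body):
    """Return (network_type, physical_network, segmentation_id) per segment."""
    return [
        (
            seg["provider:network_type"],
            seg["provider:physical_network"],
            seg["provider:segmentation_id"],
        )
        for seg in body["network"]["segments"]
    ]


def has_only_l2_attributes(body):
    """True if no segment carries anything beyond the L2 tuple (e.g. no CIDR)."""
    return all(set(seg) == L2_KEYS for seg in body["network"]["segments"])
```
</pre>
<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">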

<span class=""><br>

><br>

> This proposal offers a clear separation between the statically bound<br>

> and the mobile address blocks by associating the former with the<br>

> backing networks and the latter with the front network.  The mobile<br>

> addresses are modeled just like floating IPs are today but are<br>

> implemented by some plugin code (possibly without NAT).<br>

<br>

</span>Couldn't the mobile addresses be _exactly_ like floating IPs already<br>

are?  Why is anything different from floating IPs needed here?<br>

<span class=""><br>

><br>

> This proposal also provides some advantages for integrating dynamic<br>

> routing.  Since each backing network will, by necessity, have a<br>

> corresponding router in the infrastructure, the relationship between<br>

> dynamic routing speaker, router, and network is clear in the model:<br>

> network <-> speaker <-> router.<br></span></blockquote><div><br></div><div>Ok. But how does that change because of backing networks? I believe the same relationship holds true for every network, or am I wrong?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">

<br>

</span>I'm not sure exactly what you mean here by 'dynamic routing', but I<br>

think this touches on a key point: can IP routing happen anywhere in a<br>

Neutron network, without being explicitly represented by a router object<br>

in the model?<br>

<br>

I think the answer to that should be yes.  </blockquote><div><br></div><div>But this would also mean that we should consider doing without the very concept of a router in Neutron.</div><div>If we look only at the scenarios we're describing here, I'd agree with you, but unfortunately Neutron is required to serve a wide variety of scenarios.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">It clearly already is in the<br>

underlay if you are using tunnels - the tunnel between two compute hosts<br>

may require multiple IP hops across the fabric.  At the network level<br>

that Neutron networks currently model, the answer is currently no, but I<br>

think it's interesting to consider changing that.<br>

<span class=""><br>

><br>

> Proposal 2:<br>

><br>

> This alternate model has not been fully fleshed out.<br>

<br>

</span>I should begin by admitting the blame here.  Much of this is a<br>

half-baked idea from me, that I haven't yet had time to explore<br>

properly.  However....<br>

<span class=""><br>

>   Some parts of it<br>

> are still unclear to me.  The basic idea is to give the IPAM system<br>

> information about IP availability on a given host.  When creating a<br>

> port, the binding information would be sent to the IPAM system and the<br>

> system would choose an appropriate address block for the allocation.<br></span></blockquote><div><br></div><div>To make a link to proposal #1, I read this as informing the IPAM system of which backing network(s) can be implemented on the host which has been selected.</div><div>But I am not 100% convinced that the two proposals implement the same workflow.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">
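</span></blockquote><div>A toy sketch of how I read the allocation step in proposal #2, under my own assumptions: the host names, CIDRs, the HOST_BLOCKS mapping and the HostAwareIpam class are all hypothetical and do not correspond to any existing Neutron IPAM interface. It ignores gateways and reserved addresses entirely; the only thing it shows is that the allocation call needs the host binding as input.</div>
<pre>
```python
import ipaddress

# Hypothetical mapping: which address blocks are usable from which host.
# In proposal #1 terms, each block would belong to a backing network.
HOST_BLOCKS = {
    "compute-1": ["10.0.1.0/24"],
    "compute-2": ["10.0.2.0/24"],
}


class HostAwareIpam:
    """Toy IPAM that picks an address block from the port's host binding."""

    def __init__(self, host_blocks):
        self._blocks = {
            host: [ipaddress.ip_network(cidr) for cidr in cidrs]
            for host, cidrs in host_blocks.items()
        }
        self._allocated = set()

    def allocate(self, host):
        """Return the first free address in a block available on `host`."""
        if host not in self._blocks:
            raise LookupError("no address block known for host %s" % host)
        for block in self._blocks[host]:
            for address in block.hosts():
                if address not in self._allocated:
                    self._allocated.add(address)
                    return str(address)
        raise LookupError("address blocks for host %s are exhausted" % host)
```
</pre>
<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">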

<br>

</span>Right.  A key requirement, for this to be possible, is that Nova's host<br>

selection happens before the IPAM system is asked to allocate an IP<br>

address.  I have an action to investigate that, but if anyone happens to<br>

know already, please do say.<br></blockquote><div><br></div><div>I am 99.99% sure this is not possible at the moment unless something is done to make the nova scheduler network-aware.</div><div>Also, this will add a point of coupling between the instance boot and network provisioning processes, which are independent at the moment.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<span class=""><br>

><br>

> 1. This alternate model offers no way to distinguish the two types of<br>

> address blocks.<br>

<br>

</span>Agreed.  But I wonder if normal floating IPs can be used for the mobile<br>

IP addresses (as also suggested above).<br></blockquote><div><br></div><div>I get the concept, but it's not really a floating IP in neutron terms, as that implies SNAT/DNAT.</div><div>Also, from what I gather it's not about single mobile addresses, but we're talking about entire subnets that can be moved around.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<span class=""><br>

> 2. We don't have the benefit of modeling the segments with Neutron networks.<br>

<br>

</span>Agreed, but it appears that multiprovider has already taken a different<br>

view here, and already provides the ability for a network to map to<br>

multiple {network_type, physical_network, segmentation_id} tuples.<br></blockquote><div><br></div><div>Modelling segments as logical networks is not necessarily a benefit in my opinion;</div><div>it's more of a convenience. For instance, the reference control plane might implement provider networks in such a way that:</div><div>1) a "ghost router" is created in the l3 agent to ensure E-W traffic across all segments (the router is a "ghost" because it's not exposed as a neutron logical router)</div><div>2) a distinct dnsmasq instance is started on every segment of the network to ensure DHCP functionality</div><div>3) metadata services can be provided through the ghost router rather than using isolated metadata</div><div><br></div><div>I think this alternative is worth exploring anyway.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
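</blockquote><div>The three points above could be sketched roughly as follows. This is only an illustration under my own assumptions: the segment names, CIDRs and the plan_services function are hypothetical and are not existing l3 agent or DHCP agent code. It just derives, from a list of segments, what the reference control plane would have to run.</div>
<pre>
```python
# Hypothetical segment descriptions; names and CIDRs are made up.
SEGMENTS = [
    {"name": "seg-rack1", "cidr": "10.0.1.0/24"},
    {"name": "seg-rack2", "cidr": "10.0.2.0/24"},
]


def plan_services(network_id, segments):
    """Describe what the control plane would run for a multi-segment network."""
    return {
        "network_id": network_id,
        # 1) One internal router with an interface on every segment,
        #    never exposed through the API as a logical router.
        # 3) Metadata is served through that same router.
        "ghost_router": {
            "exposed_in_api": False,
            "interfaces": [seg["name"] for seg in segments],
            "serves_metadata": True,
        },
        # 2) One dnsmasq instance per segment for DHCP.
        "dhcp": [
            {"segment": seg["name"], "process": "dnsmasq", "range": seg["cidr"]}
            for seg in segments
        ],
    }
```
</pre>
<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">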

<span class=""><br>

><br>

> It was suggested that hierarchical port binding could help here but I<br>

> see it as orthogonal to this.  Hierarchical port binding extends the<br>

> L2 properties of a port to a hierarchical infrastructure to achieve<br>

> continuous L2 connectivity.  It is also intended for overlay networks.<br>

> That isn't what we're doing here and I don't think it fits.<br>

><br>

> I have also considered the multi-provider extension [3] for this.<br>

> This is not yet clear to me either.  First, my understanding was that<br>

> this extension describes multi-segment continuous L2 fabrics.<br>

<br>

</span><a href="https://bugs.launchpad.net/openstack-api-site/+bug/1242019/comments/3" rel="noreferrer" target="_blank">https://bugs.launchpad.net/openstack-api-site/+bug/1242019/comments/3</a> says:<br>

<br>

"Note that, although ML2 can manage binding to multi-segment networks,<br>

neutron does not manage bridging between the segments of a multi-segment<br>

network. This is assumed to be done administratively."<br>

<br>

So I think it is not intended for a multiprovider network to be<br>

"continuous".<br>

<br>

Again, this touches on the point above about routing happening without<br>

being explicitly represented in the Neutron model...<br>

<span class=""><br>

>   Second,<br>

> there doesn't seem to be any host binding aspect to the multi-provider<br>

> extension.  Third, not all L2 plugins support this extension.  It<br>

> seems silly to require L2 plugin support in order to enable routing<br>

> between segments.<br>

<br>

</span>Good point.  If all plugins required the same kind of transformation to<br>

support multiprovider, perhaps that's telling us that the multi-ness<br>

should instead be in a layer above, more like your proposal 1.<br>

<span class=""><br>

><br>

> It isn't clear to me how a dynamic routing speaker will fit in to this<br>

> model.  My first thought is that it must be integrated with IPAM<br>

> because the IPAM system has the understanding of how to map address<br>

> blocks to infrastructure.  This pushes even more infrastructure<br>

> knowledge down to the IPAM system.  If dynamic routing is pushed down<br>

> to the IPAM system, it will also be necessary to push the association<br>

> of mobile IPs or routed tenant subnets down in to the IPAM system too.<br>

> This means Neutron needs to tell IPAM about every floating IP<br>

> association and every tenant subnet behind a Neutron router in the<br>

> same address scope as the external network.  I'm not convinced that<br>

> IPAM and routing really belong together like this.<br>

<br>

</span>I'm afraid I don't yet sufficiently understand the 'dynamic routing'<br>

requirements here.  Can you say more about them?<br>

<span class=""><br>

><br>

> If you made it this far in this email, you must have some feedback.<br>

> Please help us out.<br>

<br>

</span>There are a lot of moving parts here.  I'm afraid I don't yet see any<br>

clarity, but perhaps if we talk about this enough, that will eventually<br>

emerge!<br>

<br>

Regards,<br>

    Neil<br>

<div class="HOEnZb"><div class="h5"><br>

<br>


</div></div></blockquote></div><br></div></div><div class="gmail_extra"><br></div></div>