<div dir="ltr">On 21 July 2015 at 12:11, John Belamaric <span dir="ltr"><<a href="mailto:jbelamaric@infoblox.com" target="_blank">jbelamaric@infoblox.com</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div style="word-wrap:break-word">
Wow, a lot to digest in these threads. Let me summarize my understanding of the two proposals; let me know whether I've got it right. There are a couple of problems that need to be solved:
<div><br>
</div>
<div> a. Scheduling based on host reachability to the segments</div></div></blockquote><div><br></div><div>So actually this is something Assaf and I were debating on IRC, and I think it depends on what you're aiming for.<br><br></div><div>Imagine you have connectivity for a 'network' to every host, but that connectivity only works if you get a host-specific address, because the address range is different per host. This seems to be the use case we keep coming back to. (There's a corner case of this where the network is not available on every host, which gets you different requirements, but for now, this.)<br><br>You *can* use the current mechanism: allocate address, schedule, run - providing your scheduler respects the address you've been allocated and puts you on a host that can reach it. This is a silly approach. When getting the address (for a port that is entirely disassociated from the VM it's going to be attached to, via Neutron, while most of the scheduling constraints live in Nova) you can't tell that the address belongs to a machine that can even run the VM.<br><br></div><div>You can delay address allocation - then the machine can be scheduled anywhere, because the address it has is not a constraint. This avoids any change to scheduling at all - normal scheduling rules apply, except in the case where addresses are exhausted on that machine, and in that case we'd probably use the retry mechanism as a fallback to find a better place until someone works out it's not really just a *nova* scheduler.<br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word">
<div> b. Floating IP functionality across the segments. I am not sure I am clear on this one but it sounds like you want the routers attached to the segments to advertise routes to the specific floating IPs. Presumably then they would do NAT or the
instance would assign both the fixed IP and the floating IP to its interface?</div></div></blockquote><div><br></div><div>That's the summary. I don't think anyone is clear on this, and I also don't know that anyone has specifically requested it.<br><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word">
<div>In Proposal 1, (a) is solved by associating segments to the front network via a router - that association is used to provide a single hook into the existing API that limits the scope of segment selection to those associated with the front network.
(b) is solved by tying the floating IP ranges to the same front network and managing the reachability with dynamic routing.
</div><div><br>
</div>
<div>In Proposal 2, (a) is solved by tagging each network with some meta-data that the IPAM system uses to make a selection. </div></div></blockquote><div><br><div>The distinction is actually pretty small. The same backing data
exists for the IPAM to use - the difference is only that in (1) it's
there as a misuse of networks and in (2) it's not specified.<br></div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word"><div>This implies an IP allocation request that passes something other than a network/port to the IPAM subsystem.</div></div></blockquote><br>This is where I started - there is nothing to pass when I run 'neutron port-create' except for a network, and that is where address allocation happens today. We need a mechanism to defer address allocation and to indicate that the port has no address right now.<br></div><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word"><div> This is fine from
the IPAM point of view, but there is no corresponding API for this right now. To solve (b) either the IPAM system has to publish the routes </div></div></blockquote><div><br></div><div>The way I think of it: it needs to ensure there's enough information on the port that the network controller can push the routes.<br> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word"><div>or the higher-level management has to ALSO be aware of the mappings (rather than just IPAM).</div>
<div><br>
</div>
<div>To throw some fuel on the fire, I would argue also that (a) is not sufficient and address availability needs to be considered as well (as described in [1]). Selecting a host based on reachability alone will fail when addresses are exhausted. Similarly,
with (b) I think the effect on routing needs to be considered when associating a floating IP. That is, rather than a huge number of host routes, it would be ideal to allocate the floating IPs in blocks that can be associated with the backing
networks (though we would want to be able to split these blocks as small as a /32 if necessary - but avoid it/optimize as much as possible).</div></div></blockquote><div><br></div><div>Again - the scheduler is simplistic and nova-centric as things stand, and I think we all recognise this. The current fallbacks work, but they're fallbacks.<br><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word">
<div>In fact, I think that these proposals are more or less the same - it's just in #1 the meta-data used to tie the backing networks together is another network. </div></div></blockquote><div><br></div><div>Yup.<br> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word"><div>This allows it to fit in neatly with the existing APIs. You would still need to implement
something prior to IPAM or within IPAM that would select the appropriate backing network.
</div><div><br>
</div>
<div>As a (gulp) third alternative, we should consider that the front network here is in essence a layer 3 domain, and we have modeled layer 3 domains as address scopes in Liberty. The user is essentially saying "give me an address that is routable
in this scope" - they don't care which actual subnet it gets allocated on. This is conceptually more in-line with [2] - modeling L3 domain separately from the existing Neutron concept of a network being a broadcast domain.</div></div></blockquote><div><br></div><div>Again, the issue is that when you ask for an address you tend to have quite a strong opinion of what that address should be if it's location-specific.<br> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word">
<div><br>
</div>Fundamentally, however we associate the segments together, this comes down to a scheduling problem. </div></blockquote><div><br></div><div>It's not *solely* a scheduling problem, and that is my issue with this statement (Assaf has been saying the same). You *can* solve this *exclusively* with scheduling (allocate the address up front, and hope the host that address ties you to has room for a VM with all its constraints met) - but that solution is horrible. Or you can solve this largely with allocation, where scheduling only helps to deal with pool exhaustion - in which case it is mainly another sort of problem, but scheduling plays a part.<br><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word">Nova needs to be able to incorporate data from Neutron in its scheduling decision. Rather than solving this with a single piece of meta-data like
network_id as described in proposal 1, it probably makes more sense to build out the general concept of utilizing network data for nova scheduling. We could still model this as in #1, or using address scopes, or some arbitrary data as in #2. But the harder
problem to solve is the scheduling, not how we tag these things to inform that scheduling.
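To make that concrete, here is a minimal sketch of the combined constraint (all names are hypothetical; a real implementation would live in Nova scheduler filters fed by Neutron data):

```python
# Hypothetical inputs: segment_of_host maps host -> segment it attaches to,
# free_addresses maps segment -> count of unallocated addresses.
def viable_hosts(hosts, segment_of_host, free_addresses):
    """Hosts that can both reach the network and still get an address."""
    return [
        h for h in hosts
        if h in segment_of_host                            # reachability
        and free_addresses.get(segment_of_host[h], 0) > 0  # availability
    ]

hosts = ["compute1", "compute2", "compute3"]
segment_of_host = {"compute1": "rack-a", "compute2": "rack-b"}  # compute3: no segment
free_addresses = {"rack-a": 12, "rack-b": 0}                    # rack-b exhausted
print(viable_hosts(hosts, segment_of_host, free_addresses))  # -> ['compute1']
```

Reachability alone would admit compute2 even though its segment has no addresses left.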
<div><br>
</div>
<div>The optimization of routing for floating IPs is also a scheduling problem, though one that would require a lot more changes to how FIPs are allocated and associated to solve.</div>
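As a toy illustration of the block-allocation point above (hypothetical addresses; the stdlib `ipaddress` module does the collapsing): floating IPs handed out contiguously per backing network aggregate into a few routes, while scattered allocations degenerate into host routes.

```python
import ipaddress

def advertisable_routes(floating_ips):
    """Collapse /32 floating IPs into the fewest advertisable CIDR routes."""
    nets = [ipaddress.ip_network(ip + "/32") for ip in floating_ips]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]

# A contiguous block per backing network aggregates to one route...
print(advertisable_routes(["203.0.113.4", "203.0.113.5",
                           "203.0.113.6", "203.0.113.7"]))  # -> ['203.0.113.4/30']
# ...while scattered allocations stay as host routes.
print(advertisable_routes(["203.0.113.4", "203.0.113.9"]))
# -> ['203.0.113.4/32', '203.0.113.9/32']
```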
<div><br>
</div>
<div>John</div>
<div><br>
</div>
<div>[1] <a href="https://review.openstack.org/#/c/180803/" target="_blank">https://review.openstack.org/#/c/180803/</a></div>
<div>[2] <a href="https://bugs.launchpad.net/neutron/+bug/1458890/comments/7" target="_blank">https://bugs.launchpad.net/neutron/+bug/1458890/comments/7</a></div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
<div>
<div>
<blockquote type="cite"><div><div class="h5">
<div>On Jul 21, 2015, at 10:52 AM, Carl Baldwin <<a href="mailto:carl@ecbaldwin.net" target="_blank">carl@ecbaldwin.net</a>> wrote:</div>
<br>
</div></div><div><div><div class="h5">
<div dir="ltr">
<p dir="ltr">On Jul 20, 2015 4:26 PM, "Ian Wells" <<a href="mailto:ijw.ubuntu@cack.org.uk" target="_blank">ijw.ubuntu@cack.org.uk</a>> wrote:<br>
><br>
> There are two routed network models:<br>
><br>
> - I give my VM an address that bears no relation to its location and ensure the routed fabric routes packets there - this is very much the routing protocol method for doing things where I have injected a route into the network and it needs to propagate.
It's also pretty useless because there are too many host routes in any reasonably sized cloud.<br>
><br>
> - I give my VM an address that is based on its location, which only becomes apparent at binding time. This means that the semantics of a port change - a port has no meaningful address until binding, because its address is tied to its location -
and it leaves open questions about what to do when you migrate.<br>
><br>
> Now, you seem to generally be thinking in terms of the latter model, particularly since the provider network model you're talking about fits there. But then you say:</p>
<p dir="ltr">Actually, both. For example, GoDaddy assigns each VM an IP from the location-based address blocks and optionally one from the routed, location-agnostic ones. I would also like to assign router ports out of the location-based blocks, which
could host floating IPs from the other blocks.</p>
<p dir="ltr">> On 20 July 2015 at 10:33, Carl Baldwin <<a href="mailto:carl@ecbaldwin.net" target="_blank">carl@ecbaldwin.net</a>> wrote:<br>
>><br>
>> When creating a<br>
>> port, the binding information would be sent to the IPAM system and the<br>
>> system would choose an appropriate address block for the allocation.</p>
<p dir="ltr">Implicit in both is a need to provide at least a hint at host binding. Or, delay address assignment until binding. I didn't mention it because my email was already long.<br>
This has been discussed, but it applies equally to both proposals.</p>
<p dir="ltr">> No, it wouldn't, because creating and binding a port are separate operations. I can't give the port a location-specific address on creation - not until it's bound, in fact, which happens much later.<br>
><br>
> On proposal 1: consider the cost of adding a datamodel to Neutron. It has to be respected by all developers, it frequently has to be deployed by all operators, and every future change has to align with it. Plus either it has to be generic or optional, and
if optional it's a burden to some proportion of Neutron developers and users. I accept proposal 1 is easy, but it's not universally applicable. It doesn't work with Neil Jerram's plans, it doesn't work with multiple interfaces per host, and it doesn't work
with the IPv6 routed-network model I worked on.</p>
<p dir="ltr">Please be more specific. I'm not following your argument here. My proposal doesn't really add much new data model.</p>
<p dir="ltr">We've discussed this with Neil at length. I haven't been able to reconcile our respective approaches in to one model that works for both of us and still provides value. The routed segments model needs to somehow handle the L2 details
of the underlying network. Neil's model confines L2 to the port and routes to it. The two models can't just be squished together unless I'm missing something.</p>
<p dir="ltr">Could you provide some links so that I can brush up on your ipv6 routed network model? I'd like to consider it but I don't know much about it.</p>
<p dir="ltr">> Given that, I wonder whether proposal 2 could be rephrased.<br>
><br>
> 1: some network types don't allow unbound ports to have addresses, they just get placeholder addresses for each subnet until they're bound<br>
> 2: 'subnets' on these networks are more special than subnets on other networks. (More accurately, they don't use subnets. It's a shame subnets are core Neutron, because they're pretty horrible and yet hard to replace.)<br>
> 3: there's an independent (in an extension? In another API endpoint?) datamodel that the network points to and that IPAM consults to find a port an address. Bonus: people who aren't using funky network types can disable this extension.<br>
> 4: when the port is bound, the IPAM is referred to, and it's told the binding information of the port.<br>
> 5: when binding the port, once IPAM has returned its address, the network controller probably does stuff with that address when it completes the binding (like initialising routing).<br>
> 6: live migration either has to renumber a port or forward old traffic to the new address via route injection. This is an open question now, so I'm mentioning it rather than solving it.</p>
<p dir="ltr">I left out the migration issue from my email also because it also affects both proposals equally.</p>
<p dir="ltr">> In fact, adding that hook to IPAM at binding plus setting aside a 'not set' IP address might be all you need to do to make it possible. The IPAM needs data to work out what an address is, but that doesn't have to take the form of existing
Neutron constructs.</p>
<p dir="ltr">What about the L2 network for each segment? I suggested creating provider networks for these. Do you have a different suggestion?
</p>
<p dir="ltr">What about distinguishing the bound address blocks from the mobile address blocks? For example, the address blocks bound to the segments could be from a private space. A router port may get an address from this private space and be the
next hop for public addresses. Or, GoDaddy's model, where VMs get an address from the segment network and optionally a floating IP which is routed.</p>
<p dir="ltr">Carl</p>
</div></div></div>
__________________________________________________________________________<span class=""><br>
OpenStack Development Mailing List (not for usage questions)<br></span>
Unsubscribe: <a href="mailto:OpenStack-dev-request@lists.openstack.org" target="_blank">
OpenStack-dev-request@lists.openstack.org</a>?subject:unsubscribe<br>
<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
<br>
<br></blockquote></div><br></div></div>
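For what it's worth, the deferred-allocation flow argued for above can be sketched in a few lines (all names hypothetical - this is not the Neutron or IPAM API, just the shape of "create with no address, allocate at bind time"):

```python
import ipaddress

# Hypothetical per-segment address blocks the IPAM would select from at bind time.
SEGMENT_BLOCKS = {
    "rack-a": ipaddress.ip_network("10.1.0.0/24"),
    "rack-b": ipaddress.ip_network("10.2.0.0/24"),
}
HOST_SEGMENT = {"compute1": "rack-a", "compute2": "rack-b"}

class DeferredPort:
    """A port whose address stays unset until it is bound to a host."""

    def __init__(self):
        self.address = None  # the 'not set' placeholder

    def bind(self, host, allocated):
        """Bind to a host; only now does IPAM pick from that segment's block."""
        block = SEGMENT_BLOCKS[HOST_SEGMENT[host]]
        for ip in block.hosts():  # first free address in the host's segment
            if ip not in allocated:
                allocated.add(ip)
                self.address = ip
                return ip
        # Pool exhausted: the caller falls back to the retry/reschedule path.
        raise RuntimeError("segment exhausted on %s" % host)

allocated = set()
port = DeferredPort()                  # create: no address yet
ip = port.bind("compute2", allocated)  # bind: address follows the host
print(ip)  # -> 10.2.0.1
```

The `None` placeholder stands in for the "not set" address in point 1 of the rephrased proposal; the exhaustion error is where the scheduling retry mechanism would kick in.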