[openstack-dev] [neutron][L3] L3 routed network segments

Neil Jerram Neil.Jerram at metaswitch.com
Wed Jun 17 15:19:45 UTC 2015


Hi Carl,

I know you said at the end of your message below to ping you on IRC, but 
there are details here that I'm not sure about and some suggestions I'd 
like to make, which I think will be clearer to discuss in context.  I 
hope that's OK.

On 11/06/15 17:26, Carl Baldwin wrote:
> Neil,
>
> I'm very glad to hear of your interest.  I have been talking with Kyle
> Mestery about the rfe you mention [1] since the day he filed it.  It
> relates to a blueprint that I have been trying to get traction on [2]
> in various forms for a while [*].
[...]
> [*] You're not the only one having trouble getting traction.
> Sometimes it takes a while to realize that we're interested in similar
> things and to find the commonalities and then to get people excited
> about something.  It has been an uphill battle for me until recently.

What you wrote for [*] is so true.  I previously thought that I was 
trying to introduce fundamentally new ideas into Neutron, but in fact it 
appears that similar ideas have been batted around for some time by 
various folk, including yourself, who are already more involved in and 
experienced with OpenStack than I am.  It has been difficult for me to 
discover those existing conversations - but I hope I have found them all now.

> The rfe talks about attaching VMs directly to the L3 routed
> network.  This will require some coordination between ip address
> assignment and scheduling of the instance to a compatible physical
> server.

Could you describe in more detail the kind of attachment that you have 
in mind, and why it requires the IP address coordination that you mention?

By way of a counter-example...  For the kind of attachment that my 
project Calico provides, there is no restriction on where IP addresses 
may be used.  The attachment in this case looks like:

           +----------------------+          +----------------+
           |        Host          |          |        VM      |
           |                      |          |                |
    -------------  routing  -----------------------           |
           | eth0          tap123 |          | eth0           |
           | 172.19.8.239         |          | 10.65.0.2      |
           |                      |          |                |
           +----------------------+          +----------------+

Even though the 10.65.0.2 address comes from an OpenStack-defined subnet 
such as 10.65.0.0/24, Calico can assign IP addresses from that subnet to 
VMs on any host, and provide L3 connectivity between them.  It does this 
by making the host respond to ARP requests on the TAP interfaces, thereby 
forcing the host to be the first IP hop for all data from the VMs.
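
To make that concrete, the per-VM plumbing on the compute host boils down 
to something like the following.  This is an illustrative sketch only - in 
a real deployment Calico's agent programs this automatically, and the 
interface name and addresses are just the ones from the diagram above:

```shell
# Answer ARP requests from the VM on its TAP interface, so the host
# becomes the VM's first IP hop.
sysctl -w net.ipv4.conf.tap123.proxy_arp=1
sysctl -w net.ipv4.ip_forward=1

# A host route for the VM's address; this per-VM /32 route is the state
# that is then exported to the rest of the network.
ip route add 10.65.0.2/32 dev tap123
```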

So - and assuming that I've correctly understood what you meant - I 
don't think it's true that the concept of an L3 routed network 
necessarily implies restrictions on IP address allocation; hence I'd 
suggest that we treat those as separate concepts in the API.

> My blueprint, on the other hand, tries to maintain IP mobility across
> the network by relying on the BGP speaker work:  another BP we've been
> trying to get traction on for a while.

I think it's important here to clearly separate API from implementation. 
For the API, I think the concept that you are expressing is that a VM's 
IP address should be routable from outside the immediate network, without 
involving a floating IP.  Is that correct?

If so, there are then multiple possible implementations of that.  When 
the immediate network is a traditional Neutron L2 network, the 
implementation is as per your BP, i.e. for the virtual router to export 
that network's IP address by acting as a BGP speaker.

On the other hand, if the immediate network is a Calico-style L3 
network, there is already a BGP speaker running on each host, because 
that is part of how Calico implements connectivity within the immediate 
network (by exporting local TAP interface routes like '10.65.0.2/32 dev 
tap123'), and nothing further is needed.
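
For illustration, that host-level BGP speaker (BIRD, in Calico's case) 
only needs to pick up the kernel's per-TAP host routes and export them. 
A heavily simplified configuration sketch - the AS numbers and peer 
address here are invented for illustration - might look like:

```
protocol kernel {
  learn;            # pick up the /32 routes programmed for local TAPs
  scan time 10;
}
protocol bgp peer1 {
  local as 64511;
  neighbor 172.19.8.1 as 64511;
  export all;       # announce the local VM /32s to the peer
}
```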

Does that make sense?

(In both cases, of course, there must be BGP peerings to the other 
networks to which it is desired to export VM IP addresses.  I'm not yet 
sure if it makes sense to aim to specify such peerings on the Neutron 
API, or if such details should be regarded as individual deployment 
configuration.)

>  I also limit the connections
> to the L3 routed network to virtual routers for now.

Right - I think you mean here that, for your work, the immediate network 
to which a VM is connected is still a traditional Neutron L2 network. 
Is that correct?

For my interests - and I think for those of some other commenters on [1] 
- I'd certainly like that to be generalized, to allow the immediate 
network to be 'L3' rather than 'L2'.

[1] https://bugs.launchpad.net/neutron/+bug/1458890

Again, it would be good to be as clear as possible on the API concept 
here, i.e. about what we really mean.  Specifically, I'm not sure 'L3' 
is the right concept, because an L2 network is also L3-capable.  The 
real point, I think, is that network ports are not (necessarily) on an 
L2 broadcast domain.  In the Calico case, there are no L2 broadcast 
domains anywhere (or, equivalently, each TAP-VM interface is in a 
broadcast domain of its own).  In the case of other commenters at [1], 
the desire (I believe) is to specify that some subset of a network's 
ports are on one L2 segment, some other subset on a different L2 segment, 
and so on.

As a strawman, a possible API representation of this would involve:

- a Network-level attribute indicating that 'the ports on this network 
are generally not on an L2 segment'

- a Port-level attribute indicating the L2 segment ID (if any) that that 
port is on.

L2 broadcast capability would then be taken to exist between Ports with 
the same L2 segment ID.
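
As a toy illustration of that rule (the attribute name, segment IDs and 
predicate here are invented for the strawman, not existing Neutron 
fields): ports would be L2-adjacent only when both carry the same, 
non-empty segment ID, and a port with no segment ID - the Calico case - 
would be in a broadcast domain of its own:

```shell
# Hypothetical adjacency rule for the strawman API above: L2 broadcast
# capability exists only between ports whose l2_segment_id attributes
# are both set and equal.
same_broadcast_domain() {
    seg_a="$1"
    seg_b="$2"
    [ -n "$seg_a" ] && [ "$seg_a" = "$seg_b" ]
}

same_broadcast_domain "rack-1" "rack-1" && echo "adjacent"      # same segment
same_broadcast_domain "rack-1" "rack-2" || echo "not adjacent"  # different segments
same_broadcast_domain "" ""             || echo "not adjacent"  # no segment at all
```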

Moving on to implementation - we'd then have to consider whether and how 
Neutron's in-tree implementation components need tweaking so as to 
support such networks.  I'm familiar already with the DHCP agent, 
because we've modified that for Calico so as to provide DHCP service to 
unbridged TAP interfaces (as in the abandoned spec at [2]).  But 
probably there are other components to consider, too.

[2] https://review.openstack.org/#/c/130736/4

> The two have network segments in common.  So, as I proceed on the
> implementation of my blueprint [2], I will keep in mind the needs of
> the rfe [1] and build network segments in a way which can be utilized
> by both.  However, I will leave the coordination of VM scheduling and
> IP address assignment to someone else.  Does this all make sense?

Yes, and thanks.  I am happy to help out with any of the work here, and 
I hope that my comments above are useful in helping to synthesize our 
various objectives.

Please do let me know what you think.

Thanks,
	Neil


