[openstack-dev] [neutron][L3] L3 routed network segments
Neil Jerram
Neil.Jerram at metaswitch.com
Wed Jun 17 15:19:45 UTC 2015
Hi Carl,
I know you said at the end of your message below to ping you on IRC, but
there are details here that I'm not sure about, and some suggestions
that I'd like to make, and I think it will be clearer to discuss those
in context. I hope that's OK.
On 11/06/15 17:26, Carl Baldwin wrote:
> Neil,
>
> I'm very glad to hear of your interest. I have been talking with Kyle
> Mestery about the rfe you mention [1] since the day he filed it. It
> relates to a blueprint that I have been trying to get traction on [2]
> in various forms for a while [*].
[...]
> [*] You're not the only one having trouble getting traction.
> Sometimes it takes a while to realize that we're interested in similar
> things and to find the commonalities and then to get people excited
> about something. It has been an uphill battle for me until recently.
What you wrote for [*] is so true. I previously thought that I was
trying to introduce fundamentally new ideas into Neutron, but in fact it
appears that similar ideas have been batted around for some time
by various folk, including yourself, who are already more involved in
and experienced with OpenStack than I am. It has been difficult for me
to discover those existing conversations - but I hope I have them all now.
> The rfe talks about attaching VMs directly to the L3 routed
> network. This will require some coordination between ip address
> assignment and scheduling of the instance to a compatible physical
> server.
Could you describe in more detail the kind of attachment that you have
in mind, and why it requires the IP address coordination that you mention?
By way of a counter-example... For the kind of attachment that my
project Calico provides, there is no restriction on where IP addresses
may be used. The attachment in this case looks like:
+----------------------+          +----------------+
|        Host          |          |       VM       |
|                      |          |                |
------------------- routing ---------------------- |
|  eth0       tap123   |          |   eth0         |
|  172.19.8.239        |          |   10.65.0.2    |
|                      |          |                |
+----------------------+          +----------------+
Even though the 10.65.0.2 address comes from an OpenStack-defined subnet
such as 10.65.0.0/24, Calico can assign IP addresses from that subnet to
VMs on any hosts, and provide L3 connectivity between them. It does
this by making the host respond to ARP requests on the TAP interfaces,
and hence forces the host to be the first IP hop for all data from the VMs.
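Concretely, the per-VM plumbing described above could be sketched like this (a minimal illustration, not Calico's actual code; the interface name tap123 and the addresses are the ones from the diagram):

```shell
# Enable proxy ARP on the VM's TAP interface, so the host answers the
# VM's ARP requests itself and so becomes the first IP hop for all
# traffic leaving the VM.
echo 1 > /proc/sys/net/ipv4/conf/tap123/proxy_arp

# Install a host route for the VM's address via the TAP interface, so
# traffic addressed to the VM is routed to it regardless of subnet.
ip route add 10.65.0.2/32 dev tap123
```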
So - and assuming that I've correctly understood what you meant - I
don't think it's true that the concept of an L3 routed network
necessarily implies restrictions on IP address allocation; hence I'd
suggest that we treat those as separate concepts in the API.
> My blueprint, on the other hand, tries to maintain IP mobility across
> the network by relying on the BGP speaker work: another BP we've been
> trying to get traction on for a while.
I think it's important here to clearly separate API from implementation.
For the API, I think the concept that you are expressing is that a
VM's IP address should be routable from outside the immediate network,
without involving a floating IP. Is that correct?
If so, there are then multiple possible implementations of that. When
the immediate network is a traditional Neutron L2 network, the
implementation is as per your BP, i.e. for the virtual router to export
that network's IP address by acting as a BGP speaker.
On the other hand, if the immediate network is a Calico-style L3
network, there is already a BGP speaker running on each host, because
that is part of how Calico implements connectivity within the immediate
network (by exporting local TAP interface routes like '10.65.0.2/32 dev
tap123'), and nothing further is needed.
Does that make sense?
(In both cases, of course, there must be BGP peerings to the other
networks to which it is desired to export VM IP addresses. I'm not yet
sure if it makes sense to aim to specify such peerings on the Neutron
API, or if such details should be regarded as individual deployment
configuration.)
> I also limit the connections
> to the L3 routed network to virtual routers for now.
Right - I think you mean here that, for your work, the immediate network
to which a VM is connected is still a traditional Neutron L2 network.
Is that correct?
For my interests - and I think for those of some other commenters on [1]
- I'd certainly like that to be generalized, to allow the immediate
network to be 'L3' rather than 'L2'.
[1] https://bugs.launchpad.net/neutron/+bug/1458890
Again, it would be good to be as clear as possible on the API concept
here, i.e. about what we really mean. Specifically, I'm not sure 'L3'
is the right concept, because an L2 network is also L3-capable. It's
actually, I think, that network ports are not (necessarily) on an L2
broadcast domain. In the Calico case, there are no L2 broadcast domains
anywhere (or else you could say that each TAP-VM interface is in a
broadcast domain on its own). In the case of other commenters at [1],
the desire (I believe) is to specify that some subset of a network's
ports are on an L2 segment, some other subset on a different L2 segment,
and so on.
As a strawman, a possible API representation of this would involve:
- a Network-level attribute indicating that 'the ports on this network
are generally not on an L2 segment'
- a Port-level attribute indicating the L2 segment ID (if any) that that
port is on.
L2 broadcast capability would then be taken to exist between Ports with
the same L2 segment ID.
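To make that semantics concrete, here is a small Python sketch (the attribute name 'l2_segment_id' is the invented strawman attribute, not an existing Neutron field):

```python
# Strawman semantics: two ports share an L2 broadcast domain only when
# both carry the same, non-null L2 segment ID. A port with no segment
# ID is routed-only and shares a broadcast domain with nothing.
def same_broadcast_domain(port_a: dict, port_b: dict) -> bool:
    seg_a = port_a.get("l2_segment_id")
    seg_b = port_b.get("l2_segment_id")
    return seg_a is not None and seg_a == seg_b

p1 = {"id": "port-1", "l2_segment_id": "segment-a"}
p2 = {"id": "port-2", "l2_segment_id": "segment-a"}
p3 = {"id": "port-3", "l2_segment_id": None}  # routed-only port

print(same_broadcast_domain(p1, p2))  # True
print(same_broadcast_domain(p1, p3))  # False
```

In the Calico case every port would have a null (or unique) segment ID; in the mixed case described by other commenters, each L2 segment's ports would share one ID.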
Moving on to implementation - we'd then have to consider whether and how
Neutron's in-tree implementation components need tweaking so as to
support such networks. I'm familiar already with the DHCP agent,
because we've modified that for Calico so as to provide DHCP service to
unbridged TAP interfaces (as in the abandoned spec at [2]). But
probably there are other components to consider, too.
[2] https://review.openstack.org/#/c/130736/4
> The two have network segments in common. So, as I proceed on the
> implementation of my blueprint [2], I will keep in mind the needs of
> the rfe [1] and build network segments in a way which can be utilized
> by both. However, I will leave the coordination of VM scheduling and
> IP address assignment to someone else. Does this all make sense?
Yes, and thanks. I am happy to help out with any of the work here, and
I hope that my writings above are useful in helping to synthesize our
various objectives.
Please do let me know what you think.
Thanks,
Neil