[openstack-dev] [Openstack-operators] [nova] [neutron] Re: How do your end users use networking?

Neil Jerram Neil.Jerram at metaswitch.com
Wed Jun 24 16:02:32 UTC 2015


Thanks Kris and Sam for your replies!

On 18/06/15 01:20, Kris G. Lindgren wrote:
> On 6/17/15, 10:59 AM, "Neil Jerram" <Neil.Jerram at metaswitch.com> wrote:
>
>>
>> On 17/06/15 16:17, Kris G. Lindgren wrote:
>>> See inline.
>>> ____________________________________________
>>>
>>> Kris Lindgren
>>> Senior Linux Systems Engineer
>>> GoDaddy, LLC.
>>>
>>>
>>>
>>> On 6/17/15, 5:12 AM, "Neil Jerram" <Neil.Jerram at metaswitch.com> wrote:
>>>
>>>> Hi Kris,
>>>>
>>>> Apologies in advance for questions that are probably really dumb - but
>>>> there are several points here that I don't understand.
>>>>
>>>> On 17/06/15 03:44, Kris G. Lindgren wrote:
>>>>> We are doing pretty much the same thing - but in a slightly different
>>>>> way.
>>>>>     We extended the nova scheduler to help choose networks (IE. don't
>>>>>     put vm's on a network/host that doesn't have any available IP
>>>>>     address).
>>>> Why would a particular network/host not have any available IP address?
>>>    If a created network has 1024 ip's on it (/22) and we provision 1020
>>>    vms, anything deployed after that will not have an additional ip
>>>    address, because the network doesn't have any available ip addresses
>>>    (we lose some ip's to the network).
>> OK, thanks, that certainly explains the "particular network" possibility.
>>
>> So I guess this applies where your preference would be for network A,
>> but it would be OK to fall back to network B, and so on.  That sounds
>> like it could be a useful general enhancement.
>>
>> (But, if a new VM absolutely _has_ to be on, say, the 'production'
>> network, and the 'production' network is already fully used, you're
>> fundamentally stuck, aren't you?)
> Yes - this would be a scheduling failure - and I am ok with that.  It does
> no good to have a vm on a network that doesn't work.
>
>> What about the "/host" part?  Is it possible in your system for a
>> network to have IP addresses available, but for them not to be usable on
>> a particular host?
> Yes, this is also a possibility: the network allocated to a set of
> hosts has IP's available but no compute capacity to spin up vms on it.
> Again - I am ok with this.
>
>>>>> Then, we add, into the host-aggregate that each HV is attached to, a
>>>>> network metadata item which maps to the names of the neutron networks
>>>>> that host supports.  This basically creates the mapping of which host
>>>>> supports what networks, so we can correctly filter hosts out during
>>>>> scheduling.  We do allow people to choose a network if they wish, and
>>>>> we do have the neutron end-point exposed.  However, by default, if
>>>>> they do not supply a boot command with a network, we will filter the
>>>>> networks down and choose one for them.  That way they never hit [1].
>>>>> This also works well for us, because the default UI that we provide
>>>>> our end-users is not horizon.
>>>> Why do you define multiple networks - as opposed to just one - and why
>>>> would one of your users want to choose a particular one of those?
>>>>
>>>> (Do you mean multiple as in public-1, public-2, ...; or multiple as in
>>>> public, service, ...?)
>>>    This is answered in the other email and the original email as well.
>>>    But basically we have multiple L2 segments that only exist on certain
>>>    switches and thus are only tied to certain hosts.  With the way
>>>    neutron is currently structured we need to create a network for each
>>>    L2.  So that's why we define multiple networks.
>> Thanks!  Ok, just to check that I really understand this:
>>
>> - You have real L2 segments connecting some of your compute hosts
>> together - and also I guess to a ToR that does L3 to the rest of the
>> data center.
> Correct.
>
>
>> - You presumably then just bridge all the TAP interfaces, on each host,
>> to the host's outwards-facing interface.
>>
>>                         +---- VM
>>                         |
>>         +----- Host ----+---- VM
>>         |               |
>>         |               +---- VM
>>         |
>>         |               +---- VM
>>         |               |
>>         +----- Host ----+---- VM
>>         |               |
>> ToR ---+               +---- VM
>>         |
>>         |               +---- VM
>>         |               |
>>         |----- Host ----+---- VM
>>                         |
>>                         +---- VM
> Also correct, we are using flat "provider" networks (shared=true) -
> however provider vlan networks would work as well.
>
>> - You specify each such setup as a network in the Neutron API - and
>> hence you have multiple similar networks, for your data center as a whole.
>>
>> Out of interest, do you do this just because it's the Right Thing
>> according to the current Neutron API - i.e. because a Neutron network is
>> L2 - or also because it's needed in order to get the Neutron
>> implementation components that you use to work correctly?  For example,
>> so that you have a DHCP agent for each L2 network (if you use the
>> Neutron DHCP agent).
> Somewhat both.  It was a question of how to get neutron to handle this
> without making drastic changes to the base-level neutron concepts.  We
> currently do have dhcp-agents and the nova-metadata agent running in each
> L2, and we specifically assign them to hosts in that L2 space.
Thanks.
>    We are currently
> working on ways to remove this requirement.
That sounds intriguing.  What do you have in mind?
>
>>>    For our end users - they only care about getting a vm with a single
>>>    ip address in a "network" which is really a zone like "prod" or "dev"
>>>    or "test".  They stop caring after that point.  So in the scheduler
>>>    filter that we created we do exactly that.  We will filter down from
>>>    all the hosts and networks to a combo that intersects at a host that
>>>    has space, with a network that has space, and the network that was
>>>    chosen is actually available to that host.
>> Thanks, makes perfect sense now.
>>
>> So I think there are two possible representations, overall, of what you
>> are looking for.
>>
>> 1. A 'network group' of similar L2 networks.  When a VM is launched,
>> tenant specifies the network group instead of a particular L2 network,
>> and Nova/Neutron select a host and network with available compute power
>> and IP addressing.  This sounds like what you've described above.
> Correct - except for us we are currently handling the 'network group'
> using availability zones.  I would also like to hide from non-superusers
> the underlying network architecture.  Though I would love to handle this in
> native neutron.
(The whole point of this discussion is to achieve something in native 
Neutron, isn't it?)
>    Where most end users are only presented with the "network
> group" that they can choose.
Interesting, makes sense.
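
(Just to check my understanding of the filter you describe: I think the
selection boils down to something like the following - rough Python with
invented data structures and names, not your actual filter code.)

# Rough restatement of the selection logic described above; the data
# structures and names are invented, this is not the real scheduler filter.
def pick_host_and_network(hosts, networks, host_networks):
    # hosts:         {host name: free vcpus}
    # networks:      {network name: free fixed IPs}
    # host_networks: {host name: set of networks reachable from that host}
    for host, free_vcpus in hosts.items():
        if free_vcpus <= 0:
            continue                        # no compute capacity here
        for net in host_networks.get(host, ()):
            if networks.get(net, 0) > 0:    # network still has free IPs
                return host, net            # host and network intersect
    return None                             # scheduling failure


print(pick_host_and_network(
    {'cn-a01': 4, 'cn-b01': 0},
    {'prod-net-1': 0, 'prod-net-2': 12},
    {'cn-a01': {'prod-net-1', 'prod-net-2'}, 'cn-b01': {'prod-net-1'}}))
# -> ('cn-a01', 'prod-net-2')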

So I think a key question is whether the underlying networks need to be 
described on the Neutron API; and if so, whether IP ranges are 
associated with the network group, or with the underlying networks, or 
possibly with both.

I'm wondering if the new pluggable IPAM stuff might make a difference here.

With pluggable IPAM, it might be possible to:

- describe only the network group on the Neutron API, with IP ranges 
covering the addressing that you will want for all the underlying L2 
segments

- implement a pluggable IPAM module that, once the host for a VM has 
been selected, will allocate an IP address that is suitable for the 
underlying network that that host is attached to.

Essentially this means that knowledge about the real physical network is 
held in some out-of-band way by the IPAM implementation, instead of being 
described explicitly on the Neutron API.  I'm not sure whether that's a 
good or bad thing.
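
To make that concrete, here's a very rough sketch of the kind of
allocation logic I mean - plain Python, not the real Neutron IPAM driver
interface, and the host-to-segment map and all the names are invented:

# Sketch only: a segment-aware allocator that picks a fixed IP for a VM
# based on the host it landed on.  The host/segment mapping is held
# out-of-band (config file, inventory DB, ...); nothing here is an
# existing Neutron interface.
import ipaddress

SEGMENTS = {
    'rack-a': {'hosts': {'cn-a01', 'cn-a02'},
               'cidr': ipaddress.ip_network('10.64.0.0/24')},
    'rack-b': {'hosts': {'cn-b01', 'cn-b02'},
               'cidr': ipaddress.ip_network('10.64.1.0/24')},
}

allocated = set()   # in reality this state would live in the Neutron DB


def allocate_fixed_ip(host):
    """Return (segment, ip) with a free IP from the segment 'host' is on."""
    for name, seg in SEGMENTS.items():
        if host not in seg['hosts']:
            continue
        for ip in seg['cidr'].hosts():      # skips network/broadcast
            if ip not in allocated:
                allocated.add(ip)
                return name, ip
        raise RuntimeError('segment %s is out of addresses' % name)
    raise RuntimeError('host %s is not mapped to any segment' % host)


print(allocate_fixed_ip('cn-a01'))   # picks an address in 10.64.0.0/24
print(allocate_fixed_ip('cn-b02'))   # picks an address in 10.64.1.0/24

A real module would of course plug into the IPAM driver interface and keep
its state in the Neutron DB; the point is just that it's the allocator, not
the API, that knows about the segments.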

By the way, this approach is interesting for Calico-style networking too 
- i.e. where hosts do L3 routing, and there are no L2 segments shared by 
more than one VM.  We'd still like to allocate IP addresses in a way that 
allows us to use aggregate prefixes on the ToRs, so as to avoid the 
problem with the number of /32 routes that you mention below.
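
For example (invented prefixes, just to illustrate the arithmetic): if the
'network group' owns a /16, the allocator can carve a /24 out of it for
each rack, and each ToR then only advertises its one /24 upstream rather
than a /32 per VM:

# Illustration of why allocation placement matters for route aggregation:
# carve one /24 per rack out of the group's /16, so each ToR advertises a
# single aggregate instead of hundreds of /32s.  Prefixes and rack names
# are made up for the example.
import ipaddress

group_prefix = ipaddress.ip_network('10.64.0.0/16')
racks = ['rack-a', 'rack-b', 'rack-c']

# subnets(new_prefix=24) yields consecutive /24s out of the /16.
rack_prefixes = dict(zip(racks, group_prefix.subnets(new_prefix=24)))

for rack, prefix in rack_prefixes.items():
    # VMs in this rack are always numbered from 'prefix', so the ToR can
    # advertise just that one route upstream.
    print('%s: ToR advertises %s (%d usable addresses)'
          % (rack, prefix, prefix.num_addresses - 2))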

>
> I should add that we only need this for the fixed_ip of the vm.  We
> modified floating_ips in neutron to do route injections into the network
> and we route the floating_ip's to the fixed ip.
> This gives us the IP mobility through the entire L3 network and possibly
> into other L3 networks if needed.  We also modified neutron to allow us to
> route more than one floating ip to a vm.
> This also allows us to bypass doing nat for floating_ips - since the ip's
> go straight to the vm and are bound locally inside the vm.
>
>> 2. A new kind of network whose ports are partitioned into various L2
>> segments.  This is like what I've described at [1].
>>
>> [1]
>> http://lists.openstack.org/pipermail/openstack-dev/2015-June/067274.html
>>
>> I would prefer (2) over (1), because I'm interested in a fully routed
>> form of connectivity, and if that was expressed in model (1) it would
>> need a network definition for every VM.
>>
>> Also, with (1) I guess individual IP ranges (or subnet pools?) would
>> need defining for each network, whereas with (2) there would naturally
>> be a single IP range or subnet pool definition for the whole network.
>>
>> Although you have currently modified the Nova scheduler for an approach
>> like (1), do you think (2) would work in principle for you as well?
>
> We (I should say our network architects) talked about having the VM's IP's
> be routed all the way down to the hypervisor, where the only IP's
> configured in the L2 domain are the IP's of the hypervisors.  Any VM that
> gets deployed on a HV would be routed in the network to the HV IP.  We
> settled on this hybrid approach for now.  I don’t have the exact specifics
> - but I believe the issue was going to be too many /32 routes injected
> into the network, whereas currently the fixed_ip's of the vm's are all
> handled by a single large route.
Thanks; we're aware of that issue, but we think we can address it 
sufficiently with an intelligent IP address allocation approach, as 
outlined above.

Even if we turn out to be wrong about that, there can still be mid-scale 
data centers that use such routed connectivity without running near the 
limit of /32 routes on their ToRs, and it would be nice if the Neutron 
API could accommodate that.  The key thing, for such a routed networking 
scenario, is that the Neutron API should allow configuration of a single 
network (or network group) object, with associated IP ranges, and flags 
or properties describing the routedness as opposed to the L2ness; and 
should not _require_ configuration of a separate object for every L2 
segment - which for routed networking means for every VM.
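
As a pure strawman - none of these fields exist in Neutron today, they are
just to show the shape I have in mind - a single object along these lines
would be enough for the routed case:

# Strawman only: one API-level object for the whole routed domain, with the
# per-segment / per-VM detail left to the implementation (e.g. the IPAM
# driver), rather than one Neutron network per L2 segment or per VM.
routed_network = {
    'name': 'prod',
    'connectivity': 'routed',          # as opposed to 'l2'
    'ip_ranges': ['10.64.0.0/16'],     # or a subnet pool reference
}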

On 18/06/15 00:25, Sam Morrison wrote:
> I’m not so sure 2 would work for us, we want to have multiple 
> different networks under a single neutron entity that a user selects 
> at boot time. The instance would then be put onto a network that has 
> its own range, gateway, dhcp server, etc.  Another instance could 
> select the same entity at boot time but then be put onto a different 
> network with a different range, gateway, dhcp etc. There would be no 
> L2 between the 2 instances but there would be L3 which would be 
> provided outside of neutron on the external gateway.
Thanks.  I take your point, but I wonder if pluggable IPAM might make a 
significant difference here, as I've described above in answer to one of 
Kris's comments.

Consider the reasons why it's important to know the true segmentation of 
the physical network.

- IP allocation conforming to actual L2 segments; or else to racks/pods 
such as to facilitate route aggregation on the ToRs or L3 fabric 
routers.  I think this might be achievable by a pluggable IPAM module 
that has the true segmentation information.

- DHCP service.  This can be achieved, I think, by the deployment 
running a DHCP agent in each L2 segment (or, in the routed case, on each 
compute host), and getting the Neutron server to instruct all of those 
agents.  (If the segmentation information was also in the Neutron API, 
perhaps it would be possible for the Neutron server to calculate where 
all the DHCP agents need to be, and to instruct them specifically - but 
that sounds like a fiendishly tricky algorithm to me!)

Are there other reasons?
> We can’t describe the network infra in neutron and then change the 
> physical infra to fit.  We need neutron to be able to model the 
> physical network we already have.
We certainly need Neutron - meaning some combination of API, reference 
components and vendor components - to operate sensibly on a given real 
physical network.  I'm not sure that means that every detail of the 
physical network has to be described in the API...
> (sorry I’m not a network guy so I hope I have used the right lingo)

> Sam 

Regards,
     Neil



