[openstack-dev] [neutron][L3] VM Scheduling v/s Network as input any consideration ?

A, Keshava keshava.a at hp.com
Sat May 31 01:27:42 UTC 2014


Carl,
OK, let us look at what happens if VMs move in a random manner, without considering the network as a parameter.


1. Before any VM movement, the network would look something like the figure below. I am assuming that, as in the current proposals, we run BGP on each of the nodes on the underlay side.

On the overlay (switch) side, they are free to choose whichever routing protocol they want (BGP/IGP).

Route summarization happens on each router, so that route prefixes are advertised as intended and forwarding stays intact without any loops.



[cid:image003.png at 01CF7C97.27510200]



2. When a VM is moved in a random manner (without considering the network):

Because routes get injected into the underlay network in a random manner, the routes get scattered across all the nodes.

Concern:

1. Because the VM has moved under a different TOR (which is in a different subnet/network), the route prefixes will start to scatter.

In the longer run, when many VMs move across the network, the whole network will have scattered prefixes (no effective route aggregation, and no effective longest-prefix match).

When this happens, the number of route entries will increase, since the routing protocol can no longer aggregate the routes.

When any VM goes down or is moved, the amount of route flapping is higher (because all of these routes are non-aggregated), so there will be more route withdrawals and additions.



2. In order to allow a physical server to host VMs in different subnets, or VMs to be moved to different server racks without IP address reconfiguration, the network needs to be able to support multiple broadcast domains (many VLANs) on boundary routers and TOR switches, and to allow some subnets to span across multiple router ports.

-- How do we consider this from the cloud perspective, given that the interfaces/configuration on this side still belong to the cloud?


3. During VM movement to a rack where its subnet does not exist:

a. The original TOR switch continues to receive packets (as if the VM had not moved), and they get dropped.
- Once this happens, from then on the TOR will start doing ARP broadcasts (as long as it keeps receiving those packets).

b. This will continue until the network becomes stable (route prefixes are added to the intended routers/switches).

Yes, these are bigger problems that we need to look into.
The IETF is trying to address some of these issues (not sure how far it will get).
From the cloud side, can we avoid VM movement to a random subnet/network as much as possible? Ultimately it impacts the cloud VMs, since they are the last-mile destination for the traffic.
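
To make the aggregation concern concrete, here is a minimal sketch in Python. It is only an illustration under simplifying assumptions: each TOR is modelled as advertising the smallest set of prefixes that exactly covers the VM addresses behind it, and the helper names (advertised_routes, total_routes) and the sample addresses are made up for this example. A real BGP/OSPF deployment would be driven by configured summary ranges, not by this calculation.

```python
import ipaddress


def advertised_routes(vm_ips_by_tor):
    """Map each TOR to the minimal prefix list covering its VM addresses.

    Simplified model (an assumption): every VM is a /32 host route and the
    TOR advertises whatever those host routes collapse into.
    """
    routes = {}
    for tor, ips in vm_ips_by_tor.items():
        host_routes = [ipaddress.ip_network(f"{ip}/32") for ip in ips]
        routes[tor] = list(ipaddress.collapse_addresses(host_routes))
    return routes


def total_routes(vm_ips_by_tor):
    """Total number of prefixes advertised by all TORs together."""
    return sum(len(r) for r in advertised_routes(vm_ips_by_tor).values())


# Before any move: each TOR holds one contiguous block of four addresses.
before = {
    "TOR-1": ["10.1.0.0", "10.1.0.1", "10.1.0.2", "10.1.0.3"],
    "TOR-2": ["20.1.0.0", "20.1.0.1", "20.1.0.2", "20.1.0.3"],
}

# After VM 10.1.0.1 is moved under TOR-2 without regard to the network.
after = {
    "TOR-1": ["10.1.0.0", "10.1.0.2", "10.1.0.3"],
    "TOR-2": ["20.1.0.0", "20.1.0.1", "20.1.0.2", "20.1.0.3", "10.1.0.1"],
}

print(total_routes(before), total_routes(after))  # 2 4
```

In this toy case a single move doubles the number of advertised prefixes: TOR-1 can no longer advertise one /30, and TOR-2 has to carry an extra /32 that does not belong to its own block.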



[cid:image005.png at 01CF7C97.A5423800]





Thanks & regards,
Keshava.A

From: Carl Baldwin [mailto:carl at ecbaldwin.net]
Sent: Friday, May 30, 2014 2:35 AM
To: A, Keshava
Cc: jcsf31459 at gmail.com; Armando M.; OpenStack Development Mailing List (not for usage questions); Kyle Mestery
Subject: Re: [openstack-dev] [neutron][L3] VM Scheduling v/s Network as input any consideration ?

Keshava,

How much of a problem is routing prefix fragmentation for you?  Fragmentation causes routing table bloat and may reduce the performance of the routing table.  It also increases the amount of information traded by the routing protocol.  Which aspect(s) is (are) affecting you?  Can you quantify this effect?
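
One rough way to put a number on the routing-protocol traffic is to count how many prefixes have to be withdrawn and announced for a single VM move. The sketch below is only a back-of-the-envelope model, not a measurement: it assumes each TOR advertises the minimal prefix set covering the addresses behind it, it treats all 256 addresses of a /24 as in use, and the function names and the /24 sizing are assumptions made for the example.

```python
import ipaddress


def collapsed(ips):
    """Minimal set of prefixes covering the given addresses (as /32s)."""
    nets = [ipaddress.ip_network(f"{ip}/32") for ip in ips]
    return set(ipaddress.collapse_addresses(nets))


def churn_for_move(source_ips, dest_ips, moved_ip):
    """Return (withdrawn, announced) prefix counts when moved_ip changes TOR."""
    before_src, before_dst = collapsed(source_ips), collapsed(dest_ips)
    after_src = collapsed(ip for ip in source_ips if ip != moved_ip)
    after_dst = collapsed(list(dest_ips) + [moved_ip])
    withdrawn = (before_src - after_src) | (before_dst - after_dst)
    announced = (after_src - before_src) | (after_dst - before_dst)
    return len(withdrawn), len(announced)


# Two fully populated /24s, one per TOR (assumed sizes).
src = [str(a) for a in ipaddress.ip_network("10.1.0.0/24")]
dst = [str(a) for a in ipaddress.ip_network("20.1.0.0/24")]

print(churn_for_move(src, dst, "10.1.0.77"))  # (1, 9)
```

Under these assumptions, moving one VM out of a full /24 replaces one aggregate with eight more-specific prefixes on the source side, plus one host route on the destination side.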

A major motivation for my interest in employing a dynamic routing protocol within a datacenter is to enable IP mobility so that I don't need to worry about doing things like scheduling instances based on their IP addresses.  Also, I believe that it can make floating ips more "floaty" so that they can cross network boundaries without having to statically configure routers.

To get this mobility, it seems inevitable to accept the fragmentation in the routing prefixes.  This level of fragmentation would be contained to a well-defined scope, like within a datacenter.  Is it your opinion that trading off fragmentation for mobility is a bad trade-off?  Maybe it depends on the capabilities of the TOR switches and routers that you have.  Maybe others can chime in here.

Carl

On Wed, May 28, 2014 at 10:11 PM, A, Keshava <keshava.a at hp.com> wrote:
Hi,
The motivation behind this requirement is “to achieve VM prefix aggregation using a routing protocol (BGP/OSPF)”,
so that the prefixes advertised from the cloud to upstream will be aggregated.

I do not know how the current scheduler is implemented.
But the scheduler should maintain some kind of “network to node to VM” mapping.
Based on that mapping, when any new VM is hosted, it should be placed on the nodes that match its prefix, based on an input preference.

It will be a great help to us on the routing side if this is available in the infrastructure.
I am available for review/technical discussion/meeting.


Thanks & regards,
Keshava.A

From: jcsf31459 at gmail.com [mailto:jcsf31459 at gmail.com]
Sent: Thursday, May 29, 2014 9:14 AM
To: openstack-dev at lists.openstack.org; Carl Baldwin; Kyle Mestery; OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [neutron][L3] VM Scheduling v/s Network as input any consideration ?

Hi Keshava,

This is an area that I am interested in.   I'd be happy to collaborate with you on a blueprint.    This would require enhancements to the scheduler as you suggested.

There are a number of use cases for this.


‎John.

Sent from my smartphone.
From: A, Keshava
Sent: Tuesday, May 27, 2014 10:58 AM
To: Carl Baldwin; Kyle Mestery; OpenStack Development Mailing List (not for usage questions)
Reply To: OpenStack Development Mailing List (not for usage questions)
Subject: [openstack-dev] [neutron][L3] VM Scheduling v/s Network as input any consideration ?


Hi,
I have a basic question about the Nova scheduler in the following scenario.
Whenever a new VM is to be hosted, is there any consideration of network attributes?
For example, let us say all the VMs with 10.1.x are under TOR-1, and those with 20.1.xy are under TOR-2.
A new CN node is inserted under TOR-2, and at the same time a new tenant VM needs to be hosted for the 10.1.xa network.

Then is it possible to mandate that the new VM (10.1.xa) be hosted under TOR-1, instead of being scheduled under TOR-2 (where CN-23 is completely free from a resource perspective)?
This is required to achieve prefix/route aggregation and to avoid network broadcasts (in case they are scattered across different TORs/switches).
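
To illustrate the kind of placement rule being asked about, here is a small standalone Python sketch. It is not the Nova scheduler API and does not use any Nova or Neutron code: the TOR_SUBNETS and HOST_TOR tables, the /16 prefix lengths, the node names CN-11 and CN-12, and the function network_aware_hosts are all hypothetical, chosen only to mirror the TOR-1/TOR-2 example above.

```python
import ipaddress

# Hypothetical "network to node" mapping; the /16 lengths are assumptions.
TOR_SUBNETS = {
    "TOR-1": [ipaddress.ip_network("10.1.0.0/16")],
    "TOR-2": [ipaddress.ip_network("20.1.0.0/16")],
}

# Hypothetical compute-node placement; CN-23 is the free node under TOR-2.
HOST_TOR = {"CN-11": "TOR-1", "CN-12": "TOR-1", "CN-23": "TOR-2"}


def network_aware_hosts(vm_ip, candidate_hosts,
                        host_tor=HOST_TOR, tor_subnets=TOR_SUBNETS):
    """Keep only the candidate hosts whose TOR serves the VM's subnet."""
    addr = ipaddress.ip_address(vm_ip)
    return [
        host for host in candidate_hosts
        if any(addr in net for net in tor_subnets[host_tor[host]])
    ]


# A 10.1.x VM is kept off CN-23 even though CN-23 has free resources.
print(network_aware_hosts("10.1.5.9", ["CN-11", "CN-23"]))  # ['CN-11']
```

Whether such a rule should be a hard filter (as sketched here) or only a preference that falls back to any free node is exactly the design question raised above; an empty result from this function would mean no host under the matching TOR is available.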



[cid:image001.png at 01CF7C96.BD0A9640]

Thanks & regards,
Keshava.A



-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 30014 bytes
Desc: image001.png
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20140531/40ebc9b4/attachment-0003.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image003.png
Type: image/png
Size: 21827 bytes
Desc: image003.png
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20140531/40ebc9b4/attachment-0004.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image005.png
Type: image/png
Size: 32706 bytes
Desc: image005.png
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20140531/40ebc9b4/attachment-0005.png>