[openstack-dev] [neutron] [nova] scheduling bandwidth resources / NIC_BW_KB resource class

Jay Pipes jaypipes at gmail.com
Mon Apr 11 11:46:26 UTC 2016


Hi Miguel Angel, comments/answers inline :)

On 04/08/2016 09:17 AM, Miguel Angel Ajo Pelayo wrote:
> Hi!,
>
>     In the context of [1] (generic resource pools / scheduling in nova)
> and [2] (minimum bandwidth guarantees, egress, in neutron), I had a talk
> a few weeks ago with Jay Pipes.
>
>     The idea was to leverage the generic resource pools and scheduling
> mechanisms defined in [1] to find the right hosts and track the total
> available bandwidth per host (and per host "physical network").
> Something in neutron (exactly where is still to be defined) would
> notify the new API about the total amount of "NIC_BW_KB" available on
> every host/physnet.

Yes, what we discussed was making it per host initially, meaning the 
host would advertise a single aggregate bandwidth amount covering all 
NICs it uses for the data plane.

The other way to track this resource class (NIC_BW_KB) would be to make 
the NICs themselves resource providers, so that the scheduler could 
pick a specific NIC to bind the port to based on the NIC_BW_KB still 
available on that particular NIC.

The former method makes things conceptually easier at the expense of 
introducing greater potential for retried placement decisions (since 
the specific NIC to bind a port to wouldn't be known until the claim is 
made on the compute host). The latter method adds complexity to the 
filtering and scheduling code in order to make more accurate placement 
decisions that would result in fewer retries.
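To make the trade-off concrete, here is a rough sketch of how the same 
aggregate data-plane bandwidth might be modeled under each approach. 
This is illustrative only; the field names loosely follow the 
resource-providers inventory idea and are assumptions, not the actual 
data model:

# Illustrative only: the same aggregate NIC bandwidth modeled two ways.
# Field and provider names are assumptions for this sketch.

# Option 1: the compute host is a single resource provider.
host_level_inventory = [
    {'resource_provider': 'compute-node-1',
     'resource_class': 'NIC_BW_KB',
     'total': 40000000},
]

# Option 2: each data-plane NIC is its own resource provider, letting
# the scheduler pick the specific NIC to bind the port to.
per_nic_inventory = [
    {'resource_provider': 'compute-node-1:eth2',
     'resource_class': 'NIC_BW_KB',
     'total': 20000000},
    {'resource_provider': 'compute-node-1:eth3',
     'resource_class': 'NIC_BW_KB',
     'total': 20000000},
]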

>     That part is quite clear to me.
>
>     From [1] I'm not sure which blueprint introduces the ability to
> schedule based on the resource allocation/availability itself
> ("resource-providers-scheduler" seems more like an optimization of the
> scheduler/DB interaction, right?).

Yes, you are correct about the above blueprint; it's only about moving 
the Python-side filters into a DB query.

The resource-providers-allocations blueprint:

https://review.openstack.org/300177

is the one where we convert the various consumed-resource amount fields 
to live in the single allocations table, which can be queried for usage 
information.
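As a rough illustration of the shape of that table, each allocation row 
ties a consumer (e.g. an instance) to an amount of one resource class 
used from one resource provider. The names below are approximate, not 
the exact schema:

# Simplified sketch of allocation records; the real table uses integer
# keys rather than names, this only shows the shape of the data.
allocations = [
    {'resource_provider': 'compute-node-1', 'consumer': 'instance-A',
     'resource_class': 'VCPU', 'used': 4},
    {'resource_provider': 'compute-node-1', 'consumer': 'instance-A',
     'resource_class': 'MEMORY_MB', 'used': 8192},
]

# Usage per provider and resource class is then a simple sum:
used_vcpu = sum(a['used'] for a in allocations
                if a['resource_provider'] == 'compute-node-1'
                and a['resource_class'] == 'VCPU')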

We aim to use the ComputeNode object as a facade that hides the 
migration of these data fields as much as possible, so that the 
scheduler does not need to know that the schema has changed underneath 
it. Of course, this only works for *existing* resource classes, like 
vCPU, RAM, etc. It won't work for *new* resource classes like the 
discussed NIC_BW_KB because, clearly, we don't have an existing field in 
the instance_extra or other tables that contains that usage amount, and 
therefore we can't use the ComputeNode object as a facade over a 
non-existent piece of data.

Eventually, the intent is to change the ComputeNode object to return a 
new AllocationList object that would contain all of the compute node's 
resources in a tabular format (mimicking the underlying allocations table):

https://review.openstack.org/#/c/282442/20/nova/objects/resource_provider.py

Once this is done, the scheduler can be adapted to query this 
AllocationList object to make resource usage and placement decisions in 
the Python-side filters.

On the resource-providers-scheduler-db-filters blueprint:

https://review.openstack.org/#/c/300178/

we are still debating whether to change the existing FilterScheduler or 
create a brand-new scheduler driver. I could go either way, frankly. If 
we made a brand-new scheduler driver, it would query the compute_nodes 
table in the DB directly. The legacy FilterScheduler would manipulate 
the AllocationList object returned by the ComputeNode.allocations 
attribute. Either way we get to where we want to go: representing all 
quantitative resources in a standardized and consistent fashion.
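For the FilterScheduler path, here is a minimal sketch of what a 
bandwidth check in a Python-side filter might look like, assuming the 
host state exposes inventories and allocations; the attribute and key 
names are assumptions, not the final API:

# Hypothetical filter helper; host_state.inventories and
# host_state.allocations are assumed attributes for illustration only.
def has_enough_nic_bw(host_state, requested_nic_bw_kb):
    total = sum(inv['total'] for inv in host_state.inventories
                if inv['resource_class'] == 'NIC_BW_KB')
    used = sum(alloc['used'] for alloc in host_state.allocations
               if alloc['resource_class'] == 'NIC_BW_KB')
    return (total - used) >= requested_nic_bw_kb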

>      And that brings me to another point: at the moment of filtering
> hosts, nova, I guess, will have the neutron port information; it has to
> somehow identify whether the port is tied to a minimum bandwidth QoS
> policy.

Yes, Nova's conductor gathers information about the requested networks 
*before* asking the scheduler where to place the instance:

https://github.com/openstack/nova/blob/stable/mitaka/nova/conductor/manager.py#L362

>      That would require identifying that the port has a "qos_policy_id"
> attached to it, then asking neutron for the specific QoS policy [3],
> then looking for a minimum bandwidth rule (still to be defined) and
> extracting the required bandwidth from it.

Yep, exactly correct.
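As a rough sketch of that lookup against the Networking v2 API (using 
plain HTTP for clarity): the QoS policy lookup follows [3], but the 
'minimum_bandwidth' rule type and its 'min_kbps' field are hypothetical 
here, since that rule is still to be defined in Neutron:

# Rough sketch only. The 'minimum_bandwidth' rule type and 'min_kbps'
# field are hypothetical; the policy lookup itself follows [3].
import requests

def required_bw_kb(neutron_url, token, port_id):
    headers = {'X-Auth-Token': token}
    port = requests.get('%s/v2.0/ports/%s' % (neutron_url, port_id),
                        headers=headers).json()['port']
    policy_id = port.get('qos_policy_id')
    if policy_id is None:
        return 0
    policy = requests.get('%s/v2.0/qos/policies/%s' % (neutron_url,
                          policy_id), headers=headers).json()['policy']
    for rule in policy.get('rules', []):
        if rule.get('type') == 'minimum_bandwidth':  # hypothetical rule
            return rule.get('min_kbps', 0)
    return 0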

>     That again moves some of the responsibility to examine and
> understand external resources onto nova.

Yep, it does. The alternative is more retries for placement decisions 
because accurate decisions cannot be made until the compute node is 
already selected and the claim happens on the compute node.

>      Could it make sense to make that part pluggable via stevedore, so
> we would provide something that takes the "resource id" (for a port in
> this case) and returns the requirements translated into resource
> classes (NIC_BW_KB in this case)?

I'm not sure stevedore makes sense in this context. Really, we want 
*less* extensibility and *more* consistency. So I would instead 
envision a system where Nova, when it receives a port or network ID in 
the boot request, would call Neutron before scheduling and ask whether 
the port or network has any resource constraints on it. Neutron would 
return a standardized response containing each resource class and the 
amount requested in a dictionary (or, better yet, a serialized 
os_vif.objects.* object). Something like:

{
   'resources': {
     '<UUID of port or network>': {
       'NIC_BW_KB': 2048,
       'IPV4_ADDRESS': 1
     }
   }
}

In the case of the NIC_BW_KB resource class, Nova's scheduler would look 
for compute nodes that had a NIC with that amount of bandwidth still 
available. In the case of the IPV4_ADDRESS resource class, Nova's 
scheduler would use the generic-resource-pools interface to find a 
resource pool of IPV4_ADDRESS resources (i.e. a Neutron routed network 
or subnet allocation pool) that has available IP space for the request.
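To illustrate how such a response might feed the scheduler, here is a 
minimal sketch that sums the per-port requests into total requested 
amounts per resource class; aggregating across ports this way is my 
assumption, not a settled design:

# Illustrative only: collapse the per-port/-network resource requests
# into the totals the scheduler needs to find on a candidate host (or
# in a resource pool reachable from it).
from collections import defaultdict

def total_requested(neutron_response):
    totals = defaultdict(int)
    for per_port in neutron_response['resources'].values():
        for resource_class, amount in per_port.items():
            totals[resource_class] += amount
    return dict(totals)

# With the example response above, this yields:
# {'NIC_BW_KB': 2048, 'IPV4_ADDRESS': 1}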

Best,
-jay

> Best regards,
> Miguel Ángel Ajo
>
>
> [1]
> http://lists.openstack.org/pipermail/openstack-dev/2016-February/086371.html
> [2] https://bugs.launchpad.net/neutron/+bug/1560963
> [3] http://developer.openstack.org/api-ref-networking-v2-ext.html#showPolicy


