[openstack-dev] [neutron] [nova] scheduling bandwidth resources / NIC_BW_KB resource class
Jay Pipes
jaypipes at gmail.com
Mon Apr 11 11:46:26 UTC 2016
Hi Miguel Angel, comments/answers inline :)
On 04/08/2016 09:17 AM, Miguel Angel Ajo Pelayo wrote:
> Hi!,
>
> In the context of [1] (generic resource pools / scheduling in nova)
> and [2] (minimum bandwidth guarantees -egress- in neutron), I had a talk
> with Jay Pipes a few weeks ago.
>
> The idea was to leverage the generic resource pools and scheduling
> mechanisms defined in [1] to find the right hosts and track the total
> available bandwidth per host (and per host "physical network");
> something in neutron (where exactly is still to be defined) would notify
> the new API about the total amount of "NIC_BW_KB" available on every
> host/physnet.
Yes, what we discussed was making it per host initially, meaning the
host would advertise the bandwidth of all the NICs it uses for the data
plane as a single aggregate amount.
The other way to track this resource class (NIC_BW_KB) would be to make
the NICs themselves be resource providers and then the scheduler could
pick a specific NIC to bind the port to based on available NIC_BW_KB on
a particular NIC.
The former method makes things conceptually easier at the expense of
introducing greater potential for retrying placement decisions (since
the specific NIC to bind a port to wouldn't be known until the claim is
made on the compute host). The latter method adds complexity to the
filtering and scheduling code in order to make more accurate placement
decisions that result in fewer retries.
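To make that trade-off concrete, here is a quick sketch (the data
structures are purely illustrative, not the resource-providers schema)
of why the host-level view can force retries:

# Option 1: a single host-level provider advertising aggregate bandwidth.
host_bw = {'total': 20000, 'used': 12000}       # 8000 KB free overall

# Option 2: each data-plane NIC is its own resource provider.
nic_bw = {
    'eth2': {'total': 10000, 'used': 9000},     # 1000 KB free
    'eth3': {'total': 10000, 'used': 3000},     # 7000 KB free
}

requested_kb = 8000

# The host-level view says the request fits...
fits_host_view = (host_bw['total'] - host_bw['used']) >= requested_kb  # True

# ...but no single NIC can actually satisfy it, which is only discovered
# when the claim is attempted on the compute host, triggering a retry.
fits_some_nic = any(nic['total'] - nic['used'] >= requested_kb
                    for nic in nic_bw.values())                        # False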
> That part is quite clear to me,
>
> From [1] I'm not sure which blueprint introduces the ability to
> schedule based on the resource allocation/availability itself,
> ("resource-providers-scheduler" seems more like an optimization to the
> schedule/DB interaction, right?)
Yes, you are correct about the above blueprint; it's only for moving the
Python-side filters to be a DB query.
The resource-providers-allocations blueprint:
https://review.openstack.org/300177
is the one where we convert the various consumed resource amount fields
to live in the single allocations table that may be queried for usage
information.
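For illustration only, a row in such an allocations table and the usage
query over it might look roughly like this (the column and table names
are my assumption from the review in progress, not a settled schema):

# Hypothetical shape of one allocation record:
allocation = {
    'resource_provider_id': 42,            # the compute node (or NIC)
    'consumer_id': '<instance or port UUID>',
    'resource_class': 'NIC_BW_KB',
    'used': 2048,
}

# Usage information then becomes a simple aggregation, e.g. (illustrative SQL):
#   SELECT resource_class, SUM(used)
#     FROM allocations
#    WHERE resource_provider_id = 42
#    GROUP BY resource_class;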
We aim to use the ComputeNode object as a facade that hides the
migration of these data fields as much as possible so that the scheduler
actually does not need to know that the schema has changed underneath
it. Of course, this only works for *existing* resource classes, like
vCPU, RAM, etc. It won't work for *new* resource classes like the
discussed NIC_BW_KB because, clearly, we don't have an existing field in
the instance_extra or other tables that contains that usage amount, and
therefore we can't use the ComputeNode object as a facade over a
non-existing piece of data.
Eventually, the intent is to change the ComputeNode object to return a
new AllocationList object that would contain all of the compute node's
resources in a tabular format (mimicking the underlying allocations table):
https://review.openstack.org/#/c/282442/20/nova/objects/resource_provider.py
Once this is done, the scheduler can be fitted to query this
AllocationList object to make resource usage and placement decisions in
the Python-side filters.
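As a rough sketch of how such a Python-side filter could consume that
object once it lands (the attribute names below are assumptions based on
the review in progress, not a final interface):

def host_passes(compute_node, requested_nic_bw_kb):
    # Sum what is already allocated for NIC_BW_KB on this compute node
    # from the AllocationList-style 'allocations' attribute.
    used = sum(alloc.used for alloc in compute_node.allocations
               if alloc.resource_class == 'NIC_BW_KB')
    # 'nic_bw_kb_total' is a hypothetical inventory total for the host.
    return (compute_node.nic_bw_kb_total - used) >= requested_nic_bw_kb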
We are still debating, in the resource-providers-scheduler-db-filters
blueprint:
https://review.openstack.org/#/c/300178/
whether to change the existing FilterScheduler or create a brand new
scheduler driver. I could go either way, frankly. If we made a brand new
scheduler driver, it would do a query against the compute_nodes table in
the DB directly. The legacy FilterScheduler would manipulate the
AllocationList object returned by the ComputeNode.allocations attribute.
Either way we get to where we want to go: representing all quantitative
resources in a standardized and consistent fashion.
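For contrast with the Python-side filter sketch above, the new-driver
option would push the same check down into a single DB query, something
along these lines (illustrative only; the table layout isn't settled):

# Rough shape of the query a new scheduler driver might issue, with the
# requested amount bound as a parameter.
FREE_NIC_BW_QUERY = """
    SELECT inv.resource_provider_id
      FROM inventories inv
      LEFT JOIN allocations alloc
        ON alloc.resource_provider_id = inv.resource_provider_id
       AND alloc.resource_class = 'NIC_BW_KB'
     WHERE inv.resource_class = 'NIC_BW_KB'
     GROUP BY inv.resource_provider_id, inv.total
    HAVING inv.total - COALESCE(SUM(alloc.used), 0) >= :requested_kb
"""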
> And, that brings me to another point: at the moment of filtering
> hosts, nova I guess, will have the neutron port information, it has to
> somehow identify if the port is tied to a minimum bandwidth QoS policy.
Yes, Nova's conductor gathers information about the requested networks
*before* asking the scheduler where to place the instance:
https://github.com/openstack/nova/blob/stable/mitaka/nova/conductor/manager.py#L362
> That would require identifying that the port has a "qos_policy_id"
> attached to it, then asking neutron for the specific QoS policy
> [3], then looking for a minimum bandwidth rule (still to be defined)
> and extracting the required bandwidth from it.
Yep, exactly correct.
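Roughly, that lookup chain would look like the sketch below, operating
on the port and QoS policy dicts the Neutron API returns ([3]); the
minimum bandwidth rule is still to be defined, so its type and field
names here are assumptions:

def required_nic_bw_kb(port, get_qos_policy):
    """Return the minimum bandwidth (kbps) a port requires, or None.

    'port' is the port dict from Neutron; 'get_qos_policy' is any
    callable returning the QoS policy dict for a given policy ID.
    """
    policy_id = port.get('qos_policy_id')
    if not policy_id:
        return None
    policy = get_qos_policy(policy_id)
    for rule in policy.get('rules', []):
        # Assumed rule shape for the not-yet-defined minimum bandwidth rule.
        if rule.get('type') == 'minimum_bandwidth':
            return rule.get('min_kbps')
    return None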
> That moves, again, some of the responsibility for examining and
> understanding external resources into nova.
Yep, it does. The alternative is more retries for placement decisions
because accurate decisions cannot be made until the compute node is
already selected and the claim happens on the compute node.
> Could it make sense to make that part pluggable via stevedore, so
> we would provide something that takes the "resource id" (of a port in
> this case) and returns the requirements translated into resource
> classes (NIC_BW_KB in this case)?
Not sure Stevedore makes sense in this context. Really, we want *less*
extensibility and *more* consistency. So, I would rather envision a
system where Nova would call Neutron before scheduling, when it has
received a port or network ID in the boot request, and ask Neutron
whether the port or network has any resource constraints on it. Neutron
would return a standardized response containing each resource class and
the amount requested in a dictionary (or better yet, an os_vif.objects.*
object, serialized). Something like:
{
    'resources': {
        '<UUID of port or network>': {
            'NIC_BW_KB': 2048,
            'IPV4_ADDRESS': 1
        }
    }
}
In the case of the NIC_BW_KB resource class, Nova's scheduler would look
for compute nodes that had a NIC with that amount of bandwidth still
available. In the case of the IPV4_ADDRESS resource class, Nova's
scheduler would use the generic-resource-pools interface to find a
resource pool of IPV4_ADDRESS resources (i.e. a Neutron routed network
or subnet allocation pool) that has available IP space for the request.
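On the consuming side, the check then reduces to comparing that
per-resource-class request against what a candidate host or pool still
has free; a minimal sketch (none of this is an agreed interface):

def satisfies(requested, free):
    """'requested' is the per-port dict above, e.g.
    {'NIC_BW_KB': 2048, 'IPV4_ADDRESS': 1}; 'free' maps each resource
    class to the amount still available on a candidate host or pool."""
    return all(free.get(rc, 0) >= amount
               for rc, amount in requested.items())

# satisfies({'NIC_BW_KB': 2048, 'IPV4_ADDRESS': 1},
#           {'NIC_BW_KB': 10000, 'IPV4_ADDRESS': 12})  -> True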
Best,
-jay
> Best regards,
> Miguel Ángel Ajo
>
>
> [1]
> http://lists.openstack.org/pipermail/openstack-dev/2016-February/086371.html
> [2] https://bugs.launchpad.net/neutron/+bug/1560963
> [3] http://developer.openstack.org/api-ref-networking-v2-ext.html#showPolicy