[placement][nova][ptg] resource provider affinity
tetsuro.nakamura.bc at hco.ntt.co.jp
Fri May 3 14:22:13 UTC 2019
Sorry for the late response,
Here are my thoughts on "resource provider affinity".
"The rps are in the same subtree" is equivalent to "there exists an rp
which is an ancestor of all the other rps".
means "rp2 is a descendant of rp1 (or rp1 is a descendant of rp2)".
We can extend it to cases where we have more than two groups:
means "both rp2 and rp3 are descendants of rp1 (or both rp1 and rp3 are
descendants of rp2, or both rp1 and rp2 are descendants of rp3)".
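
A minimal sketch of the symmetric check described above, assuming the
tree is held as a plain child -> parent mapping (the adjacency list
model). The function names and the example tree are hypothetical and
are not placement code.

```python
def ancestors(parent, rp):
    """Return the set of strict ancestors of rp in a child -> parent map."""
    found = set()
    node = parent.get(rp)
    while node is not None:
        found.add(node)
        node = parent.get(node)
    return found


def in_same_subtree(parent, rps):
    """True if some rp in rps is an ancestor of all the other rps.

    The check is symmetric: any member may play the subtree root.
    """
    rps = set(rps)
    anc = {rp: ancestors(parent, rp) for rp in rps}
    return any(
        all(root in anc[other] for other in rps - {root})
        for root in rps
    )


# Hypothetical tree: one compute node, two NUMA rps, one PF under each.
parent = {
    "cn": None,
    "numa0": "cn", "numa1": "cn",
    "pf0": "numa0", "pf1": "numa1",
}

print(in_same_subtree(parent, ["numa0", "pf0"]))        # True
print(in_same_subtree(parent, ["pf0", "numa0"]))        # True
print(in_same_subtree(parent, ["numa0", "pf1"]))        # False
print(in_same_subtree(parent, ["cn", "numa0", "pf0"]))  # True
```

Because every member is tried as the root, the caller does not need to
know which rp sits higher in the tree.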
Eric's question from the PTG yesterday was whether to keep the symmetry,
that is, whether to accept the conditions enclosed in the parentheses above.
I would say yes, keep the symmetry, because:
1. the expression 1:2:3 reads as symmetric. If we want to make it
asymmetric, it should express the subtree root more explicitly, like
1-2:3 or 1-2:3:4.
2. callers may not be aware of which resource (VCPU or VF) is provided
by the upper/lower rp.
IOW, the caller (the resource retriever, i.e. the scheduler) doesn't want
to know how the reporter (the virt driver) has reported the resources.
Note that even in the symmetric world, the negative expression Jay
suggested looks good to me.
It enables something like:
which means 1 and 2 should be in the same group, but 3 shouldn't be a
descendant of 1 or 2, and neither should 4.
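
The negative half of that condition could be sketched as follows. This
only illustrates "3 and 4 must not be descendants of 1 or 2" on a
child -> parent mapping; the helper names and the example tree are
hypothetical, and the request syntax itself is not modelled.

```python
def is_descendant(parent, rp, of):
    """True if `of` is a strict ancestor of `rp` in a child -> parent map."""
    node = parent.get(rp)
    while node is not None:
        if node == of:
            return True
        node = parent.get(node)
    return False


def negative_ok(parent, anchors, excluded):
    """True if no rp in `excluded` is a descendant of any rp in `anchors`."""
    return not any(
        is_descendant(parent, rp, anchor)
        for rp in excluded
        for anchor in anchors
    )


# Hypothetical tree: numa0 has two PFs, numa1 has one.
parent = {
    "cn": None,
    "numa0": "cn", "numa1": "cn",
    "pf0": "numa0", "pf0b": "numa0", "pf1": "numa1",
}

# Groups 1 and 2 landed on numa0 and pf0; groups 3 and 4 are the excluded ones.
print(negative_ok(parent, ["numa0", "pf0"], ["pf1", "numa1"]))  # True
print(negative_ok(parent, ["numa0", "pf0"], ["pf0b", "pf1"]))   # False
```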
However, speaking at the design level, the adjacency list model (the
so-called naive tree model), which we currently use for nested rps,
is not good at retrieving subtrees (compared to e.g. the nested set model).
I have looked into the recursive SQL CTE (common table expression) feature,
which helps us handle subtrees easily in the adjacency list model, in an
experimental patch,
but unfortunately it looks like the feature is still experimental in
MySQL, and we don't want to run a query like this for every candidate, do we? :(
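
For illustration, here is roughly what a recursive CTE over an adjacency
list looks like. This is a sketch run against in-memory SQLite rather
than MySQL, and the table name `rp` and its columns are made up for the
example; it is not the query from the experimental patch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical adjacency-list table: each row points at its parent.
    CREATE TABLE rp (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER);
    INSERT INTO rp VALUES
        (1, 'cn', NULL),
        (2, 'numa0', 1), (3, 'numa1', 1),
        (4, 'pf0', 2), (5, 'pf1', 3);
""")

SUBTREE_SQL = """
    WITH RECURSIVE subtree(id) AS (
        SELECT id FROM rp WHERE id = ?          -- start at the subtree root
        UNION ALL
        SELECT rp.id FROM rp
        JOIN subtree ON rp.parent_id = subtree.id  -- walk down one level
    )
    SELECT id FROM subtree ORDER BY id
"""


def subtree_ids(conn, root_id):
    """Return the ids of root_id and all of its descendants."""
    return [row[0] for row in conn.execute(SUBTREE_SQL, (root_id,))]


print(subtree_ids(conn, 2))  # [2, 4]
print(subtree_ids(conn, 1))  # [1, 2, 3, 4, 5]
```

The recursion runs once per subtree root, which is the per-candidate
cost the paragraph above worries about.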
Therefore, for this specific use case of NUMA affinity, I'd like to
propose, as an alternative, introducing a concept of resource group
distance in the rp graph.
* numa affinity case
* anti numa affinity
which can be realized by looking into the cached adjacency rp (i.e.
(supporting group_distance=N (N>1) would be future research, or we could
implement it anyway and accept the performance cost)
One drawback of this is that we can't use it if you create nested
layers more than one level deep under the NUMA rps,
but is that the case for OvS bandwidth?
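
A rough sketch of the distance idea, under the assumption that
"distance" means the number of parent/child edges between two rps, so
that a distance of at most 1 can be answered from each rp's cached
parent alone. The function name and the tree are hypothetical.

```python
def within_distance_one(parent, a, b):
    """True if a and b are the same rp or directly parent and child.

    Only the cached parent of each rp is consulted; no subtree walk.
    """
    return a == b or parent.get(a) == b or parent.get(b) == a


# Hypothetical tree: the PF sits directly under its NUMA rp, and a
# further nested rp ("vf_pool") sits under the PF.
parent = {
    "cn": None,
    "numa0": "cn", "numa1": "cn",
    "pf0": "numa0", "pf1": "numa1",
    "vf_pool": "pf0",
}

print(within_distance_one(parent, "numa0", "pf0"))      # True  (affinity)
print(within_distance_one(parent, "numa0", "pf1"))      # False (other NUMA)
print(within_distance_one(parent, "numa0", "vf_pool"))  # False (two levels down)
```

The last call shows the drawback: an rp nested two levels under the NUMA
rp is in the same subtree but is not within distance 1.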
Another alternative is having a "closure table" from which we can
retrieve all the descendant rp ids of an rp without joining tables.
But... the online migration cost?
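
A minimal sketch of a closure table, again on in-memory SQLite with a
made-up table name (`rp_closure`) and made-up ids. The table stores one
row per (ancestor, descendant) pair, so descendants come back from a
single non-recursive lookup.

```python
import sqlite3


def build_closure(parent):
    """Yield (ancestor, descendant, depth) rows from a child -> parent map.

    Each rp also gets a depth-0 row pointing at itself.
    """
    for rp in parent:
        depth, node = 0, rp
        while node is not None:
            yield (node, rp, depth)
            node = parent.get(node)
            depth += 1


# Hypothetical tree: 1 is the root, 2 and 3 its children, 4 under 2, 5 under 3.
parent = {1: None, 2: 1, 3: 1, 4: 2, 5: 3}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rp_closure ("
    " ancestor INTEGER, descendant INTEGER, depth INTEGER,"
    " PRIMARY KEY (ancestor, descendant))"
)
conn.executemany("INSERT INTO rp_closure VALUES (?, ?, ?)", build_closure(parent))


def descendant_ids(conn, rp_id):
    """Return all strict descendants of rp_id with one flat query."""
    rows = conn.execute(
        "SELECT descendant FROM rp_closure"
        " WHERE ancestor = ? AND depth > 0 ORDER BY descendant",
        (rp_id,),
    )
    return [row[0] for row in rows]


print(descendant_ids(conn, 1))  # [2, 3, 4, 5]
print(descendant_ids(conn, 2))  # [4]
```

`build_closure` is also roughly the work a migration would have to do
for every existing rp, which is where the cost concern comes from.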