Sorry for the late response.
Here are my thoughts on "resource provider affinity".
“The rps are in the same subtree” is equivalent to “there exists an rp
which is an ancestor of all the other rps”.
Therefore,
* group_resources=1:2
means “rp2 is a descendant of rp1 (or rp1 is a descendant of rp2)”.
We can extend this to cases with more than two groups:
* group_resources=1:2:3
means "both rp2 and rp3 are descendants of rp1 (or both rp1 and rp3
are descendants of rp2, or both rp1 and rp2 are descendants of rp3)".
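To make the symmetric reading concrete, here is a minimal Python sketch. It assumes each request group has already been resolved to a single rp, and that the tree is available in memory as a hypothetical `parent` dict (rp id to parent rp id, `None` for the root), i.e. the adjacency list model. `ancestors_or_self` and `same_subtree` are illustrative names, not placement code.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical in-memory adjacency list: rp id -> parent rp id (None for a root rp).
ParentMap = Dict[str, Optional[str]]


def ancestors_or_self(parent: ParentMap, rp: str) -> List[str]:
    """Return rp followed by each of its ancestors, up to the root."""
    chain: List[str] = []
    node: Optional[str] = rp
    while node is not None:
        chain.append(node)
        node = parent[node]
    return chain


def same_subtree(parent: ParentMap, rps: Iterable[str]) -> bool:
    """Symmetric check: some rp in rps is an ancestor (or itself) of every rp in rps."""
    rps = list(rps)
    # Try every rp as the candidate subtree root, so argument order does not matter.
    return any(
        all(candidate in ancestors_or_self(parent, other) for other in rps)
        for candidate in rps
    )


# Example tree: compute node "cn" -> two NUMA rps -> one PF rp each.
tree: ParentMap = {
    "cn": None,
    "numa0": "cn", "numa1": "cn",
    "pf0": "numa0", "pf1": "numa1",
}

print(same_subtree(tree, ["numa0", "pf0"]))  # True  (rp1 above rp2)
print(same_subtree(tree, ["pf0", "numa0"]))  # True  (same answer, order-independent)
print(same_subtree(tree, ["numa0", "pf1"]))  # False (different NUMA subtrees)
```

Because every rp is tried as the root, the caller does not need to know which rp sits higher in the tree.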
Eric's question at the PTG yesterday was whether to keep the symmetry
between rps,
that is, whether to include the conditions enclosed in the parentheses
above.
I would say yes, keep the symmetry, because
1. the expression 1:2:3 is itself symmetric. If we want to make it
asymmetric, it should express the subtree root more explicitly, like
1-2:3 or 1-2:3:4.
2. callers may not be aware of which resource (VCPU or VF) is
provided by the upper or lower rp.
IOW, the caller (the resource retriever, i.e. the scheduler) doesn't want
to know how the reporter (the virt driver) has reported the resources.
Note that even in the symmetric world, the negative expression Jay
suggested looks good to me.
It enables something like:
* group_resources=1:2:!3:!4
which means 1 and 2 should be in the same subtree, but 3 shouldn't be
a descendant of 1 or 2, and likewise for 4.
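A minimal sketch of how such an expression could be evaluated, under the same assumptions as before (each group already resolved to one rp; hypothetical in-memory `parent` dict). `parse_group_resources` and `check_group_resources` are hypothetical helpers, and treating a negative group that lands on the very same rp as a positive one as a violation is also an assumption.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory adjacency list: rp id -> parent rp id (None for a root rp).
ParentMap = Dict[str, Optional[str]]


def parse_group_resources(expr: str) -> Tuple[List[str], List[str]]:
    """Split e.g. "1:2:!3:!4" into positive (["1", "2"]) and negative (["3", "4"]) group ids."""
    positive: List[str] = []
    negative: List[str] = []
    for token in expr.split(":"):
        if token.startswith("!"):
            negative.append(token[1:])
        else:
            positive.append(token)
    return positive, negative


def ancestors_or_self(parent: ParentMap, rp: str) -> List[str]:
    """Return rp followed by each of its ancestors, up to the root."""
    chain: List[str] = []
    node: Optional[str] = rp
    while node is not None:
        chain.append(node)
        node = parent[node]
    return chain


def check_group_resources(parent: ParentMap, group_to_rp: Dict[str, str], expr: str) -> bool:
    positive, negative = parse_group_resources(expr)
    pos_rps = [group_to_rp[g] for g in positive]
    # Positive groups: symmetric "same subtree" check.
    if not any(all(c in ancestors_or_self(parent, o) for o in pos_rps) for c in pos_rps):
        return False
    # Negative groups: their rp must not be any positive rp or lie beneath one.
    for g in negative:
        chain = ancestors_or_self(parent, group_to_rp[g])
        if any(rp in chain for rp in pos_rps):
            return False
    return True


tree: ParentMap = {
    "cn": None,
    "numa0": "cn", "numa1": "cn",
    "pf0": "numa0", "pf1": "numa1", "pf2": "numa0",
}
ok = {"1": "numa0", "2": "pf0", "3": "pf1", "4": "numa1"}
bad = {"1": "numa0", "2": "pf0", "3": "pf2", "4": "numa1"}
print(check_group_resources(tree, ok, "1:2:!3:!4"))   # True
print(check_group_resources(tree, bad, "1:2:!3:!4"))  # False: pf2 sits under numa0
```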
However, at the design level, the adjacency list model (the so-called
naive tree model), which we currently use for nested rps, is not good
at retrieving subtrees (compared to, e.g., the nested set
model[1]).
[1] https://en.wikipedia.org/wiki/Nested_set_model
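For comparison, a minimal sketch of nested set numbering as described in [1]: a depth-first walk assigns each rp a `(lft, rgt)` pair, and the subtree of an rp is every rp whose pair lies inside its interval. In SQL that becomes a single range predicate (e.g. `lft BETWEEN :lft AND :rgt`) with no recursion. The `example_children` mapping and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical child lists for an example tree (top-down adjacency).
example_children: Dict[str, List[str]] = {
    "cn": ["numa0", "numa1"],
    "numa0": ["pf0"],
    "numa1": ["pf1"],
}


def nested_set(children: Dict[str, List[str]], root: str) -> Dict[str, Tuple[int, int]]:
    """Assign (lft, rgt) to each node with a depth-first walk."""
    bounds: Dict[str, Tuple[int, int]] = {}
    counter = 0

    def visit(node: str) -> None:
        nonlocal counter
        counter += 1
        lft = counter  # number on the way down
        for child in children.get(node, []):
            visit(child)
        counter += 1  # number on the way back up
        bounds[node] = (lft, counter)

    visit(root)
    return bounds


def subtree(bounds: Dict[str, Tuple[int, int]], rp: str) -> List[str]:
    """Every node whose interval lies inside rp's interval (rp included)."""
    lft, rgt = bounds[rp]
    return sorted(n for n, (n_lft, n_rgt) in bounds.items() if lft <= n_lft and n_rgt <= rgt)


bounds = nested_set(example_children, "cn")
print(bounds["numa0"])           # (2, 5)
print(subtree(bounds, "numa0"))  # ['numa0', 'pf0']
```

The trade-off is on writes: adding or removing an rp shifts the numbers of every node to its right, which is why reads are cheap but updates are not.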
I have looked into the recursive SQL CTE (common table expression)
feature, which helps us handle subtrees easily in the adjacency list
model, in an experimental patch [2],
but unfortunately it looks like the feature is still experimental in
MySQL, and we don't want to run a query like this for every candidate,
do we? :(
[2] https://review.opendev.org/#/c/636092/
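For reference, the kind of recursive CTE in question looks roughly like the sketch below. It is not the query from [2]: it uses Python's built-in `sqlite3` (SQLite supports `WITH RECURSIVE`) as a stand-in for MySQL, and a simplified, hypothetical `resource_providers` table with only `id`, `name` and `parent_provider_id`.

```python
import sqlite3
from typing import List

# SQLite stands in for MySQL; the table is a simplified, hypothetical
# adjacency-list version of resource_providers.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE resource_providers (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        parent_provider_id INTEGER REFERENCES resource_providers (id)
    );
    INSERT INTO resource_providers VALUES
        (1, 'cn', NULL),
        (2, 'numa0', 1),
        (3, 'numa1', 1),
        (4, 'pf0', 2),
        (5, 'pf1', 3);
    """
)

# Seed with one rp, then repeatedly join its children onto the rows found so far.
SUBTREE_SQL = """
WITH RECURSIVE subtree(id) AS (
    SELECT id FROM resource_providers WHERE id = ?
    UNION ALL
    SELECT rp.id
    FROM resource_providers AS rp
    JOIN subtree ON rp.parent_provider_id = subtree.id
)
SELECT name FROM resource_providers
WHERE id IN (SELECT id FROM subtree)
ORDER BY id
"""


def subtree_names(rp_id: int) -> List[str]:
    """Names of rp_id and all of its descendants."""
    return [name for (name,) in conn.execute(SUBTREE_SQL, (rp_id,))]


print(subtree_names(2))  # ['numa0', 'pf0']
```

Each call walks the tree one level per recursion step, which is the per-candidate cost in question.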
Therefore, for this specific use case of NUMA affinity, I'd like to
alternatively propose introducing a concept of resource group distance
in the rp graph.
* numa affinity case
- group_distance(1:2)=1
* anti numa affinity
- group_distance(1:2)>1
which can be realized by looking up the cached adjacent rp (i.e. the
parent id).
(Supporting group_distance=N (N>1) would be future research, or we could
implement it anyway and disregard the performance.)
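A minimal sketch of the distance-1 check using only the cached parent id, assuming a hypothetical in-memory `parent` dict and the hypothetical helper names `numa_affinity` / `anti_numa_affinity`. Treating the same rp on both sides (distance 0) as neither affinity nor anti-affinity is also an assumption.

```python
from typing import Dict, Optional

# Hypothetical cached adjacency: rp id -> parent rp id (None for a root rp).
ParentMap = Dict[str, Optional[str]]


def numa_affinity(parent: ParentMap, a: str, b: str) -> bool:
    """group_distance(a:b)=1: one rp is the direct parent of the other."""
    return parent.get(a) == b or parent.get(b) == a


def anti_numa_affinity(parent: ParentMap, a: str, b: str) -> bool:
    """group_distance(a:b)>1: distinct rps that are not parent and child."""
    return a != b and not numa_affinity(parent, a, b)


tree: ParentMap = {
    "cn": None,
    "numa0": "cn", "numa1": "cn",
    "pf0": "numa0", "pf1": "numa1",
}
print(numa_affinity(tree, "numa0", "pf0"))       # True: pf0 hangs directly off numa0
print(anti_numa_affinity(tree, "numa1", "pf0"))  # True: distance 3 via cn
```

No join or recursion is needed, only one dict lookup per side. Note that an rp two levels below a NUMA rp (say a VF under `pf0`) is at distance 2 and fails `numa_affinity` even though it is on the same NUMA node.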
One drawback of this is that we can't use it if you create
multiple nested layers more than one level deep under the NUMA rps,
but is that the case for OvS bandwidth?
Another alternative is a "closure table", from which we can
retrieve all the descendant rp ids of an rp without joining tables.
But... what about the online migration cost?
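A minimal sketch of the closure-table idea, using Python's `sqlite3` as a stand-in database and a hypothetical `rp_closure` table with one `(ancestor, descendant, depth)` row per ancestor-or-self pair. `closure_rows` is roughly what a backfill during an online migration would have to compute from the existing parent ids; all names are hypothetical.

```python
import sqlite3
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory adjacency list: rp id -> parent rp id (None for a root rp).
ParentMap = Dict[str, Optional[str]]


def closure_rows(parent: ParentMap) -> List[Tuple[str, str, int]]:
    """One (ancestor, descendant, depth) row per ancestor-or-self pair."""
    rows: List[Tuple[str, str, int]] = []
    for rp in parent:
        node: Optional[str] = rp
        depth = 0
        while node is not None:
            rows.append((node, rp, depth))
            node = parent[node]
            depth += 1
    return rows


tree: ParentMap = {
    "cn": None,
    "numa0": "cn", "numa1": "cn",
    "pf0": "numa0", "pf1": "numa1",
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rp_closure ("
    " ancestor TEXT NOT NULL, descendant TEXT NOT NULL, depth INTEGER NOT NULL,"
    " PRIMARY KEY (ancestor, descendant))"
)
# Backfill from the existing parent pointers.
conn.executemany("INSERT INTO rp_closure VALUES (?, ?, ?)", closure_rows(tree))


def descendants(rp: str) -> List[str]:
    """All strict descendants of rp: a single-table lookup, no joins or recursion."""
    cur = conn.execute(
        "SELECT descendant FROM rp_closure"
        " WHERE ancestor = ? AND depth > 0 ORDER BY descendant",
        (rp,),
    )
    return [d for (d,) in cur]


print(descendants("numa0"))  # ['pf0']
```

For ancestor/descendant pairs, the `depth` column also gives their distance directly.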
- tetsuro