[placement][nova][ptg] resource provider affinity

Tetsuro Nakamura tetsuro.nakamura.bc at hco.ntt.co.jp
Fri May 3 14:22:13 UTC 2019


Sorry for the late response,

Here are my thoughts on "resource provider affinity".

“The rps are in the same subtree” is equivalent to “there exists an rp 
which is an ancestor of all the other rps”.

Therefore,
* group_resources=1:2
means “rp2 is a descendant of rp1 (or rp1 is a descendant of rp2)”.

We can extend this to cases where we have more than two groups:
* group_resources=1:2:3
means "both rp2 and rp3 are descendants of rp1 (or both rp1 and rp3 are 
descendants of rp2, or both rp1 and rp2 are descendants of rp3)".
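
To make the symmetric reading concrete, here is a minimal Python sketch, 
assuming a hypothetical in-memory map from each rp to its parent (i.e. the 
adjacency list model). `in_same_subtree()` and `EXAMPLE_PARENT` are 
illustrative names, not Placement code. The check tries every chosen rp as 
the subtree root, so it gives the same answer whichever group happens to 
land on the upper rp.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical adjacency-list view of an rp tree: rp id -> parent rp id
# (None for a root provider).
ParentMap = Dict[str, Optional[str]]


def ancestors_or_self(rp: str, parent: ParentMap) -> Set[str]:
    """Return rp plus every ancestor of rp, walking parent links to the root."""
    seen: Set[str] = set()
    node: Optional[str] = rp
    while node is not None:
        seen.add(node)
        node = parent[node]
    return seen


def in_same_subtree(rps: Iterable[str], parent: ParentMap) -> bool:
    """Symmetric check: True if some rp in `rps` is an ancestor (or the same rp)
    of every rp in `rps`, regardless of which request group it came from."""
    chosen: List[str] = list(rps)
    for candidate_root in chosen:
        if all(candidate_root in ancestors_or_self(rp, parent) for rp in chosen):
            return True
    return False


# Hypothetical compute node with two NUMA rps, each with one PF child rp.
EXAMPLE_PARENT: ParentMap = {
    "cn": None,
    "numa0": "cn",
    "numa1": "cn",
    "pf0": "numa0",
    "pf1": "numa1",
}
```

With this tree, `in_same_subtree(["numa0", "pf0"], EXAMPLE_PARENT)` and 
`in_same_subtree(["pf0", "numa0"], EXAMPLE_PARENT)` are both True, while 
`["numa0", "pf1"]` is False. An asymmetric form such as 1-2:3 would instead 
fix the root to group 1 and drop the loop over candidate roots.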

Eric's question at the PTG yesterday was whether to keep the symmetry 
between rps, that is, whether to include the conditions in the 
parentheses above.

I would say yes, keep the symmetry, because:

1. The expression 1:2:3 reads as symmetric. If we want to make it 
asymmetric, the syntax should mark the subtree root explicitly, like 
1-2:3 or 1-2:3:4.
2. Callers may not be aware of which resource (VCPU or VF) is provided 
by the upper or the lower rp.
     IOW, the caller (the resource retriever, i.e. the scheduler) doesn't 
want to know how the reporter (the virt driver) has reported the 
resources.

Note that even in the symmetric world, the negative expression Jay 
suggested looks good to me.
It enables something like:
* group_resources=1:2:!3:!4
which means 1 and 2 should be in the same subtree, but 3 shouldn't be a 
descendant of 1 or 2, and likewise for 4.
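
A rough sketch of how such a value could be evaluated against one 
candidate, assuming a hypothetical parser for the proposed syntax and a 
hypothetical `chosen` map from group suffix to the rp it was satisfied by. 
None of these names exist in Placement. The sketch reads "descendant" 
inclusively, so a negated group landing on the very same rp as group 1 or 2 
is also rejected; that is one possible interpretation, not a settled rule.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical adjacency-list view: rp id -> parent rp id (None for a root).
ParentMap = Dict[str, Optional[str]]


def parse_group_resources(value: str) -> Tuple[List[str], List[str]]:
    """Split a value such as '1:2:!3:!4' into (positive, negated) group suffixes."""
    positive: List[str] = []
    negated: List[str] = []
    for token in value.split(":"):
        if token.startswith("!"):
            negated.append(token[1:])
        else:
            positive.append(token)
    return positive, negated


def in_subtree_of(rp: str, root: str, parent: ParentMap) -> bool:
    """True if rp is root itself or lies anywhere below root."""
    node: Optional[str] = rp
    while node is not None:
        if node == root:
            return True
        node = parent[node]
    return False


def satisfies(value: str, chosen: Dict[str, str], parent: ParentMap) -> bool:
    """Check a candidate assignment (group suffix -> rp id) against `value`."""
    positive, negated = parse_group_resources(value)
    pos_rps = [chosen[g] for g in positive]
    # Symmetric affinity: some positive rp must be the subtree root of all of them.
    if not any(all(in_subtree_of(rp, root, parent) for rp in pos_rps)
               for root in pos_rps):
        return False
    # Each negated group must land outside the subtree of every positive rp.
    for g in negated:
        if any(in_subtree_of(chosen[g], rp, parent) for rp in pos_rps):
            return False
    return True


EXAMPLE_PARENT: ParentMap = {
    "cn": None,
    "numa0": "cn", "numa1": "cn",
    "pf0": "numa0", "pf1": "numa1",
}
```

For example, groups 1 and 2 on numa0/pf0 with groups 3 and 4 on pf1/numa1 
satisfy 1:2:!3:!4, while putting group 3 on pf0 under numa0 does not.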

However, at the design level, the adjacency list model (the so-called 
naive tree model), which we currently use for nested rps, 
is not good at retrieving subtrees (compared to, e.g., the nested set 
model [1]).
[1] https://en.wikipedia.org/wiki/Nested_set_model
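
To illustrate why the nested set model is better at subtree retrieval, 
here is a small sketch, assuming a hypothetical children map for the rp 
tree. Each rp gets a (lft, rgt) pair from a depth-first walk, and subtree 
membership becomes two integer comparisons; in SQL that would be a range 
condition such as `lft BETWEEN :root_lft AND :root_rgt` instead of a 
recursive join.

```python
from typing import Dict, List, Tuple

Bounds = Dict[str, Tuple[int, int]]


def nested_set(children: Dict[str, List[str]], root: str) -> Bounds:
    """Assign (lft, rgt) numbers by a depth-first walk, as in the nested set model."""
    bounds: Bounds = {}
    counter = 0

    def visit(node: str) -> None:
        nonlocal counter
        counter += 1
        lft = counter  # number assigned on the way down
        for child in children.get(node, []):
            visit(child)
        counter += 1
        bounds[node] = (lft, counter)  # number assigned on the way back up

    visit(root)
    return bounds


def is_in_subtree(rp: str, root: str, bounds: Bounds) -> bool:
    """rp is in root's subtree iff rp's interval nests inside root's interval."""
    lft, rgt = bounds[rp]
    root_lft, root_rgt = bounds[root]
    return root_lft <= lft and rgt <= root_rgt


# Hypothetical compute node with two NUMA rps, each with one PF child rp.
EXAMPLE_CHILDREN: Dict[str, List[str]] = {
    "cn": ["numa0", "numa1"],
    "numa0": ["pf0"],
    "numa1": ["pf1"],
}
```

Here numa0 gets (2, 5) and pf0 gets (3, 4), so pf0 is in numa0's subtree 
while pf1, at (7, 8), is not. The trade-off is on the write side: adding or 
moving an rp shifts the lft/rgt values of many other rows.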

In an experimental patch [2], I have looked into the recursive SQL CTE 
(common table expression) feature, which helps us handle subtrees easily 
in the adjacency list model, 
but unfortunately it looks like the feature is still experimental in 
MySQL, and we don't want to run a query like this for every candidate, 
do we? :(

[2] https://review.opendev.org/#/c/636092/
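
For reference, this is roughly what a recursive CTE subtree query looks 
like over an adjacency list. It is a self-contained sketch that uses SQLite 
through Python's standard `sqlite3` module, with a simplified, hypothetical 
`resource_providers` table holding only `id`, `name` and 
`parent_provider_id`; it is not the query from the patch in [2].

```python
import sqlite3
from typing import List

# Simplified, hypothetical schema: each rp stores only its parent (adjacency list).
SCHEMA = """
CREATE TABLE resource_providers (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    parent_provider_id INTEGER REFERENCES resource_providers(id)
);
"""

# Recursive CTE: seed with the requested rp, then repeatedly join in children.
SUBTREE_SQL = """
WITH RECURSIVE subtree(id) AS (
    SELECT id FROM resource_providers WHERE id = ?
    UNION ALL
    SELECT rp.id
    FROM resource_providers AS rp
    JOIN subtree ON rp.parent_provider_id = subtree.id
)
SELECT rp.name FROM resource_providers AS rp
JOIN subtree ON rp.id = subtree.id
ORDER BY rp.id;
"""


def subtree_names(conn: sqlite3.Connection, rp_id: int) -> List[str]:
    """Return the names of rp_id and all of its descendants, ordered by id."""
    return [row[0] for row in conn.execute(SUBTREE_SQL, (rp_id,))]


def example_connection() -> sqlite3.Connection:
    """In-memory database holding a hypothetical two-NUMA compute node."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO resource_providers (id, name, parent_provider_id) VALUES (?, ?, ?)",
        [(1, "cn", None), (2, "numa0", 1), (3, "numa1", 1), (4, "pf0", 2), (5, "pf1", 3)],
    )
    return conn
```

`subtree_names(example_connection(), 2)` returns `['numa0', 'pf0']`. The 
recursion is evaluated once per starting rp, which is exactly the 
per-candidate cost that makes this unattractive in the scheduling path.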

Therefore, for this specific use case of NUMA affinity, I'd like to 
propose an alternative: bringing in the concept of resource group 
distance in the rp graph.

* NUMA affinity case
   - group_distance(1:2)=1
* NUMA anti-affinity case
   - group_distance(1:2)>1

This can be realized by looking at the cached adjacent rp (i.e. the 
parent id).
(Supporting group_distance=N for N>1 would be future research, or we 
could implement it anyway, setting performance aside.)
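
A rough sketch of the distance idea, again over a hypothetical 
rp -> parent map. `group_distance_is_one()` needs only the cached parent 
id of each rp, which is the cheap check proposed above, while 
`group_distance()` is a naive general version for the N>1 case that pays 
no attention to performance. Both function names are made up for 
illustration.

```python
from typing import Dict, Optional

# Hypothetical adjacency-list view: rp id -> parent rp id (None for a root).
ParentMap = Dict[str, Optional[str]]


def group_distance_is_one(rp_a: str, rp_b: str, parent: ParentMap) -> bool:
    """Distance 1: one rp is the direct parent of the other (one lookup each way)."""
    return parent[rp_a] == rp_b or parent[rp_b] == rp_a


def group_distance(rp_a: str, rp_b: str, parent: ParentMap) -> int:
    """Number of edges on the tree path between rp_a and rp_b."""
    # Record how far each ancestor (and rp_a itself) is from rp_a.
    dist_from_a: Dict[str, int] = {}
    node: Optional[str] = rp_a
    steps = 0
    while node is not None:
        dist_from_a[node] = steps
        node = parent[node]
        steps += 1
    # Walk up from rp_b until we meet rp_a's ancestor chain.
    node, steps = rp_b, 0
    while node is not None:
        if node in dist_from_a:
            return steps + dist_from_a[node]
        node = parent[node]
        steps += 1
    raise ValueError("rps are in different trees")


EXAMPLE_PARENT: ParentMap = {
    "cn": None,
    "numa0": "cn", "numa1": "cn",
    "pf0": "numa0", "pf1": "numa1",
}
```

In this tree numa0 and numa1 are at distance 2. If an extra layer were 
added (say numa0 -> bridge -> pf0), pf0 would also be at distance 2 from 
numa0, so "same NUMA, deeper" could not be told apart from "other NUMA", 
which is the drawback discussed in the following paragraph.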

One drawback is that we can't use this if multiple nested layers 
(i.e. a depth of more than 1) are created under the NUMA rps, 
but is that the case for OvS bandwidth?

Another alternative is a "closure table", from which we can retrieve 
all the descendant rp ids of an rp without joining tables.
But... what about the online migration cost?
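
A minimal closure table sketch, again using SQLite via Python's `sqlite3` 
for self-containment. The `rp_closure` table, its columns and the helper 
functions are hypothetical. Every (ancestor, descendant) pair is stored 
explicitly, so listing descendants is a single-table lookup with no 
recursion.

```python
import sqlite3
from typing import List, Optional

SCHEMA = """
CREATE TABLE resource_providers (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    parent_provider_id INTEGER
);
-- Hypothetical closure table: one row per (ancestor, descendant) pair,
-- including each rp paired with itself at depth 0.
CREATE TABLE rp_closure (
    ancestor_id INTEGER NOT NULL,
    descendant_id INTEGER NOT NULL,
    depth INTEGER NOT NULL,
    PRIMARY KEY (ancestor_id, descendant_id)
);
"""


def add_provider(conn: sqlite3.Connection, rp_id: int, name: str,
                 parent_id: Optional[int]) -> None:
    """Insert an rp and maintain its closure rows."""
    conn.execute("INSERT INTO resource_providers VALUES (?, ?, ?)",
                 (rp_id, name, parent_id))
    # Self row at depth 0.
    conn.execute("INSERT INTO rp_closure VALUES (?, ?, 0)", (rp_id, rp_id))
    if parent_id is not None:
        # Every ancestor of the parent (including the parent) is an ancestor
        # of the new rp, one level further away.
        conn.execute(
            "INSERT INTO rp_closure (ancestor_id, descendant_id, depth) "
            "SELECT ancestor_id, ?, depth + 1 FROM rp_closure WHERE descendant_id = ?",
            (rp_id, parent_id),
        )


def descendant_ids(conn: sqlite3.Connection, rp_id: int) -> List[int]:
    """All strict descendants of rp_id, read from the closure table alone."""
    rows = conn.execute(
        "SELECT descendant_id FROM rp_closure "
        "WHERE ancestor_id = ? AND depth > 0 ORDER BY descendant_id",
        (rp_id,),
    )
    return [row[0] for row in rows]


def example_connection() -> sqlite3.Connection:
    """In-memory database holding a hypothetical two-NUMA compute node."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for rp_id, name, parent_id in [(1, "cn", None), (2, "numa0", 1), (3, "numa1", 1),
                                   (4, "pf0", 2), (5, "pf1", 3)]:
        add_provider(conn, rp_id, name, parent_id)
    return conn
```

New rps cost one extra INSERT ... SELECT each, but an existing deployment 
would need the closure rows back-filled for every rp already in the 
database, which is where the online migration question comes in.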

- tetsuro
