[placement][nova][ptg] resource provider affinity

Jay Pipes jaypipes at gmail.com
Sat Apr 27 15:52:29 UTC 2019


On 04/26/2019 08:49 PM, Alex Xu wrote:
> Nadathur, Sundar <sundar.nadathur at intel.com> wrote:
>     Anyways, for Cyborg, it seems to me that there is a fairly
>     straightforward scheme to address NUMA affinity: annotate the
>     device’s nested RP with a trait indicating which NUMA node it
>     belongs to (e.g. CUSTOM_NUMA_NODE_0), and use that to guide
>     scheduling. This should be a valid use of traits because it
>     expresses a property of the resource provider and is used for
>     scheduling (only).
> 
> 
> I don't like the way of using trait to mark out the NUMA node.

Me neither. Traits are capabilities, not indicators of the relationship 
between one provider and another.

The structure of hierarchical resource providers is what conveys topology 
information -- i.e. how providers are related to each other within a tree -- 
and that tree structure is the appropriate place to encode NUMA topology 
information in placement.
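
To make that concrete, here is a minimal sketch (the names, inventory 
amounts, and the ACCELERATOR_CONTEXT resource class are illustrative, not 
taken from any real deployment) of how a compute node with two NUMA nodes 
and one FPGA might be modeled as a provider tree:

  # Illustrative only: a provider tree for one compute node, expressed as a
  # plain Python structure. NUMA nodes are children of the compute node root
  # provider; the FPGA is a child of the NUMA node it is attached to.
  provider_tree = {
      "name": "compute0",
      "inventories": {},  # the root itself holds no NUMA-local inventory
      "children": [
          {
              "name": "compute0_numa0",
              "inventories": {"PCPU": 16, "MEMORY_MB": 65536},
              "children": [
                  {
                      "name": "compute0_numa0_fpga0",
                      "inventories": {"ACCELERATOR_CONTEXT": 4},
                      "traits": ["CUSTOM_BITSTREAM_CRYPTO_4AC1"],
                      "children": [],
                  },
              ],
          },
          {
              "name": "compute0_numa1",
              "inventories": {"PCPU": 16, "MEMORY_MB": 65536},
              "children": [],
          },
      ],
  }

The NUMA relationship is carried entirely by the parent/child edges; no 
CUSTOM_NUMA_NODE_* trait is needed.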

The request should never ask for "NUMA Node 0", because the request 
shouldn't require the user to understand where the resources are.

It shouldn't matter *which* NUMA node the device providing some resources is 
affined to. The only thing that matters to a *request* is that the user can 
describe the nature of the affinity.

I propose using a "group_policy=same_tree:$GROUP_A:$GROUP_B" query parameter 
to let users describe the affinity constraints between the resources involved 
in different RequestGroups in the request spec.

group_policy=same_tree:$A:$B would mean "ensure that the providers that match 
the constraints of request group $B are in the same inclusive tree that 
matched for request group $A".
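
As a rough sketch of the constraint (purely illustrative -- the real 
filtering would happen inside placement's allocation candidate logic, and the 
exact meaning of "inclusive tree" is part of what's being proposed), the 
check amounts to one group's provider being an ancestor of, or the same as, 
the other group's provider:

  # Hypothetical helper: parents maps each provider UUID to its parent
  # provider UUID (None for a root provider).
  def satisfies_same_tree(provider_a, provider_b, parents):
      def lineage(uuid):
          # The provider itself plus all of its ancestors up to the root.
          chain = []
          while uuid is not None:
              chain.append(uuid)
              uuid = parents.get(uuid)
          return chain

      # "Inclusive" is assumed here to mean one provider sits somewhere in
      # the other's ancestry chain (including being the same provider).
      return provider_a in lineage(provider_b) or provider_b in lineage(provider_a)

In the FPGA example above, the NUMA node provider satisfies one group and the 
FPGA provider (its child) satisfies the other, so the pair passes; a device 
attached to the other NUMA node would not.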

So, let's say you have a flavor that will consume:

  2 dedicated host CPU processors
  4GB RAM
  1 context/handle for an accelerator running a crypto algorithm

Further, you want to ensure that the provider tree that is providing 
those dedicated CPUs and RAM will also provide the accelerator context 
-- in other words, you are requesting a low level of latency between the 
memory and the accelerator device itself.

The above request to GET /a_c (shorthand for the GET /allocation_candidates 
API) would look like this:

  GET /a_c?
    resources1=PCPU:2,MEMORY_MB:4096&
    resources2=ACCELERATOR_CONTEXT:1&
    required2=CUSTOM_BITSTREAM_CRYPTO_4AC1&
    group_policy=same_tree:1:2

which means, in English, "get me an accelerator context from an FPGA that has 
been flashed with the 4AC1 crypto bitstream and is affined to the NUMA node 
that is providing 4GB of main memory and 2 dedicated host processors".
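
If a request like this were expressed through a Nova flavor, the granular 
resource request extra specs might look roughly like the following (a sketch 
only: ACCELERATOR_CONTEXT, the custom trait, and especially the same_tree 
group_policy value are things proposed or assumed in this thread, not 
existing options):

  # Hypothetical flavor extra specs. Nova's existing granular syntax is
  # resourcesN:<RESOURCE_CLASS>=<amount> and traitN:<TRAIT>=required, while
  # group_policy currently only accepts "none" or "isolate" -- "same_tree"
  # is the new value proposed here.
  extra_specs = {
      "resources1:PCPU": "2",
      "resources1:MEMORY_MB": "4096",
      "resources2:ACCELERATOR_CONTEXT": "1",
      "trait2:CUSTOM_BITSTREAM_CRYPTO_4AC1": "required",
      "group_policy": "same_tree:1:2",
  }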

Best,
-jay


