On 04/26/2019 08:49 PM, Alex Xu wrote:
> Nadathur, Sundar <sundar.nadathur@intel.com> wrote:
>> Anyways, for Cyborg, it seems to me that there is a fairly straightforward scheme to address NUMA affinity: annotate the device’s nested RP with a trait indicating which NUMA node it belongs to (e.g. CUSTOM_NUMA_NODE_0), and use that to guide scheduling. This should be a valid use of traits because it expresses a property of the resource provider and is used for scheduling (only).
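(For reference, the annotation described above amounts to a single call against the placement traits API. Below is a rough sketch using python-requests; the endpoint, token, and provider UUID are placeholders.)

# Sketch only: tag a device's nested resource provider with a
# CUSTOM_NUMA_NODE_0 trait via the placement REST API.
import requests

PLACEMENT = "http://placement.example.com"     # placeholder endpoint
HEADERS = {
    "X-Auth-Token": "<token>",                 # placeholder auth token
    "OpenStack-API-Version": "placement 1.6",  # traits API arrived in 1.6
}
RP_UUID = "<device-rp-uuid>"                   # placeholder provider UUID

# PUT replaces the provider's trait set wholesale, so read the current
# traits (and the provider generation) first.
current = requests.get(
    "%s/resource_providers/%s/traits" % (PLACEMENT, RP_UUID),
    headers=HEADERS).json()

requests.put(
    "%s/resource_providers/%s/traits" % (PLACEMENT, RP_UUID),
    headers=HEADERS,
    json={
        "resource_provider_generation":
            current["resource_provider_generation"],
        "traits": sorted(
            set(current["traits"]) | {"CUSTOM_NUMA_NODE_0"}),
    })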
> I don't like the way of using a trait to mark out the NUMA node.
Me neither. Traits are capabilities, not indicators of the relationship between one provider and another. The structure of hierarchical resource providers is what provides topology information -- i.e. how providers are related to each other within a tree organization -- and that is what is appropriate for encoding NUMA topology information into placement.

The request should never ask for "NUMA node 0", because the request shouldn't require that the user understand where the resources are. It shouldn't matter *which* NUMA node a particular device providing some resources is affined to. The only thing that matters to a *request* is that the user is able to describe the nature of the affinity.

I propose using a "group_policy=same_tree:$GROUP_A:$GROUP_B" query parameter to let users describe the affinity constraints for the resources involved in different RequestGroups in the request spec. group_policy=same_tree:$A:$B would mean "ensure that the providers that match the constraints of request group $B are in the same inclusive tree that matched for request group $A".

So, let's say you have a flavor that will consume:

* 2 dedicated host CPU processors
* 4GB RAM
* 1 context/handle for an accelerator running a crypto algorithm

Further, you want to ensure that the provider tree providing those dedicated CPUs and RAM will also provide the accelerator context -- in other words, you are requesting a low level of latency between the memory and the accelerator device itself.

The above request to GET /a_c would look like this:

  GET /a_c?
    resources1=PCPU:2&
    resources1=MEMORY_MB:4096&
    resources2=ACCELERATOR_CONTEXT:1&
    required2=CUSTOM_BITSTREAM_CRYPTO_4AC1&
    group_policy=same_tree:1:2

which would mean, in English, "get me an accelerator context from an FPGA that has been flashed with the 4AC1 crypto bitstream and is affined to the NUMA node that is providing 4GB of main memory and 2 dedicated host processors".
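To make the proposed semantics concrete, here is a rough Python sketch of how a candidate-filtering step could enforce same_tree between two request groups. This is illustration only, not placement's implementation; the provider names and data shapes are invented:

# Sketch of group_policy=same_tree:A:B -- keep a candidate only if the
# provider matched for one group lies in the inclusive tree (self,
# ancestors, or descendants) of the provider matched for the other.

def inclusive_ancestry(rp, parent_of):
    """Return rp plus all of its ancestors up to the tree root."""
    chain = []
    while rp is not None:
        chain.append(rp)
        rp = parent_of[rp]
    return chain

def same_tree(rp_a, rp_b, parent_of):
    """True if one provider is in the inclusive subtree of the other."""
    return (rp_a in inclusive_ancestry(rp_b, parent_of) or
            rp_b in inclusive_ancestry(rp_a, parent_of))

# Example tree: a compute node with two NUMA-node child providers and
# an FPGA nested under NUMA node 0 (all names hypothetical).
parent_of = {
    "cn1": None,
    "numa0": "cn1",
    "numa1": "cn1",
    "fpga0": "numa0",
}

# Group 1 (PCPU + MEMORY_MB) matched numa0, group 2 matched fpga0:
assert same_tree("numa0", "fpga0", parent_of)      # candidate kept
# Had group 2 matched a device hanging off the other NUMA node:
assert not same_tree("numa1", "fpga0", parent_of)  # candidate dropped

Best,
-jay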