Nadathur, Sundar <sundar.nadathur@intel.com> wrote on Thu, Apr 25, 2019 at 5:59 PM:
*From:* Alex Xu <soulxu@gmail.com> *Sent:* Monday, April 15, 2019 4:50 PM
Cyborg only needs to return an un-numbered request group; Nova will then use all the 'hw:xxx' extra specs and 'accel:device_profile.[numa node id]' to generate a placement request like the one above.
I am not quite following the idea(s) proposed here. Cyborg returns only the device-related request groups. The un-numbered request group in the flavor is not touched by Cyborg.
Secondly, if you use the ‘accel:’ stuff in the flavor to decide NUMA affinity, how will you pass that to Placement? This thread is about the syntax of the GET /a-c call.
The point here is that we need some way for Cyborg to tell Nova which guest NUMA node the device will be attached to. I don't think we should encode the device's guest NUMA affinity in the request group that Cyborg returns. Nova's flavor is what expresses that, and Nova passes the affinity requirement on to the GET /a-c call.
For example, if the PCI device should sit under the first NUMA node, the extra spec will be 'accel:device_profile.0=<profile_name>'. Cyborg can return a simple request 'resources=CYBORG_PCI_XX_DEVICE:1', and we merge this into the numbered request group 'resources1=VCPU:2,MEMORY_MB:128,CYBORG_PCI_XX_DEVICE:1'. If the PCI device has a special trait, then Cyborg should return the request group as 'resources1=CYBORG_PCI_XX_DEVICE:1&required1=SOME_TRAIT', and Nova merges this into the placement request as 'resources1.1'.
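A rough Python sketch of that merge (not real Nova or Cyborg code; the helper name, resource class, and values are made up purely for illustration):

from urllib.parse import urlencode


def merge_into_numa_group(numa_group, cyborg_group):
    """Fold Cyborg's un-numbered resources into a flavor NUMA group."""
    merged = dict(numa_group)
    merged["resources"] = ",".join(
        [numa_group["resources"], cyborg_group["resources"]])
    return merged


# Guest NUMA node 0 resources derived from the flavor's hw:numa_* extra specs.
flavor_numa0 = {"resources": "VCPU:2,MEMORY_MB:128"}

# What Cyborg might return for 'accel:device_profile.0=<profile_name>'.
cyborg_request = {"resources": "CYBORG_PCI_XX_DEVICE:1"}

merged = merge_into_numa_group(flavor_numa0, cyborg_request)

# Numbered-group query string for GET /allocation_candidates.
print(urlencode({"resources1": merged["resources"]}))
# resources1=VCPU%3A2%2CMEMORY_MB%3A128%2CCYBORG_PCI_XX_DEVICE%3A1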
Sorry, I don’t follow this either. The request groups have entries like ‘resources:CUSTOM_FOO=1’, not 'resources=CYBORG_PCI_XX_DEVICE:1'. So, I don’t see where to stick the NUMA node #.
Anyways, for Cyborg, it seems to me that there is a fairly straightforward scheme to address NUMA affinity: annotate the device’s nested RP with a trait indicating which NUMA node it belongs to (e.g. CUSTOM_NUMA_NODE_0), and use that to guide scheduling. This should be a valid use of traits because it expresses a property of the resource provider and is used for scheduling (only).
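Purely as an illustration of that scheme, the device-related numbered group in the GET /a-c call could then require the trait (the resource class and group number below are placeholders):

from urllib.parse import urlencode

device_group = {
    "resources1": "CYBORG_PCI_XX_DEVICE:1",
    "required1": "CUSTOM_NUMA_NODE_0",
}
print(urlencode(device_group))
# resources1=CYBORG_PCI_XX_DEVICE%3A1&required1=CUSTOM_NUMA_NODE_0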
I don't like the approach of using a trait to mark out the NUMA node.
As for how the annotation is done, it could be automated. The operator's tool that configures a device to affinitize with a NUMA node (by setting MSI-X vectors, etc.) also invokes a Cyborg API (yet to be written) with the NUMA node #; that API would identify the device RP and update Placement with the trait. The tool needs to ensure that the device has been discovered by Cyborg and updated in Placement before invoking the API.
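A hedged sketch of the Placement update such a tool (or the yet-to-be-written Cyborg API) might end up performing, assuming a placeholder endpoint, token, and RP UUID, and omitting Keystone auth, error handling, and generation-conflict retries:

import requests

PLACEMENT = "http://placement.example.com"        # placeholder endpoint
HEADERS = {
    "X-Auth-Token": "<admin-token>",              # placeholder token
    "OpenStack-API-Version": "placement 1.6",     # traits API microversion
}
rp_uuid = "11111111-2222-3333-4444-555555555555"  # device RP (placeholder)

# Make sure the custom trait exists, then read the RP's current traits and
# generation, and write the trait list back with CUSTOM_NUMA_NODE_0 added.
requests.put(f"{PLACEMENT}/traits/CUSTOM_NUMA_NODE_0", headers=HEADERS)

body = requests.get(
    f"{PLACEMENT}/resource_providers/{rp_uuid}/traits", headers=HEADERS).json()

requests.put(
    f"{PLACEMENT}/resource_providers/{rp_uuid}/traits",
    headers=HEADERS,
    json={
        "resource_provider_generation": body["resource_provider_generation"],
        "traits": sorted(set(body["traits"]) | {"CUSTOM_NUMA_NODE_0"}),
    },
)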
What I'm talking about here is which guest NUMA node the virtual device gets attached to. It isn't about which host NUMA node a physical device has affinity with.
Regards,
Sundar