Nadathur, Sundar <sundar.nadathur@intel.com> wrote on Thu, Apr 25, 2019 at 5:59 PM:
*From:* Alex Xu <soulxu@gmail.com> *Sent:* Monday, April 15, 2019 4:50 PM
Cyborg only needs to return an un-numbered request group; Nova will then use all the 'hw:xxx' extra specs and 'accel:device_profile.[numa node id]' to generate a placement request like the one above.
I am not quite following the idea(s) proposed here. Cyborg returns only the device-related request groups. The un-numbered request group in the flavor is not touched by Cyborg.
Secondly, if you use the ‘accel:’ stuff in the flavor to decide NUMA affinity, how will you pass that to Placement? This thread is about the syntax of the GET /a-c call.
The point here is that we need some way for Cyborg to tell Nova which guest NUMA node the device will be attached to. I don't think we should encode the device's guest NUMA affinity in the request group that Cyborg returns. Nova's flavor is what expresses that, and Nova passes the affinity requirement on to the GET /a-c call.
For example, if the PCI device should sit under the first NUMA node, the extra spec will be 'accel:device_profile.0=<profile_name>'. Cyborg can return a simple request 'resources=CYBORG_PCI_XX_DEVICE:1', and we merge this into the numbered request group 'resources1=VCPU:2,MEMORY_MB:128,CYBORG_PCI_XX_DEVICE:1'. If the PCI device has a special trait, then Cyborg should return the request group as 'resources1=CYBORG_PCI_XX_DEVICE:1&required1=SOME_TRAIT', and Nova merges this into the placement request as 'resources1.1'.
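A rough Python sketch of that merge (not real Nova or Cyborg code; the helper name, resource class, and values are made up purely for illustration):

from urllib.parse import urlencode


def merge_into_numa_group(numa_group, cyborg_group):
    """Fold Cyborg's un-numbered resources into a flavor NUMA group."""
    merged = dict(numa_group)
    merged["resources"] = ",".join(
        [numa_group["resources"], cyborg_group["resources"]])
    return merged


# Guest NUMA node 0 resources derived from the flavor's hw:numa_* extra specs.
flavor_numa0 = {"resources": "VCPU:2,MEMORY_MB:128"}

# What Cyborg might return for 'accel:device_profile.0=<profile_name>'.
cyborg_request = {"resources": "CYBORG_PCI_XX_DEVICE:1"}

merged = merge_into_numa_group(flavor_numa0, cyborg_request)

# Numbered-group query string for GET /allocation_candidates.
print(urlencode({"resources1": merged["resources"]}))
# resources1=VCPU%3A2%2CMEMORY_MB%3A128%2CCYBORG_PCI_XX_DEVICE%3A1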
Sorry, I don’t follow this either. The request groups have entries like ‘resources:CUSTOM_FOO=1’, not 'resources=CYBORG_PCI_XX_DEVICE:1'. So, I don’t see where to stick the NUMA node #.
Anyways, for Cyborg, it seems to me that there is a fairly straightforward scheme to address NUMA affinity: annotate the device’s nested RP with a trait indicating which NUMA node it belongs to (e.g. CUSTOM_NUMA_NODE_0), and use that to guide scheduling. This should be a valid use of traits because it expresses a property of the resource provider and is used for scheduling (only).
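Purely as an illustration of that scheme, the device-related numbered group in the GET /a-c call could then require the trait (the resource class and group number below are placeholders):

from urllib.parse import urlencode

device_group = {
    "resources1": "CYBORG_PCI_XX_DEVICE:1",
    "required1": "CUSTOM_NUMA_NODE_0",
}
print(urlencode(device_group))
# resources1=CYBORG_PCI_XX_DEVICE%3A1&required1=CUSTOM_NUMA_NODE_0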
I don't like the approach of using a trait to mark out the NUMA node.
As for how the annotation is done, it could be automated. The operator's tool that configures a device to affinitize with a NUMA node (by setting MSI-X vectors, etc.) also invokes a Cyborg API (yet to be written) with the NUMA node #; that API would identify the device RP and update Placement with the trait. The tool needs to ensure that the device has been discovered by Cyborg and updated in Placement before invoking the API.
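A hedged sketch of the Placement update such a tool (or the yet-to-be-written Cyborg API) might end up performing, assuming a placeholder endpoint, token, and RP UUID, and omitting Keystone auth, error handling, and generation-conflict retries:

import requests

PLACEMENT = "http://placement.example.com"        # placeholder endpoint
HEADERS = {
    "X-Auth-Token": "<admin-token>",              # placeholder token
    "OpenStack-API-Version": "placement 1.6",     # traits API microversion
}
rp_uuid = "11111111-2222-3333-4444-555555555555"  # device RP (placeholder)

# Make sure the custom trait exists, then read the RP's current traits and
# generation, and write the trait list back with CUSTOM_NUMA_NODE_0 added.
requests.put(f"{PLACEMENT}/traits/CUSTOM_NUMA_NODE_0", headers=HEADERS)

body = requests.get(
    f"{PLACEMENT}/resource_providers/{rp_uuid}/traits", headers=HEADERS).json()

requests.put(
    f"{PLACEMENT}/resource_providers/{rp_uuid}/traits",
    headers=HEADERS,
    json={
        "resource_provider_generation": body["resource_provider_generation"],
        "traits": sorted(set(body["traits"]) | {"CUSTOM_NUMA_NODE_0"}),
    },
)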
What I'm talking about here is which guest NUMA node the virtual device gets attached to. It isn't about which host NUMA node a physical device has affinity with.
Regards,
Sundar