[openstack-dev] [Cyborg] [Nova] Cyborg traits

Nadathur, Sundar sundar.nadathur at intel.com
Tue May 29 23:33:59 UTC 2018


Hi all,
    The Cyborg/Nova scheduling spec [1] details what traits will be 
applied to the resource providers that represent devices like GPUs. Some 
of the traits referred to vendor names. I got feedback that traits must 
not refer to products or specific models of devices. I agree. However, 
we need some reference to device types to enable matching the VM driver 
with the device.

TL;DR: We need some reference to device types, but we don't need product 
names. I will update the spec [1] to clarify that. The rest of this email 
explains why we need device types in traits, and what traits we propose 
to include.

In general, an accelerator device is operated by two pieces of software: 
a driver in the kernel (which may discover and handle the PF for SR-IOV  
devices), and a driver/library in the guest (which may handle the 
assigned VF).

The device assigned to the VM must match the driver/library packaged in 
the VM. For this, the request must explicitly state what category of 
devices it needs. For example, if the VM needs a GPU, it needs to say 
whether it needs an AMD GPU or an Nvidia GPU, since it may have the 
driver/libraries for that vendor alone. It may also need to state what 
version of CUDA is needed, if it is an Nvidia GPU. These aspects are 
necessarily vendor-specific.

Further, one driver/library version may handle multiple devices. Since a 
new driver version may be backwards compatible, multiple driver versions 
may manage the same device. The development/release of the 
driver/library inside the VM should be independent of the kernel driver 
for that device.

For FPGAs, there is an additional twist as the VM may need specific 
bitstream(s), and they match only specific device/region types. The 
bitstream for a device from a vendor will not fit any other device from 
the same vendor, let alone other vendors. IOW, the region type is 
specific not just to a vendor but to a device type within the vendor. 
So, it is essential to identify the device type.

So, the proposed set of resource classes (RCs) and traits is as below. 
As we learn more about actual usages by operators, we may need to evolve 
this set.

  * There is a resource class per device category e.g.
    CUSTOM_ACCELERATOR_GPU, CUSTOM_ACCELERATOR_FPGA.
  * The resource provider that represents a device has the following traits:
      o Vendor/Category trait: e.g. CUSTOM_GPU_AMD, CUSTOM_FPGA_XILINX.
      o Device type trait which is a refinement of vendor/category trait
        e.g. CUSTOM_FPGA_XILINX_VU9P.

        NOTE: This is not a product or model, at least for FPGAs.
        Multiple products may use the same FPGA chip.
        NOTE: The reason for having both the vendor/category and this
        one is that a flavor may ask for either, depending on the
        granularity desired. IOW, if one driver can handle all devices
        from a vendor (*eye roll*), the flavor can ask for the
        vendor/category trait alone. If there are separate drivers for
        different device families from the same vendor, the flavor must
        specify the trait for the device family.
        NOTE: The equivalent trait for GPUs may be like
        CUSTOM_GPU_NVIDIA_P90, but I'll let others decide if that is a
        product or not.

      o For FPGAs, we have additional traits:
          + Functionality trait: e.g. CUSTOM_FPGA_COMPUTE,
            CUSTOM_FPGA_NETWORK, CUSTOM_FPGA_STORAGE
          + Region type ID: e.g. CUSTOM_FPGA_INTEL_REGION_<uuid>.
          + Optionally, a function ID, indicating what function is
            currently programmed in the region RP, e.g.
            CUSTOM_FPGA_INTEL_FUNCTION_<uuid>. Some implementations
            may not provide it. The function trait may change on
            reprogramming, but reprogramming is not expected to be
            frequent.
          + Possibly, CUSTOM_PROGRAMMABLE as a separate trait.
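
To make the vendor/category versus device-type granularity concrete, here
is a rough Python sketch of how a flavor's required traits could be
checked against the traits on a device resource provider. It is only an
illustration, not code from the spec [1]: the helper names (custom_trait,
flavor_extra_specs, provider_matches) and the region UUID are made up,
and it assumes the flavor uses the "resources:<RC>=1" and
"trait:<TRAIT>=required" extra spec keys and that custom trait names are
limited to CUSTOM_ followed by uppercase letters, digits and underscores
(so the hyphens in a UUID have to be replaced).

```python
import re
import uuid

# Assumed naming rule for custom traits: CUSTOM_ prefix, then only
# A-Z, 0-9 and underscore, at most 255 characters in total.
CUSTOM_TRAIT_RE = re.compile(r"^CUSTOM_[A-Z0-9_]+$")


def custom_trait(*parts):
    """Join name parts into a CUSTOM_ trait (hypothetical helper).

    Characters outside A-Z, 0-9 and underscore (e.g. the hyphens in a
    UUID) are replaced with underscores.
    """
    body = "_".join(str(p) for p in parts).upper()
    body = re.sub(r"[^A-Z0-9_]", "_", body)
    name = "CUSTOM_" + body
    if len(name) > 255 or not CUSTOM_TRAIT_RE.match(name):
        raise ValueError("invalid trait name: %r" % name)
    return name


def flavor_extra_specs(resource_class, required_traits):
    """Build extra specs asking for one device with the given traits."""
    specs = {"resources:%s" % resource_class: "1"}
    for trait in required_traits:
        specs["trait:%s" % trait] = "required"
    return specs


def provider_matches(provider_traits, extra_specs):
    """Return True if the provider has every trait the flavor requires."""
    prefix = "trait:"
    required = {
        key[len(prefix):]
        for key, value in extra_specs.items()
        if key.startswith(prefix) and value == "required"
    }
    return required <= set(provider_traits)


# Made-up region type UUID, for illustration only.
REGION_UUID = uuid.UUID("12345678-1234-5678-1234-567812345678")

# Traits on a resource provider that represents a Xilinx FPGA.
xilinx_rp_traits = {
    custom_trait("FPGA", "XILINX"),          # vendor/category
    custom_trait("FPGA", "XILINX", "VU9P"),  # device type
    custom_trait("FPGA", "COMPUTE"),         # functionality
}

# Coarse flavor: any Xilinx FPGA will do.
vendor_flavor = flavor_extra_specs(
    "CUSTOM_ACCELERATOR_FPGA", [custom_trait("FPGA", "XILINX")])

# Fine-grained flavor: needs a specific Intel region type.
region_flavor = flavor_extra_specs(
    "CUSTOM_ACCELERATOR_FPGA",
    [custom_trait("FPGA", "INTEL", "REGION", REGION_UUID)])

vendor_ok = provider_matches(xilinx_rp_traits, vendor_flavor)
region_ok = provider_matches(xilinx_rp_traits, region_flavor)
```

With these inputs the vendor-level flavor matches the Xilinx provider and
the Intel region-type flavor does not, which is the behavior the two
levels of traits are meant to give.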

[1] https://review.openstack.org/#/c/554717/

Thanks.

Regards,
Sundar