[openstack-dev] [Cyborg] [Nova] Cyborg traits

Nadathur, Sundar sundar.nadathur at intel.com
Wed May 30 10:25:41 UTC 2018

Hi Sylvain,
   Glad to know we are on the same page. I haven't updated the spec with 
this proposal yet, in case more comments came in :). I will do so today.


On 5/30/2018 12:34 AM, Sylvain Bauza wrote:
> On Wed, May 30, 2018 at 1:33 AM, Nadathur, Sundar 
> <sundar.nadathur at intel.com> wrote:
>     Hi all,
>        The Cyborg/Nova scheduling spec [1] details what traits will be
>     applied to the resource providers that represent devices like
>     GPUs. Some of the traits referred to vendor names. I got feedback
>     that traits must not refer to products or specific models of
>     devices. I agree. However, we need some reference to device types
>     to enable matching the VM driver with the device.
>     TL;DR We need some reference to device types, but we don't need
>     product names. I will update the spec [1] to clarify that. Rest of
>     this email clarifies why we need device types in traits, and what
>     traits we propose to include.
>     In general, an accelerator device is operated by two pieces of
>     software: a driver in the kernel (which may discover and handle
>     the PF for SR-IOV devices), and a driver/library in the guest
>     (which may handle the assigned VF).
>     The device assigned to the VM must match the driver/library
>     packaged in the VM. For this, the request must explicitly state
>     what category of devices it needs. For example, if the VM needs a
>     GPU, it needs to say whether it needs an AMD GPU or an Nvidia GPU,
>     since it may have the driver/libraries for that vendor alone. It
>     may also need to state what version of CUDA is needed, if it is an
>     Nvidia GPU. These aspects are necessarily vendor-specific.
> FWIW, the vGPU implementation for Nova has the same concern. We 
> want to provide traits that explicitly say "use this vGPU type". Because 
> a vGPU type is tied to a specific vendor, we can't just say "ask for 
> this frame buffer size" or "ask for these display heads"; we have to 
> say "we need a vGPU that accepts a Quadro vDWS license".
>     Further, one driver/library version may handle multiple devices.
>     Since a new driver version may be backwards compatible, multiple
>     driver versions may manage the same device. The
>     development/release of the driver/library inside the VM should be
>     independent of the kernel driver for that device.
> I agree.
>     For FPGAs, there is an additional twist as the VM may need
>     specific bitstream(s), and they match only specific device/region
>     types. The bitstream for a device from a vendor will not fit any
>     other device from the same vendor, let alone other vendors. IOW,
>     the region type is specific not just to a vendor but to a device
>     type within the vendor. So, it is essential to identify the device
>     type.
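
To make the matching argument above concrete, here is a minimal sketch of how a request's required traits could be checked against the traits on a device's resource provider. It is only an illustration of the reasoning, not how Placement is implemented. The provider names and the helper functions are hypothetical; the trait names follow the style proposed further down in this thread.

```python
def satisfies(provider_traits, required_traits):
    """Return True if the provider carries every required trait."""
    return set(required_traits) <= set(provider_traits)


# Hypothetical device resource providers and their traits.
providers = {
    "fpga_rp_1": {"CUSTOM_FPGA_XILINX", "CUSTOM_FPGA_XILINX_VU9P"},
    "gpu_rp_1": {"CUSTOM_GPU_AMD"},
}


def candidates(required_traits):
    """List the providers that could serve a request, in sorted order."""
    return sorted(
        name
        for name, traits in providers.items()
        if satisfies(traits, required_traits)
    )


# A VM image that ships only an AMD GPU driver asks for the vendor trait.
print(candidates(["CUSTOM_GPU_AMD"]))
# A VM that needs a bitstream for one FPGA device type asks for that type.
print(candidates(["CUSTOM_FPGA_XILINX_VU9P"]))
```

A request that names a trait no provider carries simply gets no candidates, which is the behaviour we want when the guest driver does not match any available device.
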
>     So, the proposed set of RCs and traits is as below. As we learn
>     more about actual usage by operators, we may need to evolve this set.
>       * There is a resource class per device category e.g.
>       * The resource provider that represents a device has the
>         following traits:
>           o Vendor/Category trait: e.g. CUSTOM_GPU_AMD,
>             CUSTOM_FPGA_XILINX.
>           o Device type trait which is a refinement of vendor/category
>             trait e.g. CUSTOM_FPGA_XILINX_VU9P.
>             NOTE: This is not a product or model, at least for FPGAs.
>             Multiple products may use the same FPGA chip.
>             NOTE: The reason for having both the vendor/category and
>             this one is that a flavor may ask for either, depending on
>             the granularity desired. IOW, if one driver can handle all
>             devices from a vendor (*eye roll*), the flavor can ask for
>             the vendor/category trait alone. If there are separate
>             drivers for different device families from the same
>             vendor, the flavor must specify the trait for the device
>             family.
>             NOTE: The equivalent trait for GPUs may be like
>             CUSTOM_GPU_NVIDIA_P90, but I'll let others decide if that
>             is a product or not.
> I was about to propose the same for vGPUs in Nova, i.e. using custom 
> traits. The only concern is that operators would need to set the traits 
> directly using osc-placement instead of having Nova provide them 
> automatically. But given that operators already need to set the vGPU 
> types they want, I think it's acceptable.
>           o For FPGAs, we have additional traits:
>               + Functionality trait: e.g. CUSTOM_FPGA_COMPUTE,
>               + Region type ID: e.g. CUSTOM_FPGA_INTEL_REGION_<uuid>.
>               + Optionally, a function ID, indicating what function is
>                 currently programmed in the region RP. e.g.
>                 CUSTOM_FPGA_INTEL_FUNCTION_<uuid>. Not all
>                 implementations may provide it. The function trait may
>                 change on reprogramming, but it is not expected to be
>                 frequent.
>               + Possibly, CUSTOM_PROGRAMMABLE as a separate trait.
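
As a rough sketch of the naming scheme listed above, the snippet below builds trait names from their parts and turns them into flavor extra specs. It assumes Placement's rule for custom traits (a CUSTOM_ prefix followed by only A-Z, 0-9 and underscore, at most 255 characters), which means a UUID has to be upper-cased and have its hyphens replaced before it can appear in a trait. The helpers make_trait and flavor_extra_specs, the resource class CUSTOM_ACCELERATOR_FPGA and the UUID are all hypothetical and used only for illustration.

```python
import re
import uuid

# Assumed Placement rule for custom traits: CUSTOM_ prefix, then only
# A-Z, 0-9 and underscore, with at most 255 characters in total.
CUSTOM_TRAIT_RE = re.compile(r"^CUSTOM_[A-Z0-9_]+$")
MAX_TRAIT_LEN = 255


def make_trait(*parts):
    """Join name parts into a custom trait, normalizing invalid characters."""
    body = "_".join(str(p) for p in parts)
    # Upper-case, then replace anything outside A-Z, 0-9, _ (e.g. hyphens).
    body = re.sub(r"[^A-Z0-9_]", "_", body.upper())
    trait = "CUSTOM_" + body
    if len(trait) > MAX_TRAIT_LEN or not CUSTOM_TRAIT_RE.match(trait):
        raise ValueError("invalid custom trait: %r" % trait)
    return trait


def flavor_extra_specs(resource_class, traits, count=1):
    """Build flavor extra specs asking for one RC and some required traits."""
    specs = {"resources:%s" % resource_class: str(count)}
    for trait in traits:
        specs["trait:%s" % trait] = "required"
    return specs


# Hypothetical region type ID, for illustration only.
region_id = uuid.UUID("12345678-1234-5678-1234-567812345678")

vendor_trait = make_trait("FPGA", "XILINX")
device_trait = make_trait("FPGA", "XILINX", "VU9P")
region_trait = make_trait("FPGA", "INTEL", "REGION", region_id)

# CUSTOM_ACCELERATOR_FPGA is a hypothetical resource class name.
specs = flavor_extra_specs("CUSTOM_ACCELERATOR_FPGA", [device_trait])
print(region_trait)
print(specs)
```

A flavor that only cares about the vendor would pass vendor_trait instead of device_trait, which is the granularity choice described in the second NOTE above.
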
>     [1] https://review.openstack.org/#/c/554717/
> I'll try to review the spec as soon as I can.
> -Sylvain
>     Thanks.
>     Regards,
>     Sundar
>     __________________________________________________________________________
>     OpenStack Development Mailing List (not for usage questions)
>     Unsubscribe:
>     OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>     http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
