[openstack-dev] [Cyborg] [Nova] Cyborg traits
Sylvain Bauza
sbauza at redhat.com
Wed May 30 07:34:43 UTC 2018
On Wed, May 30, 2018 at 1:33 AM, Nadathur, Sundar <sundar.nadathur at intel.com> wrote:
> Hi all,
> The Cyborg/Nova scheduling spec [1] details what traits will be applied
> to the resource providers that represent devices like GPUs. Some of the
> traits referred to vendor names. I got feedback that traits must not refer
> to products or specific models of devices. I agree. However, we need some
> reference to device types to enable matching the VM driver with the device.
>
> TL;DR We need some reference to device types, but we don't need product
> names. I will update the spec [1] to clarify that. Rest of this email
> clarifies why we need device types in traits, and what traits we propose to
> include.
>
> In general, an accelerator device is operated by two pieces of software: a
> driver in the kernel (which may discover and handle the PF for SR-IOV
> devices), and a driver/library in the guest (which may handle the assigned
> VF).
>
> The device assigned to the VM must match the driver/library packaged in
> the VM. For this, the request must explicitly state what category of
> devices it needs. For example, if the VM needs a GPU, it needs to say
> whether it needs an AMD GPU or an Nvidia GPU, since it may have the
> driver/libraries for that vendor alone. It may also need to state what
> version of CUDA is needed, if it is an Nvidia GPU. These aspects are
> necessarily vendor-specific.
>
>
FWIW, the vGPU implementation for Nova has the same concern. We want
to provide traits to explicitly say "use this vGPU type", but since a vGPU
type is tied to a specific vendor, we can't just say "ask for this frame
buffer size, or just for the display heads". Rather, we have to say "we need
a vGPU accepting a Quadro vDWS license".
> Further, one driver/library version may handle multiple devices. Since a
> new driver version may be backwards compatible, multiple driver versions
> may manage the same device. The development/release of the driver/library
> inside the VM should be independent of the kernel driver for that device.
>
>
I agree.
> For FPGAs, there is an additional twist as the VM may need specific
> bitstream(s), and they match only specific device/region types. The
> bitstream for a device from a vendor will not fit any other device from the
> same vendor, let alone other vendors. IOW, the region type is specific not
> just to a vendor but to a device type within the vendor. So, it is
> essential to identify the device type.
>
> So, the proposed set of RCs and traits is as below. As we learn more
> about actual usages by operators, we may need to evolve this set.
>
> - There is a resource class per device category e.g.
>   CUSTOM_ACCELERATOR_GPU, CUSTOM_ACCELERATOR_FPGA.
> - The resource provider that represents a device has the following
>   traits:
>   - Vendor/Category trait: e.g. CUSTOM_GPU_AMD, CUSTOM_FPGA_XILINX.
>   - Device type trait which is a refinement of vendor/category trait
>     e.g. CUSTOM_FPGA_XILINX_VU9P.
>
> NOTE: This is not a product or model, at least for FPGAs. Multiple
> products may use the same FPGA chip.
> NOTE: The reason for having both the vendor/category and this one is that
> a flavor may ask for either, depending on the granularity desired. IOW, if
> one driver can handle all devices from a vendor (*eye roll*), the flavor
> can ask for the vendor/category trait alone. If there are separate drivers
> for different device families from the same vendor, the flavor must specify
> the trait for the device family.
> NOTE: The equivalent trait for GPUs may be like CUSTOM_GPU_NVIDIA_P90, but
> I'll let others decide if that is a product or not.
>
>
I was about to propose the same for vGPUs in Nova, i.e. using custom traits.
The only concern is that we need operators to set the traits directly using
osc-placement instead of having Nova magically provide those traits. But
anyway, given operators need to set the vGPU types they want, I think it's
acceptable.
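
To make the vendor/category versus device-type granularity concrete, here is a minimal Python sketch. It assumes the flavor requests the device through Nova's existing `resources:<RC>=<n>` and `trait:<TRAIT>=required` extra specs, and that names follow Placement's rule for custom names (`CUSTOM_` followed by upper-case letters, digits and underscores, at most 255 characters). The helpers `custom_name`, `flavor_extra_specs` and `provider_matches` are hypothetical names for illustration only; they are not part of Nova, Cyborg or Placement.

```python
import re

# Placement accepts custom trait / resource class names of this shape,
# up to 255 characters long.
_CUSTOM_NAME = re.compile(r"^CUSTOM_[A-Z0-9_]+$")


def custom_name(*parts):
    """Join free-form parts into a CUSTOM_* name (hypothetical helper)."""
    cleaned = [re.sub(r"[^A-Z0-9]+", "_", p.upper()).strip("_") for p in parts]
    name = "CUSTOM_" + "_".join(cleaned)
    if len(name) > 255 or not _CUSTOM_NAME.match(name):
        raise ValueError("invalid custom name: %r" % name)
    return name


def flavor_extra_specs(resource_class, required_traits, amount=1):
    """Build flavor extra specs asking for `amount` devices with the traits."""
    specs = {"resources:%s" % resource_class: str(amount)}
    for trait in required_traits:
        specs["trait:%s" % trait] = "required"
    return specs


def provider_matches(provider_traits, required_traits):
    """A provider qualifies only if it carries every required trait."""
    return set(required_traits) <= set(provider_traits)


# Names taken from the examples in the thread.
RC_FPGA = custom_name("accelerator", "fpga")
VENDOR = custom_name("fpga", "xilinx")
DEVICE = custom_name("fpga", "xilinx", "vu9p")

# The device RP carries both the vendor/category and the device type trait.
provider_traits = {VENDOR, DEVICE}

# Coarse flavor: any device from the vendor will do.
coarse = flavor_extra_specs(RC_FPGA, [VENDOR])
# Fine flavor: the guest driver only handles this device type.
fine = flavor_extra_specs(RC_FPGA, [DEVICE])
```

Because the provider carries both traits, either flavor matches it, while a provider tagged with the vendor trait alone would not satisfy the device-type flavor.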
>
> - For FPGAs, we have additional traits:
>   - Functionality trait: e.g. CUSTOM_FPGA_COMPUTE,
>     CUSTOM_FPGA_NETWORK, CUSTOM_FPGA_STORAGE
>   - Region type ID: e.g. CUSTOM_FPGA_INTEL_REGION_<uuid>.
>   - Optionally, a function ID, indicating what function is
>     currently programmed in the region RP. e.g. CUSTOM_FPGA_INTEL_FUNCTION_<uuid>.
>     Not all implementations may provide it. The function trait may change on
>     reprogramming, but it is not expected to be frequent.
>   - Possibly, CUSTOM_PROGRAMMABLE as a separate trait.
>
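
One practical detail of the `<uuid>` placeholders above: a UUID in its usual form contains lower-case letters and hyphens, which custom trait names do not allow, so it has to be normalized somehow. The sketch below assumes upper-casing and replacing hyphens with underscores; the encoding the spec actually uses is not stated in this thread. `region_trait`, `function_trait` and `reprogram` are hypothetical helpers, and the UUIDs in the usage lines are made up.

```python
import uuid


def _normalize(identifier):
    """Validate a UUID and make it legal inside a trait name (assumed encoding)."""
    return str(uuid.UUID(str(identifier))).upper().replace("-", "_")


def region_trait(vendor, region_type_id):
    """Trait naming the region type, e.g. CUSTOM_FPGA_INTEL_REGION_<uuid>."""
    return "CUSTOM_FPGA_%s_REGION_%s" % (vendor.upper(), _normalize(region_type_id))


def function_trait(vendor, function_id):
    """Trait naming the function currently programmed in the region."""
    return "CUSTOM_FPGA_%s_FUNCTION_%s" % (vendor.upper(), _normalize(function_id))


def reprogram(traits, vendor, new_function_id):
    """Return a new trait set with the function trait swapped.

    The region type trait is left alone: reprogramming changes what runs in
    the region, not what kind of region it is.
    """
    prefix = "CUSTOM_FPGA_%s_FUNCTION_" % vendor.upper()
    kept = {t for t in traits if not t.startswith(prefix)}
    kept.add(function_trait(vendor, new_function_id))
    return kept


# Made-up identifiers, for illustration only.
REGION = "9926ab6d-6c92-4c49-8a57-8d4f5b0f1a3c"
OLD_FN = "11111111-2222-3333-4444-555555555555"
NEW_FN = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"

before = {
    "CUSTOM_FPGA_INTEL",
    "CUSTOM_FPGA_COMPUTE",
    region_trait("intel", REGION),
    function_trait("intel", OLD_FN),
}
after = reprogram(before, "intel", NEW_FN)
```

Only the function trait differs between `before` and `after`, which mirrors the point that the function trait may change on reprogramming while the other traits stay fixed.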
> [1] https://review.openstack.org/#/c/554717/
>
I'll try to review the spec as soon as I can.
-Sylvain
>
>
> Thanks.
>
> Regards,
> Sundar