[openstack-dev] [Cyborg] [Nova] Cyborg traits
Nadathur, Sundar
sundar.nadathur at intel.com
Wed May 30 10:25:41 UTC 2018
Hi Sylvain,
Glad to know we are on the same page. I haven't updated the spec with
this proposal yet, in case I get more comments :). I will do so today.
Thanks,
Sundar
On 5/30/2018 12:34 AM, Sylvain Bauza wrote:
>
>
> On Wed, May 30, 2018 at 1:33 AM, Nadathur, Sundar
> <sundar.nadathur at intel.com> wrote:
>
> Hi all,
> The Cyborg/Nova scheduling spec [1] details what traits will be
> applied to the resource providers that represent devices like
> GPUs. Some of the traits referred to vendor names. I got feedback
> that traits must not refer to products or specific models of
> devices. I agree. However, we need some reference to device types
> to enable matching the VM driver with the device.
>
> TL;DR We need some reference to device types, but we don't need
> product names. I will update the spec [1] to clarify that. The rest of
> this email explains why we need device types in traits, and which
> traits we propose to include.
>
> In general, an accelerator device is operated by two pieces of
> software: a driver in the host kernel (which may discover and handle
> the PF for SR-IOV devices), and a driver/library in the guest
> (which may handle the assigned VF).
>
> The device assigned to the VM must match the driver/library
> packaged in the VM. For this, the request must explicitly state
> what category of devices it needs. For example, if the VM needs a
> GPU, it needs to say whether it needs an AMD GPU or an Nvidia GPU,
> since it may have the driver/libraries for that vendor alone. It
> may also need to state which version of CUDA is needed, if it is an
> Nvidia GPU. These aspects are necessarily vendor-specific.
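As a minimal sketch of how such a request could be expressed and matched, the snippet below assumes Nova flavor extra specs of the form `resources:<RC>=<count>` and `trait:<TRAIT>=required`, together with the resource class and trait names proposed later in this email. The helper names and the simplified matching rule (a provider qualifies if it has enough inventory of each requested class and carries every required trait) are illustrative only, not Nova's or Placement's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceProvider:
    """Hypothetical, simplified view of a Placement resource provider."""
    name: str
    inventory: dict  # resource class -> available units
    traits: set = field(default_factory=set)


def parse_extra_specs(extra_specs):
    """Split flavor extra specs into requested resources and required traits."""
    resources, required = {}, set()
    for key, value in extra_specs.items():
        if key.startswith("resources:"):
            resources[key[len("resources:"):]] = int(value)
        elif key.startswith("trait:") and value == "required":
            required.add(key[len("trait:"):])
    return resources, required


def matches(provider, resources, required):
    """True if the provider has enough of every class and all required traits."""
    has_capacity = all(provider.inventory.get(rc, 0) >= n
                       for rc, n in resources.items())
    return has_capacity and required <= provider.traits


# A guest image that only ships the Nvidia driver/libraries.
flavor_specs = {
    "resources:CUSTOM_ACCELERATOR_GPU": "1",
    "trait:CUSTOM_GPU_NVIDIA": "required",
}
providers = [
    ResourceProvider("gpu-0", {"CUSTOM_ACCELERATOR_GPU": 1}, {"CUSTOM_GPU_AMD"}),
    ResourceProvider("gpu-1", {"CUSTOM_ACCELERATOR_GPU": 1}, {"CUSTOM_GPU_NVIDIA"}),
]
resources, required = parse_extra_specs(flavor_specs)
print([p.name for p in providers if matches(p, resources, required)])  # ['gpu-1']
```

Only the Nvidia-tagged device is a candidate, because the vendor trait is the only thing that ties the request to the driver packaged in the guest.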
>
>
> FWIW, the vGPU implementation for Nova has the same concern. We
> want to provide traits to explicitly say "use this vGPU type", but
> given it's related to a specific vendor, we can't just say "ask for
> this frame buffer size, or just for the display heads", but rather "we
> need a vGPU accepting a Quadro vDWS license".
>
> Further, one driver/library version may handle multiple devices.
> Since a new driver version may be backwards compatible, multiple
> driver versions may manage the same device. The
> development/release of the driver/library inside the VM should be
> independent of the kernel driver for that device.
>
>
> I agree.
>
> For FPGAs, there is an additional twist: the VM may need
> specific bitstream(s), which match only specific device/region
> types. The bitstream for a device from a vendor will not fit any
> other device from the same vendor, let alone other vendors. IOW,
> the region type is specific not just to a vendor but to a device
> type within the vendor. So, it is essential to identify the device
> type.
>
> So, the proposed set of resource classes (RCs) and traits is as
> follows. As we learn more about actual usage by operators, we may
> need to evolve this set.
>
> * There is a resource class per device category e.g.
> CUSTOM_ACCELERATOR_GPU, CUSTOM_ACCELERATOR_FPGA.
> * The resource provider that represents a device has the
> following traits:
> o Vendor/Category trait: e.g. CUSTOM_GPU_AMD,
> CUSTOM_FPGA_XILINX.
> o Device type trait, which is a refinement of the vendor/category
> trait, e.g. CUSTOM_FPGA_XILINX_VU9P.
>
> NOTE: This is not a product or model, at least for FPGAs.
> Multiple products may use the same FPGA chip.
> NOTE: The reason for having both the vendor/category trait
> and the device type trait is that a flavor may ask for either,
> depending on the granularity desired. IOW, if one driver can handle all
> devices from a vendor (*eye roll*), the flavor can ask for
> the vendor/category trait alone. If there are separate
> drivers for different device families from the same
> vendor, the flavor must specify the trait for the device
> family.
> NOTE: The equivalent trait for GPUs may be like
> CUSTOM_GPU_NVIDIA_P90, but I'll let others decide if that
> is a product or not.
>
>
> I was about to propose the same for vGPUs in Nova, i.e., using custom
> traits. The only concern is that we need operators to set the traits
> directly using osc-placement instead of having Nova magically provide
> those traits. But anyway, given operators need to set the vGPU types
> they want, I think it's acceptable.
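Setting such traits by hand boils down to a couple of Placement API calls, which osc-placement wraps. The sketch below only builds the requests rather than sending them; it assumes Placement microversion 1.6 (where the traits API was introduced), and the helper name and sample provider UUID are hypothetical. Two details matter: a custom trait has to exist (`PUT /traits/{name}`) before it can be attached to a provider, and `PUT /resource_providers/{uuid}/traits` replaces the provider's whole trait list and must carry the provider's current generation.

```python
import json


def trait_requests(rp_uuid, rp_generation, traits, microversion="1.6"):
    """Build (method, path, headers, body) tuples for setting provider traits.

    Hypothetical helper: it only prepares the requests, it does not send them.
    """
    headers = {"OpenStack-API-Version": f"placement {microversion}"}
    # Custom traits are created first; standard (os-traits) ones already exist.
    calls = [("PUT", f"/traits/{name}", headers, None)
             for name in sorted(traits) if name.startswith("CUSTOM_")]
    # This call replaces the provider's entire trait list, so `traits` must
    # include every trait the provider should keep.
    body = {"resource_provider_generation": rp_generation,
            "traits": sorted(traits)}
    calls.append(("PUT", f"/resource_providers/{rp_uuid}/traits",
                  headers, json.dumps(body)))
    return calls


# Example: tag a (hypothetical) vGPU provider with a vendor trait.
for method, path, _, body in trait_requests(
        "6f1c0a3e-2d4b-4e5f-9a6b-7c8d9e0f1a2b", 4, {"CUSTOM_GPU_NVIDIA"}):
    print(method, path, body or "")
```

Because the trait list is replaced wholesale and guarded by the generation, a tool applying operator-chosen vGPU types would typically read the provider's current traits and generation first, then send the merged set.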
>
>
> o For FPGAs, we have additional traits:
> + Functionality trait: e.g. CUSTOM_FPGA_COMPUTE,
> CUSTOM_FPGA_NETWORK, CUSTOM_FPGA_STORAGE
> + Region type ID, e.g. CUSTOM_FPGA_INTEL_REGION_<uuid>.
> + Optionally, a function ID, indicating what function is
> currently programmed in the region RP. e.g.
> CUSTOM_FPGA_INTEL_FUNCTION_<uuid>. Not all
> implementations may provide it. The function trait may
> change on reprogramming, but reprogramming is not expected
> to be frequent.
> + Possibly, CUSTOM_PROGRAMMABLE as a separate trait.
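The proposed FPGA trait set could be sketched as follows. This assumes that Placement custom trait names consist of the `CUSTOM_` prefix followed by upper-case letters, digits and underscores, so a region or function UUID has to be upper-cased and have its hyphens replaced before being embedded in a trait name. The helper names, the sample UUIDs, and the device types `arria10`/`stratix10` are hypothetical placeholders. Matching uses a simple subset test, which illustrates the granularity point in the NOTE above: the vendor/category trait matches every device from that vendor, while a device type trait narrows the match.

```python
import re
import uuid

# Assumed naming rule for Placement custom traits.
CUSTOM_TRAIT_RE = re.compile(r"^CUSTOM_[A-Z0-9_]+$")


def custom_trait(*parts):
    """Join parts into a custom trait name, upper-casing and replacing '-'."""
    name = "CUSTOM_" + "_".join(str(p).upper().replace("-", "_") for p in parts)
    if not CUSTOM_TRAIT_RE.match(name):
        raise ValueError(f"invalid custom trait name: {name}")
    return name


def fpga_traits(vendor, device_type, functionality, region_type,
                function_id=None, programmable=True):
    """Trait set for an FPGA resource provider, following the proposal."""
    traits = {
        custom_trait("FPGA", vendor),                         # vendor/category
        custom_trait("FPGA", vendor, device_type),            # device type
        custom_trait("FPGA", functionality),                  # functionality
        custom_trait("FPGA", vendor, "REGION", region_type),  # region type ID
    }
    if function_id is not None:
        # Optional: reflects what is currently programmed; may change.
        traits.add(custom_trait("FPGA", vendor, "FUNCTION", function_id))
    if programmable:
        traits.add(custom_trait("PROGRAMMABLE"))
    return traits


# Two hypothetical devices from the same vendor with different region types.
devices = {
    "fpga-0": fpga_traits("intel", "arria10", "compute",
                          uuid.UUID("9f8c3d2e-1b4a-4c6d-8e7f-0a1b2c3d4e5f")),
    "fpga-1": fpga_traits("intel", "stratix10", "network",
                          uuid.UUID("0d1e2f3a-4b5c-4d6e-8f70-8192a3b4c5d6")),
}


def candidates(required):
    """Devices carrying every required trait (subset test)."""
    return sorted(name for name, traits in devices.items() if required <= traits)


print(candidates({"CUSTOM_FPGA_INTEL"}))          # ['fpga-0', 'fpga-1']
print(candidates({"CUSTOM_FPGA_INTEL_ARRIA10"}))  # ['fpga-0']
```

In this sketch a flavor whose guest driver covers the whole vendor asks for the vendor/category trait, while one that needs a bitstream for a particular region type would ask for the device type trait (or the region trait) instead.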
>
> [1] https://review.openstack.org/#/c/554717/
>
>
>
> I'll try to review the spec as soon as I can.
>
> -Sylvain
>
>
>
> Thanks.
>
> Regards,
> Sundar
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe:
> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev