[openstack-dev] [Nova] [Cyborg] Tracking multiple functions

Jay Pipes jaypipes at gmail.com
Wed Mar 7 13:21:32 UTC 2018

On 03/06/2018 09:36 PM, Alex Xu wrote:
> 2018-03-07 10:21 GMT+08:00 Alex Xu <soulxu at gmail.com>:
>     2018-03-06 22:45 GMT+08:00 Mooney, Sean K <sean.k.mooney at intel.com>:
>         From: Matthew Booth [mailto:mbooth at redhat.com]
>         Sent: Saturday, March 3, 2018 4:15 PM
>         To: OpenStack Development Mailing List (not for usage
>         questions) <openstack-dev at lists.openstack.org>
>         Subject: Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple
>         functions
>
>         On 2 March 2018 at 14:31, Jay Pipes <jaypipes at gmail.com> wrote:
>             On 03/02/2018 02:00 PM, Nadathur, Sundar wrote:
>                 Hello Nova team,
>                 During the Cyborg discussion at the Rocky PTG, we
>                 proposed a flow for FPGAs wherein the request spec asks
>                 for a device type as a resource class, and optionally a
>                 function (such as encryption) in the extra specs. This
>                 does not seem to work well for the usage model that I’ll
>                 describe below.
>                 An FPGA device may implement more than one function. For
>                 example, it may implement both compression and
>                 encryption. Say a cluster has 10 devices of device type
>                 X, and each of them is programmed to offer 2 instances
>                 of function A and 4 instances of function B. More
>                 specifically, the device may implement 6 PCI functions,
>                 with 2 of them tied to function A, and the other 4 tied
>                 to function B. So, we could have 6 separate instances
>                 accessing functions on the same device.
>
>         Does this imply that Cyborg can't reprogram the FPGA at all?
>         [Mooney, Sean K] Cyborg is intended to support fixed-function
>         accelerators as well, so it will not always be able to program
>         the accelerator. In the case where an FPGA is preprogrammed with
>         a statically provisioned multi-function bitstream, Cyborg will
>         not be able to reprogram the slot if any of the functions
>         from that slot are already allocated to an instance. In that
>         case it will have to treat it like a fixed-function device and
>         simply allocate an unused VF of the correct type, if available.
>
>                 In the current flow, the device type X is modeled as a
>                 resource class, so Placement will count how many of them
>                 are in use. A flavor for ‘RC device-type-X + function A’
>                 will consume one instance of the RC device-type-X.  But
>                 this is not right because this precludes other functions
>                 on the same device instance from getting used.
>                 One way to solve this is to declare functions A and B as
>                 resource classes themselves and have the flavor request
>                 the function RC. Placement will then correctly count the
>                 function instances. However, there is still a problem:
>                 if the requested function A is not available, Placement
>                 will return an empty list of RPs, but we need some way
>                 to reprogram some device to create an instance of
>                 function A.
>             Clearly, nova is not going to be reprogramming devices with
>             an instance of a particular function.
>             Cyborg might need to have a separate agent that listens to
>             the nova notifications queue and upon seeing an event that
>             indicates a failed build due to lack of resources, then
>             Cyborg can try and reprogram a device and then try
>             rebuilding the original request.
>
>         It was my understanding from that discussion that we intend to
>         insert Cyborg into the spawn workflow for device configuration
>         in the same way that we currently insert resources provided by
>         Cinder and Neutron. So while Nova won't be reprogramming a
>         device, it will be calling out to Cyborg to reprogram a device,
>         and waiting while that happens.
>         My understanding is (and I concede some areas are a little
>         hazy):
>         * The flavor says device type X with function Y
>         * Placement tells us everywhere with device type X
>         * A weigher orders these by devices which already have an
>           available function Y (where is this metadata stored?)
>         * Nova schedules to host Z
>         * Nova host Z asks Cyborg for a local function Y and blocks
>            * Cyborg hopefully returns function Y which is already
>              available
>            * If not, Cyborg reprograms a function Y, then returns it
>         Can anybody correct me/fill in the gaps?
>         [Mooney, Sean K] That correlates closely to my recollection
>         also. As for the metadata, I think the weigher may need to call
>         out to Cyborg to retrieve this, as it will not be available in
>         the host state object.
>     Is this the nova scheduler weigher, or do we want to support
>     weighing in Placement? Functions are traits, as I see it, so could
>     we have preferred_traits? I remember we talked about that parameter
>     in the past, but we didn't have a good use case at that time. This
>     is a good use case.
> If we call Cyborg from the nova scheduler weigher, that will also slow
> down scheduling a lot.

Right, which is why I don't want to do any weighing in Placement at all. 
If folks want to sort by things that require long-running code/callbacks 
or silly temporal things like metrics, they can do that in a custom 
weigher in the nova-scheduler and take the performance hit there.
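
For illustration only, here is a minimal sketch of the kind of custom weigher meant above: one that prefers hosts which already have a free instance of the requested function. It is plain Python, not nova's actual weigher API. `HostState`, `FunctionAvailabilityWeigher`, `sort_hosts` and the injected `lookup` callable are all hypothetical names, and `lookup` stands in for whatever (slow) call-out to Cyborg a real weigher would have to make.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class HostState:
    """Hypothetical stand-in for the scheduler's per-host state."""
    host: str


class FunctionAvailabilityWeigher:
    """Prefer hosts that already expose a free instance of a function."""

    def __init__(self, lookup: Callable[[str], Dict[str, int]]):
        # lookup(host) -> {function_name: free_instance_count}.
        # Assumption: in a real deployment this would be a remote call to
        # Cyborg, which is exactly where the performance hit comes from.
        self._lookup = lookup
        self._cache: Dict[str, Dict[str, int]] = {}

    def _free_functions(self, host: str) -> Dict[str, int]:
        # Cache per host so each host costs at most one lookup.
        if host not in self._cache:
            self._cache[host] = self._lookup(host)
        return self._cache[host]

    def weigh(self, host_state: HostState, function: str) -> float:
        free = self._free_functions(host_state.host).get(function, 0)
        return 1.0 if free > 0 else 0.0


def sort_hosts(hosts: List[HostState],
               weigher: FunctionAvailabilityWeigher,
               function: str) -> List[HostState]:
    # Highest weight first; sorted() is stable, so ties keep input order.
    return sorted(hosts, key=lambda h: weigher.weigh(h, function),
                  reverse=True)
```

The point of the sketch is that the slow lookup lives entirely in the scheduler-side weigher, so Placement only has to answer the "which providers have the resource" question and never does any sorting itself.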

