[openstack-dev] [Nova] [Cyborg] Tracking multiple functions
Jay Pipes
jaypipes at gmail.com
Wed Mar 7 13:21:32 UTC 2018
On 03/06/2018 09:36 PM, Alex Xu wrote:
> 2018-03-07 10:21 GMT+08:00 Alex Xu <soulxu at gmail.com>:
>
>
>
> 2018-03-06 22:45 GMT+08:00 Mooney, Sean K <sean.k.mooney at intel.com>:
>
> *From:* Matthew Booth [mailto:mbooth at redhat.com]
> *Sent:* Saturday, March 3, 2018 4:15 PM
> *To:* OpenStack Development Mailing List (not for usage
> questions) <openstack-dev at lists.openstack.org>
> *Subject:* Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple
> functions
>
> On 2 March 2018 at 14:31, Jay Pipes <jaypipes at gmail.com> wrote:
>
> On 03/02/2018 02:00 PM, Nadathur, Sundar wrote:
>
> Hello Nova team,
>
> During the Cyborg discussion at Rocky PTG, we
> proposed a flow for FPGAs wherein the request spec asks
> for a device type as a resource class, and optionally a
> function (such as encryption) in the extra specs. This
> does not seem to work well for the usage model that I’ll
> describe below.
>
> An FPGA device may implement more than one function. For
> example, it may implement both compression and
> encryption. Say a cluster has 10 devices of device type
> X, and each of them is programmed to offer 2 instances
> of function A and 4 instances of function B. More
> specifically, the device may implement 6 PCI functions,
> with 2 of them tied to function A, and the other 4 tied
> to function B. So, we could have 6 separate instances
> accessing functions on the same device.
>
> Does this imply that Cyborg can't reprogram the FPGA at all?
>
> */[Mooney, Sean K] Cyborg is intended to support fixed-function
> accelerators also, so it will not always be able to program the
> accelerator. In this case, where an FPGA is preprogrammed with a
> multi-function bitstream that is statically provisioned, Cyborg
> will not be able to reprogram the slot if any of the functions
> from that slot are already allocated to an instance. In this
> case it will have to treat it like a fixed-function device and
> simply allocate an unused VF of the correct type if available./*
>
> In the current flow, the device type X is modeled as a
> resource class, so Placement will count how many of them
> are in use. A flavor for ‘RC device-type-X + function A’
> will consume one instance of the RC device-type-X. But
> this is not right because this precludes other functions
> on the same device instance from getting used.
>
> One way to solve this is to declare functions A and B as
> resource classes themselves and have the flavor request
> the function RC. Placement will then correctly count the
> function instances. However, there is still a problem:
> if the requested function A is not available, Placement
> will return an empty list of RPs, but we need some way
> to reprogram some device to create an instance of
> function A.
>
>
> Clearly, nova is not going to be reprogramming devices with
> an instance of a particular function.
>
> Cyborg might need to have a separate agent that listens to
> the nova notifications queue and upon seeing an event that
> indicates a failed build due to lack of resources, then
> Cyborg can try and reprogram a device and then try
> rebuilding the original request.
>
> It was my understanding from that discussion that we intend to
> insert Cyborg into the spawn workflow for device configuration
> in the same way that we currently insert resources provided by
> Cinder and Neutron. So while Nova won't be reprogramming a
> device, it will be calling out to Cyborg to reprogram a device,
> and waiting while that happens.
>
> My understanding is (and I concede some areas are a little
> hazy):
>
> * The flavor says device type X with function Y
> * Placement tells us everywhere with device type X
> * A weigher orders these by devices which already have an
>   available function Y (where is this metadata stored?)
> * Nova schedules to host Z
> * Nova host Z asks Cyborg for a local function Y and blocks
> * Cyborg hopefully returns function Y which is already
>   available
> * If not, Cyborg reprograms a function Y, then returns it
>
> Can anybody correct me/fill in the gaps?
>
> */[Mooney, Sean K] That correlates closely to my recollection
> also. As for the metadata, I think the weigher may need to call
> out to Cyborg to retrieve it, as it will not be available in the
> host state object./*
>
> Is this the nova scheduler weigher, or do we want to support
> weighing in Placement? Functions are traits, I think, so can we
> have preferred_traits? I remember we talked about that parameter
> in the past, but we didn't have a good use case at that time.
> This is a good use case.
>
>
> If we call Cyborg from the nova scheduler weigher, that will also
> slow down scheduling a lot.
Right, which is why I don't want to do any weighing in Placement at all.
If folks want to sort by things that require long-running code/callbacks
or silly temporal things like metrics, they can do that in a custom
weigher in the nova-scheduler and take the performance hit there.
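
For illustration only, a custom weigher along these lines might look
roughly like the sketch below. It is standalone Python, not nova code:
HostState, FunctionAvailabilityWeigher and the lookup callable are
made-up names, and the in-memory dict stands in for whatever call out
to Cyborg would actually be needed. In nova itself the weigher would
plug into the scheduler's own weigher framework instead.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class HostState:
    """Hypothetical stand-in for the scheduler's per-host view."""
    host: str


class FunctionAvailabilityWeigher:
    """Prefer hosts that already have a free instance of the function.

    `lookup(host, function)` returns the number of free instances. In a
    real deployment this is the slow part (an assumed call to Cyborg),
    which is why it lives in the scheduler and not in Placement.
    """

    def __init__(self, lookup: Callable[[str, str], int],
                 multiplier: float = 1.0) -> None:
        self._lookup = lookup
        self._multiplier = multiplier

    def weigh(self, host_state: HostState, function: str) -> float:
        free = self._lookup(host_state.host, function)
        # Binary preference: any free instance beats having to reprogram.
        return self._multiplier * (1.0 if free > 0 else 0.0)

    def sort_hosts(self, hosts: Iterable[HostState],
                   function: str) -> List[HostState]:
        # sorted() calls the key once per host and is stable, so hosts
        # with equal weight keep the order they arrived in.
        return sorted(hosts, key=lambda h: self.weigh(h, function),
                      reverse=True)


# Fake inventory used in place of a Cyborg query (assumption).
inventory = {
    "host-a": {"B": 4},
    "host-b": {"A": 2, "B": 4},
    "host-c": {"A": 0},
}


def fake_lookup(host: str, function: str) -> int:
    return inventory.get(host, {}).get(function, 0)


weigher = FunctionAvailabilityWeigher(fake_lookup)
candidates = [HostState("host-a"), HostState("host-b"), HostState("host-c")]
ranked = [h.host for h in weigher.sort_hosts(candidates, "A")]
print(ranked)
```

Here host-b is the only candidate with a free instance of function A,
so it sorts first; the other two would need a reprogram and keep their
original relative order.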
Best,
-jay