[openstack-dev] [Nova] [Cyborg] Tracking multiple functions

Alex Xu soulxu at gmail.com
Wed Mar 7 02:36:56 UTC 2018


2018-03-07 10:21 GMT+08:00 Alex Xu <soulxu at gmail.com>:

>
>
> 2018-03-06 22:45 GMT+08:00 Mooney, Sean K <sean.k.mooney at intel.com>:
>
>>
>>
>>
>>
>> *From:* Matthew Booth [mailto:mbooth at redhat.com]
>> *Sent:* Saturday, March 3, 2018 4:15 PM
>> *To:* OpenStack Development Mailing List (not for usage questions) <
>> openstack-dev at lists.openstack.org>
>> *Subject:* Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple
>> functions
>>
>>
>>
>> On 2 March 2018 at 14:31, Jay Pipes <jaypipes at gmail.com> wrote:
>>
>> On 03/02/2018 02:00 PM, Nadathur, Sundar wrote:
>>
>> Hello Nova team,
>>
>>      During the Cyborg discussion at Rocky PTG, we proposed a flow for
>> FPGAs wherein the request spec asks for a device type as a resource class,
>> and optionally a function (such as encryption) in the extra specs. This
>> does not seem to work well for the usage model that I’ll describe below.
>>
>> An FPGA device may implement more than one function. For example, it may
>> implement both compression and encryption. Say a cluster has 10 devices of
>> device type X, and each of them is programmed to offer 2 instances of
>> function A and 4 instances of function B. More specifically, the device may
>> implement 6 PCI functions, with 2 of them tied to function A, and the other
>> 4 tied to function B. So, we could have 6 separate instances accessing
>> functions on the same device.
>>
>>
>>
>> Does this imply that Cyborg can't reprogram the FPGA at all?
>>
>> *[Mooney, Sean K] Cyborg is intended to support fixed-function
>> accelerators as well, so it will not always be able to program the
>> accelerator. In the case where an FPGA is preprogrammed with a
>> multi-function bitstream that is statically provisioned, Cyborg will not be
>> able to reprogram the slot if any of the functions from that slot are
>> already allocated to an instance. It will then have to treat it like a
>> fixed-function device and simply allocate an unused VF of the correct type,
>> if available.*
>>
>>
>>
>>
>>
>> In the current flow, the device type X is modeled as a resource class, so
>> Placement will count how many of them are in use. A flavor for ‘RC
>> device-type-X + function A’ will consume one instance of the RC
>> device-type-X.  But this is not right because this precludes other
>> functions on the same device instance from getting used.
>>
>> One way to solve this is to declare functions A and B as resource classes
>> themselves and have the flavor request the function RC. Placement will then
>> correctly count the function instances. However, there is still a problem:
>> if the requested function A is not available, Placement will return an
>> empty list of RPs, but we need some way to reprogram some device to create
>> an instance of function A.
>>
>>
>> Clearly, nova is not going to be reprogramming devices with an instance
>> of a particular function.
>>
>> Cyborg might need to have a separate agent that listens to the nova
>> notifications queue and, upon seeing an event that indicates a failed build
>> due to lack of resources, tries to reprogram a device and then retries the
>> original request.
>>
>>
>>
>> It was my understanding from that discussion that we intend to insert
>> Cyborg into the spawn workflow for device configuration in the same way
>> that we currently insert resources provided by Cinder and Neutron. So while
>> Nova won't be reprogramming a device, it will be calling out to Cyborg to
>> reprogram a device, and waiting while that happens.
>>
>> My understanding is (and I concede some areas are a little hazy):
>>
>> * The flavor says device type X with function Y
>>
>> * Placement tells us everywhere with device type X
>>
>> * A weigher orders these by devices which already have an available
>> function Y (where is this metadata stored?)
>>
>> * Nova schedules to host Z
>>
>> * Nova host Z asks cyborg for a local function Y and blocks
>>
>>   * Cyborg hopefully returns function Y which is already available
>>
>>   * If not, Cyborg reprograms a function Y, then returns it
>>
>> Can anybody correct me/fill in the gaps?
>>
>> *[Mooney, Sean K] That correlates closely with my recollection also. As for
>> the metadata, I think the weigher may need to call Cyborg to retrieve
>> this, as it will not be available in the host state object.*
>>
> Is this the nova scheduler weigher, or do we want to support weighing in
> Placement? I think a function is a trait, so could we have preferred_traits?
> I remember we talked about that parameter in the past, but we didn't have a
> good use case at the time. This is a good use case.
>

Also, if we call Cyborg from the nova scheduler weigher, that will slow down
scheduling a lot.
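
To make the preferred_traits idea a bit more concrete, here is a rough sketch
in plain Python. It is not real Nova or Placement code: the Candidate class,
the weigh_by_preferred_traits function and the CUSTOM_FUNCTION_* trait names
are all made up for illustration, and preferred_traits is only the parameter
being proposed above, not something that exists today. The point is just that
the ordering can be computed from trait data that already comes back with the
candidates, without a per-host call to Cyborg.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Candidate:
    """Hypothetical stand-in for a host that has device type X."""
    host: str
    traits: Set[str] = field(default_factory=set)


def weigh_by_preferred_traits(
    candidates: Iterable[Candidate], preferred_traits: Iterable[str]
) -> List[Candidate]:
    """Put hosts that expose more of the preferred traits first.

    Only the trait sets already attached to the candidates are used,
    so nothing here talks to Cyborg.
    """
    preferred = set(preferred_traits)
    # sorted() is stable even with reverse=True, so hosts with the same
    # number of matching traits keep their original relative order.
    return sorted(
        candidates,
        key=lambda c: len(c.traits & preferred),
        reverse=True,
    )


if __name__ == "__main__":
    hosts = [
        Candidate("host1", {"CUSTOM_FUNCTION_B"}),
        Candidate("host2", {"CUSTOM_FUNCTION_A", "CUSTOM_FUNCTION_B"}),
        Candidate("host3"),
    ]
    for c in weigh_by_preferred_traits(hosts, ["CUSTOM_FUNCTION_A"]):
        print(c.host)
```

Hosts without function A are still returned, just later in the list, so
Cyborg could still reprogram a device there if nothing better is available.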

>
>
>> Matt
>>
>>
>>
>> --
>>
>> Matthew Booth
>>
>> Red Hat OpenStack Engineer, Compute DFG
>>
>>
>>
>> Phone: +442070094448 (UK)
>>
>>
>>
>>
>>
>