[openstack-dev] [Nova] [Cyborg] Tracking multiple functions

Alex Xu soulxu at gmail.com
Tue Mar 20 02:57:50 UTC 2018


2018-03-19 0:34 GMT+08:00 Nadathur, Sundar <sundar.nadathur at intel.com>:

> Sorry for the delayed response. I broadly agree with previous replies.
> For the concerns about the impact of the Cyborg weigher on scheduling
> performance, there are some options (apart from filtering candidates as
> much as possible in Placement):
> * Handle hosts in bulk by extending BaseWeigher
> <https://github.com/openstack/nova/blob/master/nova/weights.py#L67> and
> overriding weigh_objects
> <https://github.com/openstack/nova/blob/master/nova/weights.py#L92>(),
> instead of handling one host at a time.
>

That is still an external REST call; I guess people still won't like that.
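
For reference, a minimal sketch of the bulk approach, using stand-in classes
rather than nova's real ones. Only the weigh_objects(weighed_obj_list,
weight_properties) method name and argument order follow the linked
BaseWeigher; BulkFunctionWeigher, fetch_free_functions, the dict-shaped
weight_properties and the fake inventory are assumptions made up for
illustration.

```python
class WeighedHost:
    """Stand-in for the per-host object nova passes to a weigher."""

    def __init__(self, host):
        self.host = host
        self.weight = 0.0


class BulkFunctionWeigher:
    """Weighs all hosts with one batched lookup instead of one call per host."""

    def __init__(self, fetch_free_functions):
        # Hypothetical callable: fetch_free_functions(hosts, function)
        # returns {host: free_count}. It stands in for a single Cyborg query.
        self._fetch = fetch_free_functions

    def weigh_objects(self, weighed_obj_list, weight_properties):
        hosts = [obj.host for obj in weighed_obj_list]
        # One external call covers every candidate host.
        free = self._fetch(hosts, weight_properties["function"])
        # Return one weight per object, in the same order as the input.
        return [float(free.get(h, 0)) for h in hosts]


calls = []  # records each simulated external call


def fake_cyborg(hosts, function):
    calls.append((tuple(hosts), function))
    inventory = {"host1": {"A": 0}, "host2": {"A": 2}, "host3": {"A": 1}}
    return {h: inventory.get(h, {}).get(function, 0) for h in hosts}


weigher = BulkFunctionWeigher(fake_cyborg)
objs = [WeighedHost(h) for h in ("host1", "host2", "host3")]
weights = weigher.weigh_objects(objs, {"function": "A"})
ranked = [o.host for _, o in sorted(zip(weights, objs), key=lambda p: -p[0])]
print(ranked, "external calls:", len(calls))
```

Batching brings the cost down to one external call per scheduling request,
but it does not remove the call.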


>
> * If we have to handle one host at a time for whatever reason, since the
> weigher is maintained by Cyborg, it could directly query Cyborg DB rather
> than go through Cyborg REST API. This will be not unlike other weighers.
>

That means whenever the Cyborg DB schema changes, we have to restart
nova-scheduler to update the weigher as well. That couples the upgrades of
the two services.


> Given these and other possible optimizations, it may be too soon to worry
> about the performance impact.
>

Yeah, maybe. What about preferred traits?
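
To make the idea concrete, here is a rough sketch of what a preferred-traits
ordering could look like. This is not an existing Placement feature:
rank_by_preferred_traits and the CUSTOM_* trait names are hypothetical. The
point is that a preference only reorders candidates and never filters any out.

```python
def rank_by_preferred_traits(candidates, preferred_traits):
    """Order providers by how many preferred traits each one already has.

    candidates maps a provider name to its set of trait names. Providers
    with no preferred trait stay in the result, just later in the order.
    Ties are broken by provider name to keep the result deterministic.
    """
    preferred = set(preferred_traits)
    return sorted(
        candidates,
        key=lambda name: (-len(candidates[name] & preferred), name),
    )


candidates = {
    "host1": {"CUSTOM_FPGA_X"},
    "host2": {"CUSTOM_FPGA_X", "CUSTOM_FUNCTION_A"},
    "host3": {"CUSTOM_FPGA_X", "CUSTOM_FUNCTION_B"},
}
ordered = rank_by_preferred_traits(candidates, ["CUSTOM_FUNCTION_A"])
print(ordered)
```

Because the ordering needs only trait data, it would not require a call to
Cyborg at scheduling time.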


>
> I am working on a spec that will capture the flow discussed in the PTG. I
> will try to address these aspects as well.
>
> Thanks & Regards,
> Sundar
>
>
> On 3/8/2018 4:53 AM, Zhipeng Huang wrote:
>
> @jay I'm also against a weigher in nova/placement. This should be an
> optional step that depends on the vendor implementation, not a default one.
>
> @Alex I think we should explore the idea of preferred trait.
>
> @Matthew: Like Sean said, Cyborg wants to support both reprogrammable FPGAs
> and pre-programmed ones.
> Therefore it is correct that in your description, the programming
> operation should be a call from Nova to Cyborg, and cyborg will complete
> the operation while nova waits. The only problem is that the weigher step
> should be an optional one.
>
>
> On Wed, Mar 7, 2018 at 9:21 PM, Jay Pipes <jaypipes at gmail.com> wrote:
>
>> On 03/06/2018 09:36 PM, Alex Xu wrote:
>>
>>> 2018-03-07 10:21 GMT+08:00 Alex Xu <soulxu at gmail.com>:
>>>
>>>
>>>
>>>     2018-03-06 22:45 GMT+08:00 Mooney, Sean K <sean.k.mooney at intel.com>:
>>>
>>>
>>>         *From:* Matthew Booth [mailto:mbooth at redhat.com]
>>>         *Sent:* Saturday, March 3, 2018 4:15 PM
>>>         *To:* OpenStack Development Mailing List (not for usage
>>>         questions) <openstack-dev at lists.openstack.org>
>>>         *Subject:* Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple
>>>         functions
>>>
>>>         On 2 March 2018 at 14:31, Jay Pipes <jaypipes at gmail.com> wrote:
>>>
>>>             On 03/02/2018 02:00 PM, Nadathur, Sundar wrote:
>>>
>>>                 Hello Nova team,
>>>
>>>                       During the Cyborg discussion at Rocky PTG, we
>>>                 proposed a flow for FPGAs wherein the request spec asks
>>>                 for a device type as a resource class, and optionally a
>>>                 function (such as encryption) in the extra specs. This
>>>                 does not seem to work well for the usage model that I’ll
>>>                 describe below.
>>>
>>>                 An FPGA device may implement more than one function. For
>>>                 example, it may implement both compression and
>>>                 encryption. Say a cluster has 10 devices of device type
>>>                 X, and each of them is programmed to offer 2 instances
>>>                 of function A and 4 instances of function B. More
>>>                 specifically, the device may implement 6 PCI functions,
>>>                 with 2 of them tied to function A, and the other 4 tied
>>>                 to function B. So, we could have 6 separate instances
>>>                 accessing functions on the same device.
>>>
>>>         Does this imply that Cyborg can't reprogram the FPGA at all?
>>>
>>>         */[Mooney, Sean K] Cyborg is intended to support fixed-function
>>>         accelerators as well, so it will not always be able to program
>>>         the accelerator. In this case, where an FPGA is preprogrammed
>>>         with a statically provisioned multi-function bitstream, Cyborg
>>>         will not be able to reprogram the slot if any of the functions
>>>         from that slot are already allocated to an instance. In this
>>>         case it will have to treat it like a fixed-function device and
>>>         simply allocate an unused VF of the correct type, if available./*
>>>
>>>
>>>                 In the current flow, the device type X is modeled as a
>>>                 resource class, so Placement will count how many of them
>>>                 are in use. A flavor for ‘RC device-type-X + function A’
>>>                 will consume one instance of the RC device-type-X.  But
>>>                 this is not right because this precludes other functions
>>>                 on the same device instance from getting used.
>>>
>>>                 One way to solve this is to declare functions A and B as
>>>                 resource classes themselves and have the flavor request
>>>                 the function RC. Placement will then correctly count the
>>>                 function instances. However, there is still a problem:
>>>                 if the requested function A is not available, Placement
>>>                 will return an empty list of RPs, but we need some way
>>>                 to reprogram some device to create an instance of
>>>                 function A.
>>>
>>>
>>>             Clearly, nova is not going to be reprogramming devices with
>>>             an instance of a particular function.
>>>
>>>             Cyborg might need to have a separate agent that listens to
>>>             the nova notifications queue and upon seeing an event that
>>>             indicates a failed build due to lack of resources, then
>>>             Cyborg can try and reprogram a device and then try
>>>             rebuilding the original request.
>>>
>>>         It was my understanding from that discussion that we intend to
>>>         insert Cyborg into the spawn workflow for device configuration
>>>         in the same way that we currently insert resources provided by
>>>         Cinder and Neutron. So while Nova won't be reprogramming a
>>>         device, it will be calling out to Cyborg to reprogram a device,
>>>         and waiting while that happens.
>>>
>>>         My understanding is (and I concede some areas are a little
>>>         hazy):
>>>
>>>         * The flavor says device type X with function Y
>>>
>>>         * Placement tells us everywhere with device type X
>>>
>>>         * A weigher orders these by devices which already have an
>>>         available function Y (where is this metadata stored?)
>>>
>>>         * Nova schedules to host Z
>>>
>>>         * Nova host Z asks Cyborg for a local function Y and blocks
>>>
>>>            * Cyborg hopefully returns function Y which is already
>>>         available
>>>
>>>            * If not, Cyborg reprograms a function Y, then returns it
>>>
>>>         Can anybody correct me/fill in the gaps?
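
The steps listed above can be sketched roughly as follows. This is only a toy
model under stated assumptions: FakeCyborg, its free_count and acquire
methods, and schedule are hypothetical names, not Cyborg or nova APIs, and
reprogramming is assumed to always succeed (the real case can fail when other
functions on the device are already allocated).

```python
class FakeCyborg:
    """In-memory stand-in for Cyborg; tracks free function instances per host."""

    def __init__(self, free):
        self.free = free          # {host: {function: free_count}}
        self.reprogrammed = []    # hosts where a device had to be reprogrammed

    def free_count(self, host, function):
        return self.free.get(host, {}).get(function, 0)

    def acquire(self, host, function):
        # Blocking call made by the chosen compute host: reuse a free
        # function if one exists, otherwise reprogram a device first.
        if self.free_count(host, function) == 0:
            self.reprogrammed.append(host)
            self.free.setdefault(host, {})[function] = 1
        self.free[host][function] -= 1
        return (host, function)


def schedule(hosts_with_device, cyborg, function):
    # hosts_with_device is what Placement returned for device type X.
    # The weigher step prefers hosts that already have a free instance.
    ranked = sorted(hosts_with_device,
                    key=lambda h: -cyborg.free_count(h, function))
    chosen = ranked[0]
    return cyborg.acquire(chosen, function)


cyborg = FakeCyborg({"host1": {"Y": 0}, "host2": {"Y": 1}})
first = schedule(["host1", "host2"], cyborg, "Y")   # reuses host2's function
second = schedule(["host1", "host2"], cyborg, "Y")  # nothing free: reprogram
print(first, second, cyborg.reprogrammed)
```
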
>>>
>>>         */[Mooney, Sean K] that correlates closely to my recollection
>>>         also. As for the metadata I think the weigher may need to call
>>>         to cyborg to retrieve this as it will not be available in the
>>>         host state object./*
>>>
>>>     Is this the nova scheduler weigher, or do we want to support
>>>     weighing in Placement? I think a function is a trait, so can we have
>>>     preferred_traits? I remember we talked about that parameter in the
>>>     past, but we didn't have a good use case at that time. This is a
>>>     good use case.
>>>
>>>
>>> If we call Cyborg from the nova scheduler weigher, that will also slow
>>> down scheduling a lot.
>>>
>>
>> Right, which is why I don't want to do any weighing in Placement at all.
>> If folks want to sort by things that require long-running code/callbacks or
>> silly temporal things like metrics, they can do that in a custom weigher in
>> the nova-scheduler and take the performance hit there.
>>
>> Best,
>> -jay
>>
>>
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
>
>
> --
> Zhipeng (Howard) Huang
>
> Standard Engineer
> IT Standard & Patent/IT Product Line
> Huawei Technologies Co., Ltd
> Email: huangzhipeng at huawei.com
> Office: Huawei Industrial Base, Longgang, Shenzhen
>
> (Previous)
> Research Assistant
> Mobile Ad-Hoc Network Lab, Calit2
> University of California, Irvine
> Email: zhipengh at uci.edu
> Office: Calit2 Building Room 2402
>
> OpenStack, OPNFV, OpenDaylight, OpenCompute Aficionado
>