[openstack-dev] [Nova] [Cyborg] Tracking multiple functions
Alex Xu
soulxu at gmail.com
Tue Mar 20 02:57:50 UTC 2018
2018-03-19 0:34 GMT+08:00 Nadathur, Sundar <sundar.nadathur at intel.com>:
> Sorry for the delayed response. I broadly agree with previous replies.
> For the concerns about the impact of the Cyborg weigher on scheduling
> performance, there are some options (apart from filtering candidates as
> much as possible in Placement):
> * Handle hosts in bulk by extending BaseWeigher
> <https://github.com/openstack/nova/blob/master/nova/weights.py#L67> and
> overriding weigh_objects
> <https://github.com/openstack/nova/blob/master/nova/weights.py#L92>(),
> instead of handling one host at a time.
>
That is still an external REST call; I guess people still won't like that.
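
A minimal sketch of the bulk option, for illustration only. It does not import nova: `HostState` and `WeighedHost` are stand-ins for nova's objects, `weight_properties` is simplified to a dict, and `fetch_free_functions` is a hypothetical batched lookup (REST or otherwise) that is not part of any real Cyborg API. In nova the class would subclass BaseWeigher and override weigh_objects().

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class HostState:
    """Stand-in for nova's host state; only the host name is used here."""
    host: str


@dataclass
class WeighedHost:
    """Stand-in for nova's weighed-object wrapper (holds .obj and .weight)."""
    obj: HostState
    weight: float = 0.0


class BulkFunctionWeigher:
    """Weighs all candidate hosts with a single batched lookup."""

    def __init__(self, fetch_free_functions: Callable[[List[str], str], Dict[str, int]]):
        # Hypothetical callable: (host names, function) -> {host: free count}.
        self._fetch = fetch_free_functions

    def weigh_objects(self, weighed_obj_list, weight_properties):
        hosts = [w.obj.host for w in weighed_obj_list]
        wanted = weight_properties["function"]  # assumption: a plain dict
        free = self._fetch(hosts, wanted)       # one call for the whole batch
        return [float(free.get(h, 0)) for h in hosts]


calls = []


def fake_fetch(hosts, function):
    # Fake backend that records how often it is called.
    calls.append(list(hosts))
    data = {"host1": 0, "host2": 2}
    return {h: data.get(h, 0) for h in hosts}


candidates = [WeighedHost(HostState(h)) for h in ("host1", "host2", "host3")]
weights = BulkFunctionWeigher(fake_fetch).weigh_objects(
    candidates, {"function": "encryption"})
print(weights, len(calls))
```

The point of the sketch is only that the external lookup happens once per scheduling request rather than once per host; the call itself is still external.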
>
> * If we have to handle one host at a time for whatever reason, since the
> weigher is maintained by Cyborg, it could directly query Cyborg DB rather
> than go through Cyborg REST API. This will be not unlike other weighers.
>
That means when the Cyborg DB schema changes, we also have to restart
nova-scheduler to update the weigher. That couples the upgrades of the two
services together.
> Given these and other possible optimizations, it may be too soon to worry
> about the performance impact.
>
Yeah, maybe. What about the preferred traits?
>
> I am working on a spec that will capture the flow discussed in the PTG. I
> will try to address these aspects as well.
>
> Thanks & Regards,
> Sundar
>
>
> On 3/8/2018 4:53 AM, Zhipeng Huang wrote:
>
> @jay I'm also against a weigher in nova/placement. This should be an
> optional step depending on the vendor implementation, not a default one.
>
> @Alex I think we should explore the idea of preferred trait.
>
> @Matthew: Like Sean said, Cyborg wants to support both reprogrammable FPGAs
> and pre-programmed ones.
> Therefore it is correct that in your description, the programming
> operation should be a call from Nova to Cyborg, and cyborg will complete
> the operation while nova waits. The only problem is that the weigher step
> should be an optional one.
>
>
> On Wed, Mar 7, 2018 at 9:21 PM, Jay Pipes <jaypipes at gmail.com> wrote:
>
>> On 03/06/2018 09:36 PM, Alex Xu wrote:
>>
>>> 2018-03-07 10:21 GMT+08:00 Alex Xu <soulxu at gmail.com>:
>>>
>>> 2018-03-06 22:45 GMT+08:00 Mooney, Sean K <sean.k.mooney at intel.com>:
>>>
>>> From: Matthew Booth [mailto:mbooth at redhat.com]
>>> Sent: Saturday, March 3, 2018 4:15 PM
>>> To: OpenStack Development Mailing List (not for usage
>>> questions) <openstack-dev at lists.openstack.org>
>>> Subject: Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple
>>> functions
>>>
>>> On 2 March 2018 at 14:31, Jay Pipes <jaypipes at gmail.com> wrote:
>>>
>>> On 03/02/2018 02:00 PM, Nadathur, Sundar wrote:
>>>
>>> Hello Nova team,
>>>
>>> During the Cyborg discussion at Rocky PTG, we
>>> proposed a flow for FPGAs wherein the request spec asks
>>> for a device type as a resource class, and optionally a
>>> function (such as encryption) in the extra specs. This
>>> does not seem to work well for the usage model that I’ll
>>> describe below.
>>>
>>> An FPGA device may implement more than one function. For
>>> example, it may implement both compression and
>>> encryption. Say a cluster has 10 devices of device type
>>> X, and each of them is programmed to offer 2 instances
>>> of function A and 4 instances of function B. More
>>> specifically, the device may implement 6 PCI functions,
>>> with 2 of them tied to function A, and the other 4 tied
>>> to function B. So, we could have 6 separate instances
>>> accessing functions on the same device.
>>>
>>> Does this imply that Cyborg can't reprogram the FPGA at all?
>>>
>>> */[Mooney, Sean K] Cyborg is intended to support fixed-function
>>> accelerators also, so it will not always be able to program the
>>> accelerator. In this case, where an FPGA is preprogrammed with a
>>> statically provisioned multi-function bitstream, Cyborg
>>> will not be able to reprogram the slot if any of the functions
>>> from that slot are already allocated to an instance. In this
>>> case it will have to treat it like a fixed-function device and
>>> simply allocate an unused VF of the correct type if available./*
>>>
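
The rule Sean describes can be sketched roughly as follows. This is an illustration under assumptions, not Cyborg code: the `Slot` and `VirtualFunction` classes and their fields are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VirtualFunction:
    function: str                   # e.g. "A" or "B"
    instance: Optional[str] = None  # set once allocated to an instance


@dataclass
class Slot:
    vfs: List[VirtualFunction] = field(default_factory=list)

    def can_reprogram(self) -> bool:
        # Reprogramming replaces the whole bitstream, so it is only safe
        # when no function in the slot is allocated to an instance.
        return all(vf.instance is None for vf in self.vfs)

    def allocate(self, function: str, instance: str) -> Optional[VirtualFunction]:
        # Fixed-function path: hand out an unused VF of the requested type.
        for vf in self.vfs:
            if vf.function == function and vf.instance is None:
                vf.instance = instance
                return vf
        return None


# One device programmed with 2 instances of A and 4 of B, as in the example.
slot = Slot([VirtualFunction("A") for _ in range(2)]
            + [VirtualFunction("B") for _ in range(4)])
before = slot.can_reprogram()
first = slot.allocate("A", "inst-1")
second = slot.allocate("A", "inst-2")
third = slot.allocate("A", "inst-3")   # no free A left
after = slot.can_reprogram()
```

Once any VF is in use the slot behaves like a fixed-function device: further requests either get a free VF of the right type or nothing.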
>>>
>>> In the current flow, the device type X is modeled as a
>>> resource class, so Placement will count how many of them
>>> are in use. A flavor for ‘RC device-type-X + function A’
>>> will consume one instance of the RC device-type-X. But
>>> this is not right because this precludes other functions
>>> on the same device instance from getting used.
>>>
>>> One way to solve this is to declare functions A and B as
>>> resource classes themselves and have the flavor request
>>> the function RC. Placement will then correctly count the
>>> function instances. However, there is still a problem:
>>> if the requested function A is not available, Placement
>>> will return an empty list of RPs, but we need some way
>>> to reprogram some device to create an instance of
>>> function A.
>>>
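
To make the counting problem concrete, here is a small sketch using the numbers from the example (10 devices, each with 2 instances of function A and 4 of function B). It only models inventory arithmetic and does not call Placement; the resource class names are hypothetical.

```python
from collections import Counter

DEVICES = 10
FUNCTIONS_PER_DEVICE = {"A": 2, "B": 4}


def inventory_by_device():
    # Option 1: the device type is the resource class.
    return Counter({"CUSTOM_DEVICE_TYPE_X": DEVICES})


def inventory_by_function():
    # Option 2: each function is its own resource class.
    return Counter({
        f"CUSTOM_FUNCTION_{name}": DEVICES * count
        for name, count in FUNCTIONS_PER_DEVICE.items()
    })


def max_instances(inventory, request):
    """How many identical requests the inventory can satisfy."""
    return min(inventory[rc] // amount for rc, amount in request.items())


by_device = max_instances(inventory_by_device(), {"CUSTOM_DEVICE_TYPE_X": 1})
by_function_a = max_instances(inventory_by_function(), {"CUSTOM_FUNCTION_A": 1})
by_function_b = max_instances(inventory_by_function(), {"CUSTOM_FUNCTION_B": 1})
# A function nobody has programmed yet simply has no inventory.
by_function_c = max_instances(inventory_by_function(), {"CUSTOM_FUNCTION_C": 1})
print(by_device, by_function_a, by_function_b, by_function_c)
```

With the device as the resource class, only 10 instances fit even though 60 function instances exist. With functions as resource classes the counts are right (20 and 40), but an unprogrammed function yields zero candidates, which is the reprogramming gap described above.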
>>>
>>> Clearly, nova is not going to be reprogramming devices with
>>> an instance of a particular function.
>>>
>>> Cyborg might need to have a separate agent that listens to
>>> the nova notifications queue and, upon seeing an event that
>>> indicates a failed build due to lack of resources, tries to
>>> reprogram a device and then tries rebuilding the original
>>> request.
>>>
>>> It was my understanding from that discussion that we intend to
>>> insert Cyborg into the spawn workflow for device configuration
>>> in the same way that we currently insert resources provided by
>>> Cinder and Neutron. So while Nova won't be reprogramming a
>>> device, it will be calling out to Cyborg to reprogram a device,
>>> and waiting while that happens.
>>>
>>> My understanding is (and I concede some areas are a little
>>> hazy):
>>>
>>> * The flavor says device type X with function Y
>>> * Placement tells us everywhere with device type X
>>> * A weigher orders these by devices which already have an
>>> available function Y (where is this metadata stored?)
>>> * Nova schedules to host Z
>>> * Nova host Z asks Cyborg for a local function Y and blocks
>>> * Cyborg hopefully returns function Y which is already
>>> available
>>> * If not, Cyborg reprograms a function Y, then returns it
>>>
>>> Can anybody correct me/fill in the gaps?
>>>
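
The steps listed above can be sketched end to end as below. Everything here is a toy model under assumptions: the `CLUSTER` map, the function names, and the simulated reprogramming are hypothetical, and no nova, Placement, or Cyborg API is called.

```python
from typing import Dict, List, Tuple

# Hypothetical state: host -> free function counts on its device of type X.
# Hosts absent from this map have no device of type X.
CLUSTER: Dict[str, Dict[str, int]] = {
    "host1": {"Y": 0},
    "host2": {"Y": 1},
}


def placement_candidates(cluster) -> List[str]:
    # Placement returns every host with device type X.
    return sorted(cluster)


def weigh(cluster, hosts, function) -> List[str]:
    # Prefer hosts that already have a free instance of the function.
    return sorted(hosts, key=lambda h: cluster[h].get(function, 0), reverse=True)


def cyborg_get_function(cluster, host, function) -> str:
    # Called from the chosen compute host; nova blocks on this call.
    free = cluster[host].get(function, 0)
    if free > 0:
        cluster[host][function] = free - 1
        return "reused"
    # No free function: reprogram the device (simulated), then return it.
    return "reprogrammed"


def spawn(cluster, function) -> Tuple[str, str]:
    hosts = weigh(cluster, placement_candidates(cluster), function)
    if not hosts:
        raise LookupError("no host with device type X")
    host = hosts[0]
    return host, cyborg_get_function(cluster, host, function)


first = spawn(CLUSTER, "Y")
second = spawn(CLUSTER, "Y")
print(first, second)
```

The first request lands on the host that already has a free function Y; the second finds none free and falls through to the reprogramming path. Where the weigher gets its per-host counts is exactly the open question in the list.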
>>> */[Mooney, Sean K] that correlates closely to my recollection
>>> also. As for the metadata I think the weigher may need to call
>>> to cyborg to retrieve this as it will not be available in the
>>> host state object./*
>>>
>>> Is this the nova scheduler weigher, or do we want to support weighing in
>>> Placement? Functions are traits, I think, so can we have
>>> preferred_traits? I remember we talked about that parameter in the
>>> past, but we didn't have a good use case at that time. This is a good
>>> use case.
>>>
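
As an illustration of what a preferred-traits ordering could look like, here is a small sketch. It assumes a hypothetical `preferred` input alongside the required traits; this is not an existing Placement parameter, and the provider and trait names are made up for the example.

```python
from typing import Dict, Iterable, List, Set


def rank_by_preferred_traits(
    providers: Dict[str, Set[str]],
    required: Iterable[str],
    preferred: Iterable[str],
) -> List[str]:
    required, preferred = set(required), set(preferred)
    # Required traits filter; preferred traits only affect ordering.
    eligible = [name for name, traits in providers.items() if required <= traits]
    # Most preferred traits first; ties broken by name for a stable result.
    return sorted(eligible, key=lambda n: (-len(preferred & providers[n]), n))


providers = {
    "rp1": {"CUSTOM_FPGA_X"},
    "rp2": {"CUSTOM_FPGA_X", "CUSTOM_FUNCTION_ENCRYPTION"},
    "rp3": {"CUSTOM_FUNCTION_ENCRYPTION"},
}
ranked = rank_by_preferred_traits(
    providers,
    required=["CUSTOM_FPGA_X"],
    preferred=["CUSTOM_FUNCTION_ENCRYPTION"],
)
print(ranked)
```

A provider that lacks the preferred function is still returned, just later in the list, so a device that needs reprogramming remains a valid candidate.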
>>>
>>> If we call Cyborg from the nova scheduler weigher, that will also slow
>>> down scheduling a lot.
>>>
>>
>> Right, which is why I don't want to do any weighing in Placement at all.
>> If folks want to sort by things that require long-running code/callbacks or
>> silly temporal things like metrics, they can do that in a custom weigher in
>> the nova-scheduler and take the performance hit there.
>>
>> Best,
>> -jay
>>
>>
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
>
>
> --
> Zhipeng (Howard) Huang
>
> Standard Engineer
> IT Standard & Patent/IT Product Line
> Huawei Technologies Co,. Ltd
> Email: huangzhipeng at huawei.com
> Office: Huawei Industrial Base, Longgang, Shenzhen
>
> (Previous)
> Research Assistant
> Mobile Ad-Hoc Network Lab, Calit2
> University of California, Irvine
> Email: zhipengh at uci.edu
> Office: Calit2 Building Room 2402
>
> OpenStack, OPNFV, OpenDaylight, OpenCompute Aficionado
>
>
>
>