[openstack-dev] [Nova] [Cyborg] Tracking multiple functions

Nadathur, Sundar sundar.nadathur at intel.com
Sun Mar 18 16:34:10 UTC 2018


Sorry for the delayed response. I broadly agree with previous replies.

For the concerns about the impact of a Cyborg weigher on scheduling 
performance, there are some options (apart from filtering candidates as 
much as possible in Placement):
* Handle hosts in bulk by extending BaseWeigher 
<https://github.com/openstack/nova/blob/master/nova/weights.py#L67> and 
overriding weigh_objects() 
<https://github.com/openstack/nova/blob/master/nova/weights.py#L92>, 
instead of handling one host at a time.
* If we have to handle one host at a time for whatever reason, since the 
weigher is maintained by Cyborg, it could query the Cyborg DB directly 
rather than go through the Cyborg REST API. This would not be unlike 
other weighers.
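
To illustrate the first option, here is a minimal sketch of a weigher 
that overrides weigh_objects() so that all candidate hosts are resolved 
with one lookup. It is only an illustration under stated assumptions: 
the BaseWeigher, WeighedHost and HostState classes below are simplified 
stand-ins so the snippet runs without Nova installed, the bulk_lookup 
callable is a hypothetical Cyborg-side query, and passing the requested 
function in a plain dict is an assumption (Nova passes its own request 
object as weight_properties).

```python
from dataclasses import dataclass


@dataclass
class HostState:
    """Stand-in for the scheduler's per-host state (only the name is used)."""
    host: str


@dataclass
class WeighedHost:
    """Stand-in for a weighed object: wraps a host and its weight."""
    obj: HostState
    weight: float = 0.0


class BaseWeigher:
    """Simplified stand-in for nova.weights.BaseWeigher."""

    def _weigh_object(self, obj, weight_properties):
        raise NotImplementedError

    def weigh_objects(self, weighed_obj_list, weight_properties):
        # Default behaviour: one _weigh_object() call per host.
        return [self._weigh_object(w.obj, weight_properties)
                for w in weighed_obj_list]


class BulkFunctionWeigher(BaseWeigher):
    """Weighs every host using a single bulk lookup (hypothetical)."""

    def __init__(self, bulk_lookup):
        # bulk_lookup(hosts, function) -> {host: free instance count}
        self._bulk_lookup = bulk_lookup

    def weigh_objects(self, weighed_obj_list, weight_properties):
        hosts = [w.obj.host for w in weighed_obj_list]
        function = weight_properties["function"]
        free = self._bulk_lookup(hosts, function)  # one call for all hosts
        return [float(free.get(h, 0)) for h in hosts]


calls = []


def fake_lookup(hosts, function):
    """Fake data source that records how often it is called."""
    calls.append((tuple(hosts), function))
    table = {"host1": 2, "host3": 1}
    return {h: table.get(h, 0) for h in hosts}


weigher = BulkFunctionWeigher(fake_lookup)
candidates = [WeighedHost(HostState(h)) for h in ("host1", "host2", "host3")]
weights = weigher.weigh_objects(candidates, {"function": "encryption"})
```

With three candidate hosts, the lookup is invoked once rather than three 
times, which is the point of the bulk approach.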

Given these and other possible optimizations, it may be too soon to 
worry about the performance impact.

I am working on a spec that will capture the flow discussed at the PTG. 
I will try to address these aspects as well.
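
For reference, the flow as Matthew summarized it in the thread quoted 
below (Placement returns hosts with the device type, a weigher prefers 
hosts that already have a free instance of the function, and Cyborg 
reprograms only if none is free) can be sketched roughly as follows. 
This is a toy model, not the spec: the function names and the dict of 
free function instances are hypothetical, and reprogramming is reduced 
to making one instance available.

```python
def schedule(hosts_with_device, free_functions, function):
    """Pick a host, preferring one with a free instance of `function`.

    hosts_with_device is assumed to be what Placement returned for the
    device type; free_functions maps host -> {function: free count}.
    """
    ranked = sorted(
        hosts_with_device,
        key=lambda h: free_functions.get(h, {}).get(function, 0),
        reverse=True,
    )
    return ranked[0] if ranked else None


def attach_function(host, free_functions, function):
    """Cyborg-side step: hand out a free instance, reprogramming if needed.

    Returns True if the (hypothetical) reprogramming step was required.
    """
    per_host = free_functions.setdefault(host, {})
    reprogrammed = False
    if per_host.get(function, 0) == 0:
        # Toy reprogramming: the device now offers one instance.
        per_host[function] = 1
        reprogrammed = True
    per_host[function] -= 1
    return reprogrammed


free = {"host1": {"A": 0}, "host2": {"A": 2}}
chosen = schedule(["host1", "host2"], free, "A")
needed_reprogram = attach_function(chosen, free, "A")
fallback_reprogram = attach_function("host1", free, "A")
```

In this toy run the weigher-like ordering picks host2, which already has 
function A available, so no reprogramming is needed; only the explicit 
request on host1 triggers it.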

Thanks & Regards,
Sundar

On 3/8/2018 4:53 AM, Zhipeng Huang wrote:
> @jay I'm also against a weigher in nova/placement. This should be an 
> optional step that depends on the vendor implementation, not a default 
> one.
>
> @Alex I think we should explore the idea of preferred trait.
>
> @Matthew: Like Sean said, Cyborg wants to support both reprogrammable 
> FPGAs and pre-programmed ones.
> Therefore your description is correct: the programming operation 
> should be a call from Nova to Cyborg, and Cyborg will complete the 
> operation while Nova waits. The only problem is that the weigher step 
> should be an optional one.
>
>
> On Wed, Mar 7, 2018 at 9:21 PM, Jay Pipes <jaypipes at gmail.com> wrote:
>
>     On 03/06/2018 09:36 PM, Alex Xu wrote:
>
>         2018-03-07 10:21 GMT+08:00 Alex Xu <soulxu at gmail.com>:
>
>
>
>             2018-03-06 22:45 GMT+08:00 Mooney, Sean K
>             <sean.k.mooney at intel.com>:
>
>                 From: Matthew Booth [mailto:mbooth at redhat.com]
>                 Sent: Saturday, March 3, 2018 4:15 PM
>                 To: OpenStack Development Mailing List (not for usage
>                 questions) <openstack-dev at lists.openstack.org>
>                 Subject: Re: [openstack-dev] [Nova] [Cyborg] Tracking
>                 multiple functions
>
>                 On 2 March 2018 at 14:31, Jay Pipes <jaypipes at gmail.com>
>                 wrote:
>
>                     On 03/02/2018 02:00 PM, Nadathur, Sundar wrote:
>
>                         Hello Nova team,
>
>                         During the Cyborg discussion at Rocky PTG, we
>                         proposed a flow for FPGAs wherein the request spec
>                         asks for a device type as a resource class, and
>                         optionally a function (such as encryption) in the
>                         extra specs. This does not seem to work well for
>                         the usage model that I’ll describe below.
>
>                         An FPGA device may implement more than one
>                         function. For example, it may implement both
>                         compression and encryption. Say a cluster has 10
>                         devices of device type X, and each of them is
>                         programmed to offer 2 instances of function A and
>                         4 instances of function B. More specifically, the
>                         device may implement 6 PCI functions, with 2 of
>                         them tied to function A, and the other 4 tied to
>                         function B. So, we could have 6 separate instances
>                         accessing functions on the same device.
>
>                 Does this imply that Cyborg can't reprogram the FPGA at
>                 all?
>
>                 [Mooney, Sean K] Cyborg is intended to support
>                 fixed-function accelerators as well, so it will not always
>                 be able to program the accelerator. In this case, where an
>                 FPGA is preprogrammed with a statically provisioned
>                 multi-function bitstream, Cyborg will not be able to
>                 reprogram the slot if any of the functions from that slot
>                 are already allocated to an instance. It will then have to
>                 treat it like a fixed-function device and simply allocate
>                 an unused VF of the correct type, if available.
>
>                         In the current flow, the device type X is modeled
>                         as a resource class, so Placement will count how
>                         many of them are in use. A flavor for ‘RC
>                         device-type-X + function A’ will consume one
>                         instance of the RC device-type-X. But this is not
>                         right because this precludes other functions on
>                         the same device instance from getting used.
>
>                         One way to solve this is to declare functions A
>                         and B as resource classes themselves and have the
>                         flavor request the function RC. Placement will
>                         then correctly count the function instances.
>                         However, there is still a problem: if the
>                         requested function A is not available, Placement
>                         will return an empty list of RPs, but we need some
>                         way to reprogram some device to create an instance
>                         of function A.
>
>
>                     Clearly, nova is not going to be reprogramming devices
>                     with an instance of a particular function.
>
>                     Cyborg might need to have a separate agent that
>                     listens to the nova notifications queue and upon
>                     seeing an event that indicates a failed build due to
>                     lack of resources, then Cyborg can try and reprogram a
>                     device and then try rebuilding the original request.
>
>                 It was my understanding from that discussion that we
>                 intend to insert Cyborg into the spawn workflow for device
>                 configuration in the same way that we currently insert
>                 resources provided by Cinder and Neutron. So while Nova
>                 won't be reprogramming a device, it will be calling out to
>                 Cyborg to reprogram a device, and waiting while that
>                 happens.
>
>                 My understanding is (and I concede some areas are a little
>                 hazy):
>
>                 * The flavor says device type X with function Y
>
>                 * Placement tells us everywhere with device type X
>
>                 * A weigher orders these by devices which already have an
>                   available function Y (where is this metadata stored?)
>
>                 * Nova schedules to host Z
>
>                 * Nova host Z asks Cyborg for a local function Y and blocks
>
>                    * Cyborg hopefully returns function Y which is already
>                      available
>
>                    * If not, Cyborg reprograms a function Y, then returns it
>
>                 Can anybody correct me/fill in the gaps?
>
>                 [Mooney, Sean K] That correlates closely to my
>                 recollection also. As for the metadata, I think the
>                 weigher may need to call Cyborg to retrieve this, as it
>                 will not be available in the host state object.
>
>             Is it the nova scheduler weigher, or do we want to support
>             weighing in Placement? Functions are traits, I think, so can
>             we have preferred_traits? I remember we talked about that
>             parameter in the past, but we didn't have a good use case at
>             that time. This is a good use case.
>
>
>         If we call Cyborg from the nova scheduler weigher, that will
>         also slow down scheduling a lot.
>
>
>     Right, which is why I don't want to do any weighing in Placement
>     at all. If folks want to sort by things that require long-running
>     code/callbacks or silly temporal things like metrics, they can do
>     that in a custom weigher in the nova-scheduler and take the
>     performance hit there.
>
>     Best,
>     -jay
>
>
>     __________________________________________________________________________
>     OpenStack Development Mailing List (not for usage questions)
>     Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>     http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
>
>
> -- 
> Zhipeng (Howard) Huang
>
> Standard Engineer
> IT Standard & Patent/IT Product Line
> Huawei Technologies Co., Ltd
> Email: huangzhipeng at huawei.com
> Office: Huawei Industrial Base, Longgang, Shenzhen
>
> (Previous)
> Research Assistant
> Mobile Ad-Hoc Network Lab, Calit2
> University of California, Irvine
> Email: zhipengh at uci.edu
> Office: Calit2 Building Room 2402
>
> OpenStack, OPNFV, OpenDaylight, OpenCompute Aficionado
>
>
