[openstack-dev] [nova] [cyborg] Race condition in the Cyborg/Nova flow

Nadathur, Sundar sundar.nadathur at intel.com
Wed Mar 28 17:27:42 UTC 2018


Hi Eric and all,
     I should have clarified that this race condition arises only for 
devices with multiple functions. There is a prior thread 
<http://lists.openstack.org/pipermail/openstack-dev/2018-March/127882.html> 
about it. I was trying to find a solution within Cyborg, but that runs 
into the same race condition.

IIUC, this situation is somewhat similar to the issue with vGPU types 
<http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-03-27.log.html#t2018-03-27T13:41:00> 
(thanks to Alex Xu for pointing this out). In the vGPU case, we could 
start with an inventory of (vgpu-type-a: 2; vgpu-type-b: 4). But after 
consuming one unit of vgpu-type-a, the inventory should ideally become 
(vgpu-type-a: 1; vgpu-type-b: 0), since the physical GPU is then 
committed to type a and can no longer supply type b. Similarly, with 
multi-function accelerators, we start with an RP inventory of 
(region-type-A: 1, function-X: 4); after consuming one unit of 
function-X, the inventory should ideally become (region-type-A: 0, 
function-X: 3), since the function occupies the region.
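
To make the intended semantics concrete, here is a toy sketch of these 
coupled inventory updates (plain Python; my own illustrative model, not 
actual Placement code):

    def consume_vgpu(inventory, vgpu_type):
        """Consume one vGPU; the physical GPU is then committed to
        that type, so every other vGPU type's inventory drops to zero."""
        assert inventory[vgpu_type] > 0
        for other in inventory:
            if other != vgpu_type:
                inventory[other] = 0
        inventory[vgpu_type] -= 1
        return inventory

    def consume_function(inventory, region, function):
        """Consume one function instance; the region hosting it is
        consumed along with it."""
        assert inventory[region] > 0 and inventory[function] > 0
        inventory[region] -= 1
        inventory[function] -= 1
        return inventory

    print(consume_vgpu({'vgpu-type-a': 2, 'vgpu-type-b': 4}, 'vgpu-type-a'))
    # {'vgpu-type-a': 1, 'vgpu-type-b': 0}
    print(consume_function({'region-type-A': 1, 'function-X': 4},
                           'region-type-A', 'function-X'))
    # {'region-type-A': 0, 'function-X': 3}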

I understand that this approach is controversial :) Also, one difference 
from the vGPU case is that the set of vGPU types and their counts is 
static, whereas an FPGA can be reprogrammed to yield more or fewer 
functions. That said, we can hopefully keep this analogy in mind for 
future discussions.

We probably will not support multi-function accelerators in Rocky. This 
discussion is for the longer term.

Regards,
Sundar

On 3/23/2018 12:44 PM, Eric Fried wrote:
> Sundar-
>
> 	First thought is to simplify by NOT keeping inventory information in
> the cyborg db at all.  The provider record in the placement service
> already knows the device (the provider ID, which you can look up in the
> cyborg db), the host (the root_provider_uuid of the provider representing
> the device), and the inventory, and (I hope) you'll be augmenting it with
> traits indicating what functions it's capable of.  That way, you'll
> always get allocation candidates with devices that *can* load the
> desired function; now you just have to engage your weigher to prioritize
> the ones that already have it loaded so you can prefer those.
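
A rough sketch of the weigher Eric describes (a sketch only: the 
cyborg_client helper and the accel:function extra-spec key are 
hypothetical, not existing Cyborg or Nova interfaces; BaseHostWeigher is 
Nova's scheduler weigher base class):

    import cyborg_client  # hypothetical Cyborg lookup helper

    from nova.scheduler import weights

    class FunctionLoadedWeigher(weights.BaseHostWeigher):
        """Prefer hosts where the requested function is already loaded.

        Traits already guarantee that every candidate *can* load the
        function; this just ranks hosts that need no reprogramming first.
        """

        def _weigh_object(self, host_state, weight_properties):
            # weight_properties is the request spec; 'accel:function'
            # is a made-up extra-spec key.
            fn = weight_properties.flavor.extra_specs.get('accel:function')
            if fn and cyborg_client.function_loaded(host_state.host, fn):
                return 1.0
            return 0.0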
>
> 	Am I missing something?
>
> 		efried
>
> On 03/22/2018 11:27 PM, Nadathur, Sundar wrote:
>> Hi all,
>>      There seems to be a potential race condition in the
>> Cyborg/Nova flow. Apologies for missing this earlier. (You can refer to
>> the proposed Cyborg/Nova spec
>> <https://review.openstack.org/#/c/554717/1/doc/specs/rocky/cyborg-nova-sched.rst>
>> for details.)
>>
>> Consider the scenario where the flavor specifies a resource class for a
>> device type, and also specifies a function (e.g. encrypt) in the extra
>> specs. The Nova scheduler tracks only the device type as a
>> resource; Cyborg must track the availability of functions.
>> Further, to keep it simple, say all the functions exist all the time (no
>> reprogramming involved).
>>
>> To recap, here is the scheduler flow for this case:
>>
>>    * A request spec with a flavor comes to Nova conductor/scheduler. The
>>      flavor has a device type as a resource class, and a function in the
>>      extra specs.
>>    * Placement API returns the list of RPs (compute nodes) which contain
>>      the requested device types (but not necessarily the function).
>>    * Cyborg will provide a custom filter that queries the Cyborg DB to
>>      check which hosts have the needed function, filtering out the
>>      rest.
>>    * The scheduler selects one node from the filtered list, and the
>>      request goes to the compute node.
>>
>> For the filter to work, the Cyborg DB needs to maintain a table of
>> (host, function type, #free units) triples. The filter checks whether a
>> given host has one or more free units of the requested function type.
>> But to keep the #free units up to date, Cyborg on the selected compute
>> node must notify the Cyborg API to decrement the count when an
>> instance is spawned, and to increment it when resources are released.
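
For concreteness, the filter could look roughly like this (a sketch 
only: the accel:function extra-spec key and the cyborg_db.free_units() 
query are hypothetical, not existing interfaces; BaseHostFilter is 
Nova's scheduler filter base class):

    import cyborg_db  # hypothetical wrapper around the table above

    from nova.scheduler import filters

    class FunctionFreeUnitsFilter(filters.BaseHostFilter):
        """Pass only hosts with at least one free unit of the
        requested function, per the (host, function, #free) table."""

        def host_passes(self, host_state, spec_obj):
            # 'accel:function' is a made-up extra-spec key.
            fn = spec_obj.flavor.extra_specs.get('accel:function')
            if not fn:
                return True  # no function requested; nothing to check
            return cyborg_db.free_units(host_state.host, fn) > 0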
>>
>> Therein lies the catch: this loop from the compute node to the
>> controller is susceptible to race conditions. For example, if two
>> simultaneous requests each ask for function A, and only one unit of it
>> is available, the Cyborg filter will approve both; both may land on the
>> same host, and one will fail. This happens because Cyborg on the
>> controller does not decrement resource usage for one request before
>> processing the next, as the sketch below illustrates.
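
A toy illustration of the interleaving (plain Python; the table layout 
follows the (host, function type, #free units) triples above):

    # Two concurrent requests, one free unit of function A on host1.
    # Both filter checks read the same snapshot of the table, because
    # neither decrement has landed yet.
    free_units = {('host1', 'function-A'): 1}

    def filter_passes(host, fn):
        # Read-only check at filter time; no claim is made here.
        return free_units[(host, fn)] > 0

    print(filter_passes('host1', 'function-A'))  # request 1: True
    print(filter_passes('host1', 'function-A'))  # request 2: also True
    # The decrement happens only later, from the selected compute node,
    # so both requests may land on host1 and one of them must fail.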
>>
>> This is similar to this previous Nova scheduling issue
>> <https://specs.openstack.org/openstack/nova-specs/specs/pike/implemented/placement-claims.html>.
>> That was solved by having the scheduler claim a resource in Placement
>> for the selected node. I don't see an analog for Cyborg, since Cyborg
>> would not know which node is selected.
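
For reference, that fix has the scheduler write the allocation to 
Placement itself, via PUT /allocations/{consumer_uuid}. A rough sketch 
of the payload (all uuids made up; CUSTOM_FPGA_REGION is an illustrative 
custom resource class, not one Cyborg has defined; shape per Placement 
microversion 1.12):

    # PUT /allocations/{instance_uuid} on the Placement API.
    claim_body = {
        "allocations": {
            # Resource provider uuid of the chosen compute node (made up).
            "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa": {
                "resources": {"CUSTOM_FPGA_REGION": 1},
            },
        },
        "project_id": "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb",
        "user_id": "cccccccc-cccc-cccc-cccc-cccccccccccc",
    }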
>>
>> Thanks in advance for suggestions and solutions.
>>
>> Regards,
>> Sundar