[openstack-dev] [nova] [cyborg] Race condition in the Cyborg/Nova flow

Jay Pipes jaypipes at gmail.com
Thu Mar 29 17:01:39 UTC 2018


On 03/28/2018 07:03 PM, Nadathur, Sundar wrote:
> Thanks, Eric. Looks like there are no good solutions even as candidates, 
> but only options with varying levels of unacceptability. It is funny 
> that the option that is considered the least unacceptable is to let 
> the problem happen and then fail the request (last one in your list).
> 
> Could I ask what is the objection to the scheme that applies multiple 
> traits and removes one as needed, apart from the fact that it has races?

The fundamental objection that I've had to various discussions that 
involve abusing traits in this fashion is that you are essentially 
trying to "consume" traits. But traits are *not consumable things*. Only 
resource classes are consumable things.

If you want to track the inventory of a certain thing -- and consume 
those things during scheduling -- then you need to use resource classes 
for that thing. The inventory management system in placement already has 
race protections in it. This means that you won't be able to 
over-allocate a particular consumable accelerated function if there 
isn't inventory capacity for that particular function on an FPGA. 
Likewise, you would not be able to *remove* inventory for a particular 
function on an FPGA if some instance is consuming that particular 
function. This protection does *not* exist if you are tracking 
particular functions with traits, because an instance doesn't 
*consume* a trait. There's no such thing as "I started an instance 
with accelerated function X and therefore I am consuming trait Y on 
this FPGA."
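To make the consumable-versus-trait distinction concrete, here is a minimal in-memory sketch. It is not placement's implementation: `ResourceProvider`, `ConflictError`, and the resource class and trait names are hypothetical, and passing the provider generation explicitly on every call is a simplification of how placement detects concurrent writers. Allocations and inventory removal are checked against a usage count; traits have no usage count at all, which is why they cannot provide the same protection.

```python
from dataclasses import dataclass, field


class ConflictError(Exception):
    """Raised for a stale generation or a request that exceeds capacity."""


@dataclass
class ResourceProvider:
    """Hypothetical in-memory stand-in for a placement resource provider."""

    name: str
    inventory: dict = field(default_factory=dict)  # resource class -> total units
    usage: dict = field(default_factory=dict)      # resource class -> units allocated
    traits: set = field(default_factory=set)       # capabilities: queried, never consumed
    generation: int = 0                            # bumped on every successful write

    def _check_generation(self, expected: int) -> None:
        # Optimistic concurrency: a writer working from a stale read loses the race.
        if expected != self.generation:
            raise ConflictError(
                f"{self.name}: stale generation {expected}, current {self.generation}")

    def allocate(self, rc: str, amount: int, expected_generation: int) -> None:
        self._check_generation(expected_generation)
        used = self.usage.get(rc, 0)
        if used + amount > self.inventory.get(rc, 0):
            raise ConflictError(f"{self.name}: not enough {rc} inventory")
        self.usage[rc] = used + amount
        self.generation += 1

    def delete_inventory(self, rc: str, expected_generation: int) -> None:
        self._check_generation(expected_generation)
        if self.usage.get(rc, 0) > 0:
            raise ConflictError(f"{self.name}: {rc} is still being consumed")
        self.inventory.pop(rc, None)
        self.generation += 1

    def has_trait(self, trait: str) -> bool:
        # A trait is only ever checked; nothing records which instance "uses" it,
        # so there is no usage count to protect against concurrent removal.
        return trait in self.traits


fpga = ResourceProvider(
    "fpga0",
    inventory={"CUSTOM_ACCEL_FUNC_X": 1},  # hypothetical function resource class
    traits={"CUSTOM_FPGA_VENDOR_Y"},       # hypothetical capability trait
)
seen = fpga.generation
fpga.allocate("CUSTOM_ACCEL_FUNC_X", 1, seen)  # first instance gets the function
try:
    # A second scheduler worker still holding the old generation loses the race.
    fpga.allocate("CUSTOM_ACCEL_FUNC_X", 1, seen)
except ConflictError as exc:
    print(exc)
```

Even with a fresh generation, a second allocation of the same function is rejected because usage would exceed the total, and `delete_inventory` refuses while the function is allocated. Nothing comparable can be written for `traits`.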

So, bottom line for me is make sure we're using resource classes for 
consumable items and traits for representing non-consumable capabilities 
**of the resource provider**.

That means that for the (re)programming scenarios you need to 
dynamically adjust the inventory of a particular FPGA resource provider.

You will need to *add* an inventory item of a custom resource class 
representing the specific function you are flashing *to an empty region*.
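As a rough sketch of the "add an inventory item" step, the helper below only builds the placement REST requests and does not send them. The two endpoints (`PUT /resource_classes/{name}` and `PUT /resource_providers/{uuid}/inventories/{resource_class}`) come from the placement API; the function-to-resource-class naming scheme, the helper names, and the assumption that the FPGA region is modelled as a single resource provider are all hypothetical.

```python
from __future__ import annotations

import re


def custom_rc_for_function(function_name: str) -> str:
    """Map an accelerated-function name to a custom resource class name.

    Hypothetical naming scheme: uppercase the name, collapse anything that is not
    A-Z or 0-9 into '_', and add the CUSTOM_ prefix placement requires for
    non-standard resource classes.
    """
    normalized = re.sub(r"[^A-Z0-9]+", "_", function_name.upper()).strip("_")
    if not normalized:
        raise ValueError(f"cannot derive a resource class from {function_name!r}")
    return f"CUSTOM_{normalized}"


def requests_for_flash(rp_uuid: str, rp_generation: int, function_name: str,
                       slots: int = 1) -> list[tuple[str, str, dict | None]]:
    """Build (method, path, body) tuples for exposing a newly flashed function.

    Nothing is sent here; a real client would issue these against the placement
    endpoint with an appropriate OpenStack-API-Version header.
    """
    rc = custom_rc_for_function(function_name)
    return [
        # Create the custom resource class if it does not exist yet (idempotent PUT).
        ("PUT", f"/resource_classes/{rc}", None),
        # Add inventory of that class to the provider. The generation makes the write
        # fail with 409 Conflict if another writer changed the provider meanwhile.
        ("PUT", f"/resource_providers/{rp_uuid}/inventories/{rc}", {
            "resource_provider_generation": rp_generation,
            "total": slots,
            "reserved": 0,
            "min_unit": 1,
            "max_unit": slots,
            "step_size": 1,
            "allocation_ratio": 1.0,
        }),
    ]
```

Because the inventory write carries the provider generation, two agents racing to reprogram the same region cannot both succeed silently; the loser gets a conflict and must re-read the provider.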

You *may* want to *delete* an inventory item of a custom resource class 
representing the specific function *when an instance that was using that 
specific function is terminated*. If you use a custom resource class to 
represent the particular accelerated function, Nova will *automatically* 
delete the instance's allocations of that class when the instance is 
terminated. No such automatic removal of allocations is done if you use 
traits to represent particular accelerated functions (again, because 
traits aren't consumable things).
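The termination flow can be sketched with another hypothetical in-memory model (`ProviderModel` and `PlacementError` are illustrative names, not placement code). Allocations are keyed by consumer, so dropping everything an instance holds releases the function automatically; the optional cleanup step then succeeds only once no consumer still holds the class. In the real system these steps correspond roughly to placement's `DELETE /allocations/{consumer_uuid}` and `DELETE /resource_providers/{uuid}/inventories/{resource_class}` calls.

```python
from __future__ import annotations


class PlacementError(Exception):
    """Raised when a request would over-allocate or remove inventory in use."""


class ProviderModel:
    """Hypothetical in-memory model of one FPGA provider's inventory and allocations."""

    def __init__(self) -> None:
        self.inventory: dict[str, int] = {}               # resource class -> total
        self.allocations: dict[str, dict[str, int]] = {}  # consumer uuid -> {rc: amount}

    def used(self, rc: str) -> int:
        return sum(alloc.get(rc, 0) for alloc in self.allocations.values())

    def allocate(self, consumer: str, rc: str, amount: int) -> None:
        if self.used(rc) + amount > self.inventory.get(rc, 0):
            raise PlacementError(f"not enough {rc}")
        held = self.allocations.setdefault(consumer, {})
        held[rc] = held.get(rc, 0) + amount

    def delete_allocations(self, consumer: str) -> None:
        # Roughly what happens on instance termination: every allocation held by
        # the instance is dropped, whatever resource classes it covered.
        self.allocations.pop(consumer, None)

    def delete_inventory(self, rc: str) -> None:
        # Optional cleanup step: refuse while any consumer still holds the class.
        if self.used(rc) > 0:
            raise PlacementError(f"{rc} is still allocated")
        self.inventory.pop(rc, None)


fpga = ProviderModel()
fpga.inventory["CUSTOM_ACCEL_FUNC_X"] = 1          # hypothetical class name
fpga.allocate("instance-1", "CUSTOM_ACCEL_FUNC_X", 1)
fpga.delete_allocations("instance-1")              # instance terminated
fpga.delete_inventory("CUSTOM_ACCEL_FUNC_X")       # now safe to retire the function
```

If the function were tracked only as a trait, there would be no allocation to drop on termination and no usage check guarding the removal.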

Best,
-jay


