[openstack-dev] [nova] [cyborg] Race condition in the Cyborg/Nova flow
jaypipes at gmail.com
Thu Mar 29 17:01:39 UTC 2018
On 03/28/2018 07:03 PM, Nadathur, Sundar wrote:
> Thanks, Eric. Looks like there are no good solutions even as candidates,
> but only options with varying levels of unacceptability. It is funny
> that the option that is considered the least unacceptable is to let
> the problem happen and then fail the request (last one in your list).
> Could I ask what is the objection to the scheme that applies multiple
> traits and removes one as needed, apart from the fact that it has races?
The fundamental objection that I've had to various discussions that
involve abusing traits in this fashion is that you are essentially
trying to "consume" traits. But traits are *not consumable things*. Only
resource classes are consumable things.
If you want to track the inventory of a certain thing -- and consume
those things during scheduling -- then you need to use resource classes
for that thing. The inventory management system in placement already has
race protections in it. This means that you won't be able to
over-allocate a particular consumable accelerated function if there
isn't inventory capacity for that particular function on an FPGA.
Likewise, you would not be able to *remove* inventory for a particular
function on an FPGA if some instance is consuming that particular
function. This protection does *not* exist if you are tracking
particular functions with traits; the reason is that an instance
doesn't *consume* a trait. There's no such thing as "I started an
instance with accelerated function X and therefore I am consuming trait
Y on this FPGA."
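To make the difference concrete, here is a minimal sketch in Python. It is a
toy in-memory model and not the placement API: the ToyProvider class, the
CapacityError exception, and the names CUSTOM_FPGA_FUNCTION_X and
CUSTOM_FUNCTION_X are all made up for illustration. It only shows the shape of
the checks that inventory makes possible and that a bare trait cannot have.

```python
class CapacityError(Exception):
    """Raised when a request would over-allocate or strand an allocation."""


class ToyProvider:
    """In-memory stand-in for one FPGA resource provider (hypothetical)."""

    def __init__(self):
        self.inventory = {}  # resource class -> total units
        self.used = {}       # resource class -> units consumed by instances
        self.traits = set()  # capability flags; nothing ever consumes these

    def set_inventory(self, resource_class, total):
        self.inventory[resource_class] = total
        self.used.setdefault(resource_class, 0)

    def allocate(self, resource_class, amount=1):
        # Consumption is checked against capacity, so over-allocation fails.
        total = self.inventory.get(resource_class, 0)
        used = self.used.get(resource_class, 0)
        if used + amount > total:
            raise CapacityError(f"no capacity left for {resource_class}")
        self.used[resource_class] = used + amount

    def delete_inventory(self, resource_class):
        # Inventory that an instance is consuming cannot be removed.
        if self.used.get(resource_class, 0) > 0:
            raise CapacityError(f"{resource_class} is still in use")
        self.inventory.pop(resource_class, None)
        self.used.pop(resource_class, None)


fpga = ToyProvider()
fpga.set_inventory("CUSTOM_FPGA_FUNCTION_X", 1)
fpga.allocate("CUSTOM_FPGA_FUNCTION_X")  # first instance gets the function

try:
    fpga.allocate("CUSTOM_FPGA_FUNCTION_X")  # second instance: no capacity
    second_allocation_rejected = False
except CapacityError:
    second_allocation_rejected = True

try:
    fpga.delete_inventory("CUSTOM_FPGA_FUNCTION_X")  # still consumed
    delete_rejected = False
except CapacityError:
    delete_rejected = True

# A trait has no notion of usage, so nothing stops its removal while an
# instance still depends on the function it was meant to describe.
fpga.traits.add("CUSTOM_FUNCTION_X")
fpga.traits.discard("CUSTOM_FUNCTION_X")
trait_removed_while_in_use = "CUSTOM_FUNCTION_X" not in fpga.traits
```

In this sketch both the second allocation and the inventory deletion are
rejected, while the trait disappears without any check at all.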
So, the bottom line for me is: make sure we're using resource classes for
consumable items and traits for representing non-consumable capabilities
**of the resource provider**.
That means that for the (re)-programming scenarios you need to
dynamically adjust the inventory of a particular FPGA resource provider.
You will need to *add* an inventory item of a custom resource class
representing the specific function you are flashing *to an empty region*.
You *may* want to *delete* an inventory item of a custom resource class
representing the specific function *when an instance that was using that
specific function is terminated*. When the instance is terminated, Nova
will *automatically* delete allocations of that custom resource class
associated with the instance if you use a custom resource class to
represent the particular accelerated function. No such automatic removal
of allocations is done if you use traits to represent particular
accelerated functions (again, because traits aren't consumable things).
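The lifecycle described above can be sketched roughly as follows. Again, this
is a toy in-memory model in Python, not Nova, Cyborg, or the placement API:
ToyPlacement, the provider name "fpga-0", the consumer name "instance-1", and
the resource class CUSTOM_FPGA_FUNCTION_X are hypothetical names chosen for
the example. The point is only that allocations are keyed by the consuming
instance, so cleaning up on termination needs no knowledge of which function
was in use.

```python
class ToyPlacement:
    """Hypothetical in-memory model of inventory plus per-consumer allocations."""

    def __init__(self):
        self.inventory = {}    # (provider, resource class) -> total units
        self.allocations = {}  # consumer -> list of (provider, class, amount)

    def usage(self, provider, resource_class):
        return sum(
            amount
            for allocs in self.allocations.values()
            for prov, rc, amount in allocs
            if (prov, rc) == (provider, resource_class)
        )

    def add_inventory(self, provider, resource_class, total):
        self.inventory[(provider, resource_class)] = total

    def delete_inventory(self, provider, resource_class):
        if self.usage(provider, resource_class) > 0:
            raise RuntimeError("inventory is still being consumed")
        del self.inventory[(provider, resource_class)]

    def allocate(self, consumer, provider, resource_class, amount=1):
        total = self.inventory.get((provider, resource_class), 0)
        if self.usage(provider, resource_class) + amount > total:
            raise RuntimeError("not enough capacity")
        self.allocations.setdefault(consumer, []).append(
            (provider, resource_class, amount)
        )

    def delete_allocations(self, consumer):
        # Everything the consumer held is released in one step.
        self.allocations.pop(consumer, None)


FUNC = "CUSTOM_FPGA_FUNCTION_X"
placement = ToyPlacement()

# 1. A function is flashed into an empty region: add an inventory item.
placement.add_inventory("fpga-0", FUNC, 1)

# 2. An instance is scheduled and consumes that function.
placement.allocate("instance-1", "fpga-0", FUNC)
usage_while_running = placement.usage("fpga-0", FUNC)

# 3. The instance is terminated: its allocations are deleted by consumer,
#    which is the step the text says happens automatically.
placement.delete_allocations("instance-1")
usage_after_terminate = placement.usage("fpga-0", FUNC)

# 4. Optionally, the now-unused inventory item is deleted as well.
placement.delete_inventory("fpga-0", FUNC)
```

Step 4 is the optional part: leaving the inventory in place would let a later
instance reuse the already-flashed function without reprogramming the region.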