[openstack-dev] [nova] [cyborg] Race condition in the Cyborg/Nova flow

少合冯 lvmxhster at gmail.com
Tue Mar 27 10:38:08 UTC 2018

As I understand it, Placement and the Nova scheduler are dedicated to filtering and weighing,
and the Nova scheduler is responsible for avoiding races.

Nested providers plus traits should cover most scenarios.

If there is any special case, please let the Nova and Cyborg developers know,
and let us work together on a solution.

I re-paste our design (for a PoC) that I sent before; hopefully it is helpful.
We do not let Cyborg do any scheduling functions (including filtering and weighing).
It is only responsible for binding FPGA devices to VM instances (or call it
FPGA device assignment).

Hi all,

IMHO, we can consider upstreaming image management and resource provider
management, and even scheduler weighing.

1.  Image management
For image management, I missed one thing in the meeting.

We have discussed it before,
and Li Liu suggested adding a Cyborg wrapper to upload the FPGA image.
This is a good idea.
For example:

The wrapper would call the Glance upload API to upload the image.
This is helpful for us to normalize the tags and properties of the image.

To Dutch, Li Liu, Dolpher, Sunder and other FPGA experts:
     How about getting agreement on the standardization of Glance image
metadata, especially tags and properties?

For the tags:
    IMHO, the "FPGA" tag is necessary, since there may be many images managed
by Glance, not only FPGA images but also VM images. This tag can be a filter
that helps us fetch only FPGA images.
    Is the vendor name necessary as a tag, such as "INTEL" or "XILINX"?
    Is the product model necessary as a tag, such as "STRATIX10"?
    Should anything else be in the image tags?
For the properties:
    They should include the function name (this means the accelerator type).
Should they also include the stream ID and vendor name?
    For example: --property vendor=xilinx --property type=crypto,transcoding
    Should anything else be in the image properties?

Li Liu is working on the spec.
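The normalization that such a wrapper could enforce might be sketched as follows. This is only an illustration of the tag/property scheme proposed above; none of these names are settled, and Li Liu's spec is the authority:

```python
# Hypothetical sketch of the Cyborg upload-wrapper's metadata normalization.
# The "FPGA" tag, the vendor/model tags, and the vendor/type/stream_id
# properties follow the proposal in this mail; all names are assumptions.

REQUIRED_TAG = "FPGA"

def normalize_image_meta(vendor, model, functions, stream_id=None):
    """Build normalized Glance tags and properties for an FPGA image."""
    # Tags are uppercased so filtering on "FPGA", "XILINX", etc. is uniform.
    tags = sorted({REQUIRED_TAG, vendor.upper(), model.upper()})
    properties = {
        "vendor": vendor.lower(),
        # Comma-separated accelerator functions, e.g. "crypto,transcoding".
        "type": ",".join(sorted(f.lower() for f in functions)),
    }
    if stream_id is not None:
        properties["stream_id"] = stream_id
    return tags, properties

tags, props = normalize_image_meta("Xilinx", "vu9p", ["transcoding", "crypto"])
print(tags)   # ['FPGA', 'VU9P', 'XILINX']
print(props)  # {'vendor': 'xilinx', 'type': 'crypto,transcoding'}
```

The wrapper would then pass these through to the Glance v2 image-create/upload calls, so every FPGA image in Glance carries the same searchable metadata.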

2.   Provider management.
      Resource classes, maybe with nested providers supported.
      We can define them as follows: the level-1 provider's resource class is
CUSTOM_FPGA_<type>, level 2 is CUSTOM_FPGA_<vendor>_<type>, and level 3 is
CUSTOM_FPGA_<vendor>_<model>_<type>. For example:
      { "CUSTOM_FPGA_VF": { "num": 3 },
        "CUSTOM_FPGA_XILINX_VF": { "num": 1 },
        "CUSTOM_FPGA_INTEL_STRATIX10_VF": { "num": 1 },
        "CUSTOM_FPGA_INTEL_STRATIX11_VF": { "num": 1 } }
      Not sure I understand this correctly.

      And traits should include:  CUSTOM_<domain>_FUNCTION_<function>
      <domain> means which project consumes these traits. Which is better,
CYBORG or ACCELERATOR? Here it means Cyborg cares about these traits; Nova,
Neutron, and Cinder can ignore them.
      <function> can be CRYPTO, TRANSCODING, etc.

To Jay Pipes, Dutch, Li Liu, Dolpher, Sunder and other FPGA/placement experts:
       Any suggestions on it?
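The naming scheme above could be sketched as two small helpers. This is only an illustration of the proposal, not settled API; the only hard constraint I know of is that Placement custom names must start with "CUSTOM_" and use uppercase letters, digits, and underscores:

```python
# Sketch of the proposed resource-class and trait naming. The three-level
# layout and the CUSTOM_<domain>_FUNCTION_<function> trait form are this
# mail's proposal, still under discussion.
import re

_VALID = re.compile(r"^CUSTOM_[A-Z0-9_]+$")

def fpga_resource_class(fpga_type, vendor=None, model=None):
    """Level 1: CUSTOM_FPGA_<type>; level 2 adds <vendor>; level 3 adds <model>."""
    parts = ["CUSTOM", "FPGA"]
    if vendor:
        parts.append(vendor)
    if model:
        parts.append(model)
    parts.append(fpga_type)
    name = "_".join(p.upper() for p in parts)
    assert _VALID.match(name), name
    return name

def function_trait(function, domain="ACCELERATOR"):
    """CUSTOM_<domain>_FUNCTION_<function>, e.g. for crypto or transcoding."""
    name = "CUSTOM_{}_FUNCTION_{}".format(domain.upper(), function.upper())
    assert _VALID.match(name), name
    return name

print(fpga_resource_class("VF"))                        # CUSTOM_FPGA_VF
print(fpga_resource_class("VF", "INTEL", "STRATIX10"))  # CUSTOM_FPGA_INTEL_STRATIX10_VF
print(function_trait("crypto"))                         # CUSTOM_ACCELERATOR_FUNCTION_CRYPTO
```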

3.  Scheduler weighing.
I think this is not a high priority for Cyborg at present.
Zhipeng, Li Liu, Zhuli, Dolpher and I have discussed it before for the
deployable model implementation.
We need to add stream or image information to the deployable. In Li Liu and
Zhuli's design, the deployable does have extra info, so it can be used for
the stream or image information.

And the Cyborg API had better support filters for scheduler weighing.
Such as:
GET /cyborg/v1/accelerators?hosts=cyborg-1,cyborg-2,cyborg-3
This queries all the hosts cyborg-1, cyborg-2, and cyborg-3 to get all
accelerators that support the crypto and transcoding functions.
The Cyborg API calls the conductor to get the accelerator information, and
the scheduler can leverage that information for weighing.
Maybe the Cyborg API can also help to do the weighing, but I think this is
not a good idea.
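A weigher consuming that (hypothetical) API might look like the sketch below. The response shape and field names are assumptions for illustration only:

```python
# Sketch of how a scheduler weigher could rank the hosts returned by the
# proposed "GET /cyborg/v1/accelerators?hosts=..." call. The per-host
# accelerator records ("in_use", "functions") are an assumed response shape.

def weigh_hosts(accelerators_by_host, wanted_functions):
    """Return hosts sorted best-first by how many free accelerators
    support all of the wanted functions."""
    scores = {}
    for host, accels in accelerators_by_host.items():
        scores[host] = sum(
            1 for a in accels
            if not a["in_use"] and wanted_functions <= set(a["functions"])
        )
    return sorted(scores, key=scores.get, reverse=True)

sample = {
    "cyborg-1": [{"in_use": False, "functions": ["crypto"]}],
    "cyborg-2": [
        {"in_use": False, "functions": ["crypto", "transcoding"]},
        {"in_use": True, "functions": ["crypto", "transcoding"]},
    ],
    "cyborg-3": [],
}
print(weigh_hosts(sample, {"crypto", "transcoding"}))
# ['cyborg-2', 'cyborg-1', 'cyborg-3']
```

Keeping this logic in the scheduler (rather than in the Cyborg API) matches the point above: Cyborg supplies the filtered accelerator data, and Nova does the weighing.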

To Sundar:
I know you are interested in scheduler weighing and you have some other
weighing solutions.
Hopefully this can be useful for you.
REF: https://etherpad.openstack.org/p/cyborg-nova-poc

2018-03-23 12:27 GMT+08:00 Nadathur, Sundar <sundar.nadathur at intel.com>:

> Hi all,
>     There seems to be a possibility of a race condition in the Cyborg/Nova
> flow. Apologies for missing this earlier. (You can refer to the proposed
> Cyborg/Nova spec
> <https://review.openstack.org/#/c/554717/1/doc/specs/rocky/cyborg-nova-sched.rst>
> for details.)
> Consider the scenario where the flavor specifies a resource class for a
> device type, and also specifies a function (e.g. encrypt) in the extra
> specs. The Nova scheduler would only track the device type as a resource,
> and Cyborg needs to track the availability of functions. Further, to keep
> it simple, say all the functions exist all the time (no reprogramming
> involved).
> To recap, here is the scheduler flow for this case:
>    - A request spec with a flavor comes to Nova conductor/scheduler. The
>    flavor has a device type as a resource class, and a function in the extra
>    specs.
>    - Placement API returns the list of RPs (compute nodes) which contain
>    the requested device types (but not necessarily the function).
>    - Cyborg will provide a custom filter which queries Cyborg DB. This
>    needs to check which hosts contain the needed function, and filter out the
>    rest.
>    - The scheduler selects one node from the filtered list, and the
>    request goes to the compute node.
> For the filter to work, the Cyborg DB needs to maintain a table with
> triples of (host, function type, #free units). The filter checks if a given
> host has one or more free units of the requested function type. But, to
> keep the # free units up to date, Cyborg on the selected compute node needs
> to notify the Cyborg API to decrement the #free units when an instance is
> spawned, and to increment them when resources are released.
> Therein lies the catch: this loop from the compute node to controller is
> susceptible to race conditions. For example, if two simultaneous requests
> each ask for function A, and there is only one unit of that available, the
> Cyborg filter will approve both, both may land on the same host, and one
> will fail. This is because Cyborg on the controller does not decrement
> resource usage due to one request before processing the next request.
> This is similar to this previous Nova scheduling issue
> <https://specs.openstack.org/openstack/nova-specs/specs/pike/implemented/placement-claims.html>.
> That was solved by having the scheduler claim a resource in Placement for
> the selected node. I don't see an analog for Cyborg, since it would not
> know which node is selected.
> Thanks in advance for suggestions and solutions.
> Regards,
> Sundar
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
