[openstack-dev] [Nova] [Cyborg] Updates to os-acc proposal

Nadathur, Sundar sundar.nadathur at intel.com
Tue Jul 31 02:35:47 UTC 2018


Hi Eric and all,
     With recent discussions [1], we have convergence on how Power and 
other architectures can use Cyborg. Before I update the spec [2], I am 
setting down some key aspects of the updates, so that we are all aligned.

The accelerator-instance attachment has two parts:

  * The connection between the accelerator and a host-visible attach
    handle, such as a PCI function or a mediated device UUID. We call
    this the Device Half of the attach.
  * The connection between the attach handle and the instance. We name
    this the Instance Half of the attach.

I propose two different extensibility mechanisms:

  * Cyborg drivers deal with device-specific aspects, including
    discovery/enumeration of devices and handling the Device Half of the
    attach: preparing devices/accelerators for attach to an instance,
    any post-attach cleanup after a successful attach, releasing
    device/accelerator resources on instance termination or failed
    attach, etc.
  * os-acc plugins deal with hypervisor/system/architecture-specific
    aspects, including handling the Instance Half of the attach (e.g.
    for libvirt with PCI, preparing the XML snippet to be included in
    the domain XML).

When invoked by Nova compute to attach accelerator(s) to an instance, 
os-acc would call the Cyborg driver to prepare a VAN (Virtual 
Accelerator Nexus, which is a handle object for attaching an accelerator 
to an instance, similar to VIFs for networking). Such preparation may 
involve configuring the device in some way, including programming for 
FPGAs. This sets up a VAN object with the necessary data for the attach 
(e.g. PCI VF, Power DRC index, etc.). Then os-acc would call a 
plugin to perform the hypervisor-specific attach, using that VAN. 
Finally, os-acc may call the Cyborg driver again to do any post-attach 
cleanup, if needed.
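
As a rough illustration of that sequence, here is a minimal Python sketch. 
Only the names prepareVAN and postplug come from this proposal; the VAN 
fields, the FakeDriver and FakePlugin classes, the plug() method and the 
placeholder PCI address are all assumptions made up for the example, not 
the actual os-acc or Cyborg API.

```python
from dataclasses import dataclass, field


@dataclass
class VAN:
    """Hypothetical attach handle; the fields are illustrative only."""
    accelerator_id: str
    attach_type: str                      # e.g. "pci" or "mdev"
    attach_info: dict = field(default_factory=dict)


class FakeDriver:
    """Stand-in for a Cyborg driver, which owns the Device Half."""

    def __init__(self):
        self.calls = []

    def prepareVAN(self, accelerator_id):
        # A real driver might configure or program the device here.
        self.calls.append("prepareVAN")
        return VAN(accelerator_id, "pci", {"address": "0000:00:00.0"})

    def postplug(self, van):
        # Optional post-attach cleanup.
        self.calls.append("postplug")


class FakePlugin:
    """Stand-in for an os-acc plugin, which owns the Instance Half."""

    def plug(self, instance_id, van):
        # A real libvirt plugin would build a domain XML snippet instead.
        return {"instance": instance_id, "handle": van.attach_info}


def attach_one(driver, plugin, instance_id, accelerator_id):
    """Order of calls os-acc would make for a single accelerator."""
    van = driver.prepareVAN(accelerator_id)   # Device Half
    result = plugin.plug(instance_id, van)    # Instance Half
    driver.postplug(van)                      # cleanup, if any
    return van, result
```

The point of the sketch is only the ordering: the driver produces the 
VAN, the plugin consumes it, and the driver is called once more at the 
end.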

A more detailed workflow is here: 
https://docs.google.com/drawings/d/1cX06edia_Pr7P5nOB08VsSMsgznyrz4Yy2u8nb596sU/edit?usp=sharing 


Thus, the drivers and plugins are expected to be complementary. For 
example, for 2 devices of types T1 and T2, there shall be 2 separate 
Cyborg drivers. Further, we would have separate plugins for, say, 
x86+KVM systems and Power systems. We could then have four different 
deployments -- T1 on x86+KVM, T2 on x86+KVM, T1 on Power, T2 on Power -- 
by suitable combinations of the drivers and plugins.

It is possible that there may be scenarios where the separation of roles 
between the plugins and the drivers is not so clear-cut. That can be 
addressed by allowing the plugins to call into Cyborg drivers in the 
future and/or by other mechanisms.

One secondary detail to note is that Nova compute calls os-acc per 
instance for all accelerators for that instance, not once for each 
accelerator. There are two reasons for that:

  * I think this is how Nova deals with os-vif [3].
  * If some accelerators got allocated/configured, and the next
    accelerator configuration fails, a rollback needs to be done. This
    is better done in os-acc than Nova compute.
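
To make the rollback point concrete, here is a small Python sketch of 
how a per-instance call could undo partial work. It covers only the 
Device Half. prepareVAN and unprepareVAN are the names proposed in this 
mail; RecordingDriver, PrepareError, prepare_all and the dict used as a 
VAN are hypothetical and exist only for the example.

```python
class PrepareError(Exception):
    """Hypothetical error raised when a device cannot be prepared."""


class RecordingDriver:
    """Stand-in driver that can be told to fail on one accelerator."""

    def __init__(self, fail_on=None):
        self.fail_on = fail_on
        self.released = []

    def prepareVAN(self, accelerator_id):
        if accelerator_id == self.fail_on:
            raise PrepareError(accelerator_id)
        return {"accelerator_id": accelerator_id}

    def unprepareVAN(self, van):
        self.released.append(van["accelerator_id"])


def prepare_all(driver, accelerator_ids):
    """Prepare every accelerator for one instance, or none of them."""
    vans = []
    try:
        for accelerator_id in accelerator_ids:
            vans.append(driver.prepareVAN(accelerator_id))
    except Exception:
        # Release what was already prepared, newest first, then re-raise
        # so the caller (Nova compute) still sees the failure.
        for van in reversed(vans):
            driver.unprepareVAN(van)
        raise
    return vans
```

Because the loop and the undo logic sit in one function, the caller 
never has to track which accelerators were already configured.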

Cyborg drivers are invoked both by the Cyborg agent (for 
discovery/enumeration) and by os-acc (for instance attach). Both shall 
use Stevedore to locate and load the drivers. A single Python module may 
implement both sets of interfaces, like this:

+--------------+         +-------+
| Nova Compute |         |Cyborg |
+----+---------+         |Agent  |
     |                   +---+---+
+----v---+                   |
| os-acc |                   |
+----+---+                   |
     |                       |
     |     Cyborg driver     |
+----v----------------+------v-----------+
|UN/PLUG ACCELERATORS |  DISCOVER        |
|FROM INSTANCES       |  ACCELERATORS    |
|                     |                  |
|* can_handle()       |  * get_devices() |
|* prepareVAN()       |                  |
|* postplug()         |                  |
|* unprepareVAN()     |                  |
+---------------------+------------------+
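
A Python sketch of one module exposing both sets of interfaces might 
look like the following. The five method names are the ones in the 
diagram; the two abstract base classes, all argument lists, the return 
values and the "example-fpga" device type are assumptions for 
illustration. Loading through Stevedore is not shown.

```python
import abc


class AttachInterface(abc.ABC):
    """Called by os-acc to plug/unplug accelerators (assumed signatures)."""

    @abc.abstractmethod
    def can_handle(self, accelerator): ...

    @abc.abstractmethod
    def prepareVAN(self, accelerator): ...

    @abc.abstractmethod
    def postplug(self, van): ...

    @abc.abstractmethod
    def unprepareVAN(self, van): ...


class DiscoveryInterface(abc.ABC):
    """Called by the Cyborg agent to enumerate devices."""

    @abc.abstractmethod
    def get_devices(self): ...


class ExampleDriver(AttachInterface, DiscoveryInterface):
    """One class serving both callers; device data is made up."""

    DEVICE_TYPE = "example-fpga"  # hypothetical device type

    def get_devices(self):
        return [{"id": "dev-0", "type": self.DEVICE_TYPE}]

    def can_handle(self, accelerator):
        return accelerator.get("type") == self.DEVICE_TYPE

    def prepareVAN(self, accelerator):
        return {"accelerator_id": accelerator["id"], "prepared": True}

    def postplug(self, van):
        return None  # nothing to clean up in this sketch

    def unprepareVAN(self, van):
        van["prepared"] = False
        return van
```

Keeping the two interfaces as separate base classes would also allow a 
driver to implement only one of them, if that ever turns out to be 
useful.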

If there are no objections to the above, I will update the spec [2].

[1] 
http://eavesdrop.openstack.org/irclogs/%23openstack-cyborg/%23openstack-cyborg.2018-07-30.log.html#t2018-07-30T16:25:41-2 

[2] https://review.openstack.org/#/c/577438/
[3] 
https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1529

Regards,
Sundar