[openstack-dev] [Nova] [Cyborg] Updates to os-acc proposal
sundar.nadathur at intel.com
Tue Jul 31 02:35:47 UTC 2018
Hi Eric and all,
With recent discussions, we have convergence on how Power and
other architectures can use Cyborg. Before I update the spec, I am
setting down some key aspects of the updates, so that we are all aligned.
The accelerator-instance attachment has two parts:
* The connection between the accelerator and a host-visible attach
handle, such as a PCI function or a mediated device UUID. We call
this the Device Half of the attach.
* The connection between the attach handle and the instance. We name
this the Instance Half of the attach.
I propose two different extensibility mechanisms:
* Cyborg drivers deal with device-specific aspects, including
discovery/enumeration of devices and handling the Device Half of the
attach (preparing devices/accelerators for attach to an instance,
post-attach cleanup (if any) after successful attach, releasing
device/accelerator resources on instance termination or failed attach).
* os-acc plugins deal with hypervisor/system/architecture-specific
aspects, including handling the Instance Half of the attach (e.g.
for libvirt with PCI, preparing the XML snippet to be included in
the domain XML).
When invoked by Nova compute to attach accelerator(s) to an instance,
os-acc would call the Cyborg driver to prepare a VAN (Virtual
Accelerator Nexus, which is a handle object for attaching an accelerator
to an instance, similar to VIFs for networking). Such preparation may
involve configuring the device in some way, including programming for
FPGAs. This sets up a VAN object with the necessary data for the attach
(e.g. PCI VF, Power DRC index, etc.). Then os-acc would call a
plugin to perform the Instance Half of the attach for that hypervisor,
using that VAN. Finally, os-acc may call the Cyborg driver again to do
any post-attach cleanup, if needed.
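
As a rough illustration of the call order described above (not the
actual os-acc code), here is a minimal Python sketch. The VAN fields,
the plug() method name on the plugin, and all argument lists are
assumptions made up for this example; only prepareVAN() and postplug()
are names from this proposal.

```python
from dataclasses import dataclass, field


@dataclass
class VAN:
    """Hypothetical attach handle; the field names are illustrative only."""
    accelerator_id: str
    attach_type: str = ""                      # e.g. "pci"
    attach_info: dict = field(default_factory=dict)


class FakeDriver:
    """Stand-in for a Cyborg driver, which owns the Device Half."""

    def __init__(self, log):
        self.log = log

    def prepareVAN(self, accelerator_id):
        # A real driver might configure or program the device here.
        self.log.append("prepareVAN")
        return VAN(accelerator_id, "pci", {"address": "0000:00:00.0"})

    def postplug(self, van):
        self.log.append("postplug")


class FakePlugin:
    """Stand-in for an os-acc plugin, which owns the Instance Half."""

    def __init__(self, log):
        self.log = log

    def plug(self, instance_id, van):
        # A libvirt plugin might build a domain XML snippet from the VAN.
        self.log.append("plug")


def attach_one(driver, plugin, instance_id, accelerator_id):
    """Driver prepares the VAN, plugin attaches it, driver cleans up."""
    van = driver.prepareVAN(accelerator_id)
    plugin.plug(instance_id, van)
    driver.postplug(van)
    return van


log = []
van = attach_one(FakeDriver(log), FakePlugin(log), "inst-1", "acc-1")
print(log)
```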
A more detailed workflow is here:
Thus, the drivers and plugins are expected to be complementary. For
example, for 2 devices of types T1 and T2, there shall be 2 separate
Cyborg drivers. Further, we would have separate plugins for, say,
x86+KVM systems and Power systems. We could then have four different
deployments -- T1 on x86+KVM, T2 on x86+KVM, T1 on Power, T2 on Power --
by suitable combinations of the drivers and plugins.
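
To make the combination concrete, here is a small Python sketch in
which the driver is chosen by device type and the plugin by platform,
independently of each other. The registry dicts and the driver/plugin
names are hypothetical placeholders, not real Cyborg or os-acc objects.

```python
from itertools import product

# Hypothetical registries: one driver per device type, one plugin per platform.
DRIVERS = {"T1": "driver_t1", "T2": "driver_t2"}
PLUGINS = {"x86+KVM": "plugin_x86_kvm", "Power": "plugin_power"}


def select(device_type, platform):
    """Pick the driver and the plugin independently of each other."""
    return DRIVERS[device_type], PLUGINS[platform]


# Two drivers and two plugins cover all four deployments.
deployments = [select(dev, plat) for plat, dev in product(PLUGINS, DRIVERS)]
for pair in deployments:
    print(pair)
```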
It is possible that there may be scenarios where the separation of roles
between the plugins and the drivers is not so clear-cut. That can be
addressed by allowing the plugins to call into Cyborg drivers in the
future and/or by other mechanisms.
One secondary detail to note is that Nova compute calls os-acc per
instance for all accelerators for that instance, not once for each
accelerator. There are two reasons for that:
* I think this is how Nova deals with os-vif.
* If some accelerators got allocated/configured, and the next
accelerator configuration fails, a rollback needs to be done. This
is better done in os-acc than Nova compute.
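
The rollback argument can be sketched as follows. This is only an
illustration under assumptions: attach_all() and the prepare/unprepare
callables are hypothetical names, standing in for whatever os-acc would
use to invoke the drivers' prepareVAN()/unprepareVAN().

```python
def attach_all(instance_id, accelerator_ids, prepare, unprepare):
    """Prepare every accelerator for one instance, undoing all on failure."""
    prepared = []
    try:
        for acc_id in accelerator_ids:
            prepared.append(prepare(instance_id, acc_id))
    except Exception:
        # Release what was already prepared, newest first, then re-raise
        # so the caller (Nova compute) sees a single failure.
        for handle in reversed(prepared):
            unprepare(handle)
        raise
    return prepared


released = []


def fake_prepare(instance_id, acc_id):
    # Simulate a configuration failure on the third accelerator.
    if acc_id == "acc-3":
        raise RuntimeError("configuration failed for " + acc_id)
    return acc_id


def fake_unprepare(handle):
    released.append(handle)


try:
    attach_all("inst-1", ["acc-1", "acc-2", "acc-3"],
               fake_prepare, fake_unprepare)
except RuntimeError:
    failed = True
else:
    failed = False

print(failed, released)
```

Because the call covers all accelerators of the instance, the caller
never has to track partially attached state itself.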
Cyborg drivers are invoked both by the Cyborg agent (for
discovery/enumeration) and by os-acc (for instance attach). Both shall
use Stevedore to locate and load the drivers. A single Python module may
implement both sets of interfaces, like this:
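+--------------+        +--------+
| Nova Compute |        | Cyborg |
+------+-------+        | Agent  |
       |                +---+----+
+------+-------+            |
|    os-acc    |            |
+------+-------+            |
       |                    |
+------+--------------------+-----------+
|            Cyborg driver              |
+---------------------+-----------------+
|UN/PLUG ACCELERATORS | DISCOVER        |
|FROM INSTANCES       | ACCELERATORS    |
|                     |                 |
|* can_handle()       | * get_devices() |
|* prepareVAN()       |                 |
|* postplug()         |                 |
|* unprepareVAN()     |                 |
+---------------------+-----------------+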
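
A single module implementing both sets of interfaces might look
roughly like the Python sketch below. The two abstract base class
names, all method signatures, and the dict-based return values are
assumptions for illustration; only the five method names come from
this proposal. Loading through Stevedore is left out, since the entry
point namespaces are not defined yet.

```python
import abc


class AttachInterface(abc.ABC):
    """Hypothetical interface used by os-acc for un/plugging accelerators."""

    @abc.abstractmethod
    def can_handle(self, accelerator): ...

    @abc.abstractmethod
    def prepareVAN(self, accelerator): ...

    @abc.abstractmethod
    def postplug(self, van): ...

    @abc.abstractmethod
    def unprepareVAN(self, van): ...


class DiscoveryInterface(abc.ABC):
    """Hypothetical interface used by the Cyborg agent for discovery."""

    @abc.abstractmethod
    def get_devices(self): ...


class ExampleDriver(AttachInterface, DiscoveryInterface):
    """One driver class serving both callers."""

    def get_devices(self):
        # A real driver would enumerate hardware; this returns fake data.
        return [{"type": "T1", "address": "0000:00:00.0"}]

    def can_handle(self, accelerator):
        return accelerator.get("type") == "T1"

    def prepareVAN(self, accelerator):
        return {"attach_type": "pci", "address": accelerator["address"]}

    def postplug(self, van):
        return None  # nothing to clean up in this sketch

    def unprepareVAN(self, van):
        return None  # nothing to release in this sketch


driver = ExampleDriver()
devices = driver.get_devices()        # what the Cyborg agent would call
van = driver.prepareVAN(devices[0])   # what os-acc would call
print(van)
```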
If there are no objections to the above, I will update the spec.