<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Hi Eric and all,<br>
Recent discussions [1] have converged on how Power and other
architectures can use Cyborg. Before I update the spec [2], I am
setting down some key aspects of the updates, so that we are all
aligned.<br>
<br>
The accelerator-instance attachment has two parts:<br>
<ul>
<li>The connection between the accelerator and a host-visible
attach handle, such as a PCI function or a mediated device UUID.
We call this the Device Half of the attach.</li>
<li>The connection between the attach handle and the instance. We
call this the Instance Half of the attach.<br>
</li>
</ul>
I propose two different extensibility mechanisms:<br>
<ul>
<li>Cyborg drivers deal with device-specific aspects, including
discovery/enumeration of devices and handling the Device Half of
the attach: preparing devices/accelerators for attachment to an
instance, post-attach cleanup (if any) after a successful attach,
releasing device/accelerator resources on instance termination
or a failed attach, and so on.</li>
<li>os-acc plugins deal with
hypervisor/system/architecture-specific aspects, including
handling the Instance Half of the attach (e.g. for libvirt with
PCI, preparing the XML snippet to be included in the domain
XML).<br>
</li>
</ul>
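<p>To make the Instance Half more concrete, here is a minimal Python sketch of an os-acc plugin for libvirt with PCI, assuming the attach handle is a host PCI function. The OsAccPlugin base class, its plug() method and the PciAddress type are hypothetical names for illustration, not an agreed os-acc API; only the hostdev element follows libvirt's documented domain XML format for PCI passthrough.</p>

```python
import abc
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class PciAddress:
    """Hypothetical attach handle: a host PCI function such as an SR-IOV VF."""
    domain: int
    bus: int
    slot: int
    function: int


class OsAccPlugin(abc.ABC):
    """Hypothetical base class for the Instance Half of the attach."""

    @abc.abstractmethod
    def plug(self, attach_handle):
        """Return what the hypervisor needs to attach the handle."""


class LibvirtPciPlugin(OsAccPlugin):
    """Sketch of a libvirt/PCI plugin that emits a domain XML snippet."""

    def plug(self, attach_handle):
        # Build a libvirt hostdev element for PCI passthrough.
        hostdev = ET.Element("hostdev", mode="subsystem", type="pci",
                             managed="yes")
        source = ET.SubElement(hostdev, "source")
        ET.SubElement(source, "address",
                      domain="0x%04x" % attach_handle.domain,
                      bus="0x%02x" % attach_handle.bus,
                      slot="0x%02x" % attach_handle.slot,
                      function="0x%x" % attach_handle.function)
        # The caller would splice this string into the domain XML.
        return ET.tostring(hostdev, encoding="unicode")


snippet = LibvirtPciPlugin().plug(PciAddress(0, 0x06, 0x00, 1))
print(snippet)
```

<p>A Power plugin would implement the same plug() method, but act on a DRC index rather than a PCI address.</p>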
<p>When invoked by Nova compute to attach accelerator(s) to an
instance, os-acc would call the Cyborg driver to prepare a VAN
(Virtual Accelerator Nexus, a handle object for attaching an
accelerator to an instance, similar to VIFs for networking).
Such preparation may involve configuring the device in some way,
including programming an FPGA. This sets up a VAN object with
the data needed for the attach (e.g. a PCI VF or a Power DRC
index). Then os-acc would call a plugin to perform the
hypervisor-specific attach using that VAN. Finally, os-acc may
call the Cyborg driver again to do any post-attach cleanup, if
needed. <br>
</p>
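<p>The three steps above could look roughly like this inside os-acc. This is a minimal Python sketch: the VAN fields, the fake driver and plugin classes and the plug() method are hypothetical. Only the step order (prepareVAN, plugin attach, optional postplug) comes from the workflow described here. The attach handle is shown as a PCI VF, but could equally carry a Power DRC index.</p>

```python
from dataclasses import dataclass, field


@dataclass
class VAN:
    """Virtual Accelerator Nexus: handle for one accelerator attach.

    The field names are illustrative assumptions, not the spec's schema.
    """
    accelerator_id: str
    attach_handle: dict = field(default_factory=dict)


class FakeFpgaDriver:
    """Hypothetical Cyborg driver handling the Device Half."""

    def __init__(self, log):
        self.log = log

    def prepareVAN(self, accelerator_id):
        self.log.append("prepareVAN")
        # A real driver might program the FPGA and reserve a free VF here.
        return VAN(accelerator_id, {"type": "PCI", "address": "0000:06:00.1"})

    def postplug(self, van):
        self.log.append("postplug")  # no real cleanup needed in this sketch


class FakeKvmPlugin:
    """Hypothetical os-acc plugin handling the Instance Half."""

    def __init__(self, log):
        self.log = log

    def plug(self, instance_uuid, van):
        self.log.append("plug")
        # A real plugin would produce hypervisor config, e.g. domain XML.
        return "attach %s to %s" % (van.attach_handle["address"], instance_uuid)


def attach(instance_uuid, accelerator_id, driver, plugin):
    """os-acc flow: prepare the VAN, plug it, then optional cleanup."""
    van = driver.prepareVAN(accelerator_id)   # Device Half
    result = plugin.plug(instance_uuid, van)  # Instance Half
    driver.postplug(van)                      # post-attach cleanup, if any
    return van, result


calls = []
van, result = attach("inst-1", "acc-1", FakeFpgaDriver(calls), FakeKvmPlugin(calls))
print(result)
```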
<p>A more detailed workflow is here:
<a class="moz-txt-link-freetext" href="https://docs.google.com/drawings/d/1cX06edia_Pr7P5nOB08VsSMsgznyrz4Yy2u8nb596sU/edit?usp=sharing">https://docs.google.com/drawings/d/1cX06edia_Pr7P5nOB08VsSMsgznyrz4Yy2u8nb596sU/edit?usp=sharing</a>
<br>
</p>
<p>Thus, the drivers and plugins are expected to be complementary.
For example, for two device types T1 and T2, there shall be two
separate Cyborg drivers. Further, we would have separate plugins
for, say, x86+KVM systems and Power systems. We could then have
four different deployments (T1 on x86+KVM, T2 on x86+KVM, T1 on
Power, T2 on Power) by suitably combining the drivers and
plugins.</p>
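<p>The benefit of this split is that M drivers and N plugins cover M*N deployments, while only M+N components need to be written. A trivial Python illustration, using the T1/T2 and x86+KVM/Power names from the example as stand-ins for real driver and plugin entry points:</p>

```python
import itertools

# Names taken from the example above; they stand in for real driver
# and plugin entry points.
DRIVERS = ["T1", "T2"]
PLUGINS = ["x86+KVM", "Power"]


def deployments(drivers, plugins):
    """Return every (driver, plugin) pairing that can be deployed together."""
    return list(itertools.product(drivers, plugins))


for driver_name, plugin_name in deployments(DRIVERS, PLUGINS):
    print("%s on %s" % (driver_name, plugin_name))
```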
<p>There may be scenarios where the separation of roles between
the plugins and the drivers is not so clear-cut. Those can be
addressed in the future by allowing the plugins to call into
Cyborg drivers, and/or by other mechanisms.<br>
</p>
<p>One secondary detail to note is that Nova compute calls os-acc
once per instance, covering all accelerators for that instance,
rather than once for each accelerator. There are two reasons for that:
</p>
<ul>
<li>I think this is how Nova deals with os-vif [3].</li>
<li>If some accelerators have already been allocated/configured
and configuring the next one fails, the earlier ones must be
rolled back. This is better done in os-acc than in Nova compute. </li>
</ul>
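<p>Handling all of an instance's accelerators in one call lets os-acc roll back cleanly. The following minimal Python sketch attaches every accelerator for one instance and, if any step fails, unplugs and releases the ones already handled, in reverse order. The prepareVAN()/unprepareVAN() names come from the driver interface in this proposal; plug()/unplug(), AttachError and the recording fakes are hypothetical.</p>

```python
class AttachError(Exception):
    """Raised when attaching an instance's accelerators fails."""


def attach_all(instance_uuid, accelerator_ids, driver, plugin):
    """Attach all accelerators of one instance, or none of them."""
    done = []  # (van, plugged) pairs handled so far, in order
    try:
        for acc_id in accelerator_ids:
            van = driver.prepareVAN(acc_id)      # Device Half
            done.append((van, False))
            plugin.plug(instance_uuid, van)      # Instance Half
            done[-1] = (van, True)
    except Exception as exc:
        # Roll back in reverse order: unplug if plugged, then release.
        for van, plugged in reversed(done):
            if plugged:
                plugin.unplug(instance_uuid, van)
            driver.unprepareVAN(van)
        raise AttachError("attach failed for %s" % instance_uuid) from exc
    return [van for van, _ in done]


class RecordingDriver:
    """Hypothetical driver that cannot prepare the accelerator 'bad'."""

    def __init__(self):
        self.released = []

    def prepareVAN(self, acc_id):
        if acc_id == "bad":
            raise RuntimeError("cannot program device %s" % acc_id)
        return acc_id  # the id doubles as a stand-in VAN here

    def unprepareVAN(self, van):
        self.released.append(van)


class RecordingPlugin:
    """Hypothetical plugin that just tracks what is plugged in."""

    def __init__(self):
        self.plugged = []

    def plug(self, instance_uuid, van):
        self.plugged.append(van)

    def unplug(self, instance_uuid, van):
        self.plugged.remove(van)
```

<p>For example, attach_all("inst-1", ["a", "b", "bad"], RecordingDriver(), RecordingPlugin()) raises AttachError after unplugging and releasing "b" and then "a", leaving nothing plugged.</p>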
<p>Cyborg drivers are invoked both by the Cyborg agent (for
discovery/enumeration) and by os-acc (for instance attach). Both
shall use Stevedore to locate and load the drivers. A single
Python module may implement both sets of interfaces, like this:</p>
<pre>+--------------+         +-------+
| Nova Compute |         |Cyborg |
+----+---------+         |Agent  |
     |                   +---+---+
+----v---+                   |
| os-acc |                   |
+----+---+                   |
     |                       |
     |     Cyborg driver     |
+----v----------------+------v-----------+
|UN/PLUG ACCELERATORS | DISCOVER         |
|FROM INSTANCES       | ACCELERATORS     |
|                     |                  |
|* can_handle()       | * get_devices()  |
|* prepareVAN()       |                  |
|* postplug()         |                  |
|* unprepareVAN()     |                  |
+---------------------+------------------+
</pre>
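<p>A single driver module serving both callers might look like this minimal Python sketch. The method names follow the diagram above, but their arguments and return values, the T1FpgaDriver class, the in-memory VF inventory and the entry-point namespace shown in the docstring are all assumptions for illustration. The DriverManager call in the docstring follows Stevedore's documented loading API.</p>

```python
"""Hypothetical Cyborg driver module for an FPGA type "T1".

The same class serves both callers shown in the diagram:
  * the Cyborg agent calls get_devices() for discovery;
  * os-acc calls can_handle(), prepareVAN(), postplug(), unprepareVAN().

Both callers could load it through Stevedore from an entry point
registered in setup.cfg (namespace and names are assumptions):

    [entry_points]
    cyborg.accelerator.drivers =
        t1_fpga = t1_driver:T1FpgaDriver

    from stevedore import driver
    mgr = driver.DriverManager(namespace="cyborg.accelerator.drivers",
                               name="t1_fpga", invoke_on_load=True)
    drv = mgr.driver
"""


class T1FpgaDriver:
    """Implements both the discovery and the attach interfaces."""

    DEVICE_TYPE = "T1"

    def __init__(self, vf_addresses=("0000:06:00.1", "0000:06:00.2")):
        # Stand-in inventory; a real driver would enumerate PCI devices.
        self._all = list(vf_addresses)
        self._free = list(vf_addresses)

    # Discovery interface (called by the Cyborg agent).
    def get_devices(self):
        return [{"type": self.DEVICE_TYPE, "address": a} for a in self._all]

    # Attach interface (called by os-acc).
    def can_handle(self, device_type):
        return device_type == self.DEVICE_TYPE

    def prepareVAN(self, instance_uuid):
        if not self._free:
            raise RuntimeError("no free %s VF" % self.DEVICE_TYPE)
        addr = self._free.pop(0)
        # A real driver might program the FPGA bitstream at this point.
        return {"instance": instance_uuid,
                "handle": {"type": "PCI", "address": addr}}

    def postplug(self, van):
        # No post-attach cleanup is needed for this device type.
        return None

    def unprepareVAN(self, van):
        # Return the VF to the free pool.
        self._free.append(van["handle"]["address"])
```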
If there are no objections to the above, I will update the spec [2].<br>
<br>
[1] <a class="moz-txt-link-freetext" href="http://eavesdrop.openstack.org/irclogs/%23openstack-cyborg/%23openstack-cyborg.2018-07-30.log.html#t2018-07-30T16:25:41-2">http://eavesdrop.openstack.org/irclogs/%23openstack-cyborg/%23openstack-cyborg.2018-07-30.log.html#t2018-07-30T16:25:41-2</a><br>
[2] <a class="moz-txt-link-freetext" href="https://review.openstack.org/#/c/577438/">https://review.openstack.org/#/c/577438/</a><br>
[3] <a class="moz-txt-link-freetext" href="https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1529">https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1529</a><br>
<br>
Regards,<br>
Sundar<br>
</body>
</html>