Re: device compatibility interface for live migration with assigned devices

5 Aug 2020

      * Yan Zhao (yan.y.zhao@intel.com) wrote:
...
...
...
yes, including a device_api field is better.
for mdev, "device_type=vfio-mdev", is it right?
No, vfio-mdev is not a device API, it's the driver that attaches to the
mdev bus device to expose it through vfio.  The device_api exposes the
actual interface of the vfio device, it's also vfio-pci for typical
mdev devices found on x86, but may be vfio-ccw, vfio-ap, etc...  See
VFIO_DEVICE_API_PCI_STRING and friends.
ok. got it.
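
A minimal sketch of the distinction above, assuming a management tool
wants to sanity-check a device_api value. The string values are the
VFIO_DEVICE_API_*_STRING definitions as I recall them from
include/uapi/linux/vfio.h; the set name and the helper is_device_api()
are hypothetical and not part of any proposal in this thread.

```python
# Device API strings as defined by VFIO_DEVICE_API_*_STRING in the Linux
# uapi header (recalled from include/uapi/linux/vfio.h; check your kernel).
VFIO_DEVICE_API_STRINGS = frozenset({
    "vfio-pci",
    "vfio-platform",
    "vfio-amba",
    "vfio-ccw",
    "vfio-ap",
})


def is_device_api(value: str) -> bool:
    """Return True if value names a vfio device API rather than a driver."""
    return value in VFIO_DEVICE_API_STRINGS


if __name__ == "__main__":
    # "vfio-mdev" is the bus driver, not a device API, so it is rejected.
    for candidate in ("vfio-pci", "vfio-ccw", "vfio-mdev"):
        print(candidate, is_device_api(candidate))
```

With this check, a typical x86 mdev would report "vfio-pci", while
"vfio-mdev" would be flagged as a driver name rather than an API.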
...
...
...
...
...
device_id=8086591d
Is device_id interpreted relative to device_type?  How does this
relate to mdev_type?  If we have an mdev_type, doesn't that fully
define the software API?
it's parent pci id for mdev actually.
If we need to specify the parent PCI ID then something is fundamentally
wrong with the mdev_type.  The mdev_type should define a unique,
software compatible interface, regardless of the parent device IDs.  If
an i915-GVTg_V5_2 means different things based on the parent device IDs,
then different mdev_types should be reported for those parent
devices.
hmm, then do we allow vendor specific fields?
or is it a must that a vendor specific field should have corresponding
vendor attribute?
another thing is that the definition of mdev_type in GVT only corresponds
to vGPU computing ability currently,
e.g. i915-GVTg_V5_2, is 1/2 of a gen9 IGD, i915-GVTg_V4_2 is 1/2 of a
gen8 IGD.
It is too coarse-grained for live migration compatibility.
Can you explain why that's too coarse?

Is this because it's too specific (i.e. that a i915-GVTg_V4_2 could be
migrated to a newer device?), or that it's too specific on the exact
sizings (i.e. that there may be multiple different sizes of a gen9)?

Dave
...
Do you think we need to update GVT's definition of mdev_type?
And is there any guide in mdev_type definition?
...
...
...
...
...
mdev_type=i915-GVTg_V5_2
And how are non-mdev devices represented?
non-mdev can opt to not include this field, or as you said below, a
vendor signature.
...
...
...
aggregator=1
  pv_mode="none+ppgtt+context"
These are meaningless vendor specific matches afaict.
yes, pv_mode and aggregator are vendor specific fields.
but they are important to decide whether two devices are compatible.
pv_mode indicates whether a vGPU supports the guest paravirtualized API.
"none+ppgtt+context" means the guest may use no pv, ppgtt mode pv, or
context mode pv.
...
...
...
interface_version=3
Not much granularity here, I prefer Sean's previous
<major>.<minor>[.bugfix] scheme.
yes, <major>.<minor>[.bugfix] scheme may be better, but I'm not sure if
it works for a complicated scenario.
e.g. for pv_mode,
(1) initially,  pv_mode is not supported, so it's pv_mode=none, it's 0.0.0,
(2) then, pv_mode=ppgtt is supported, pv_mode="none+ppgtt", it's 0.1.0,
indicating pv_mode=none can migrate to pv_mode="none+ppgtt", but not vice versa.
(3) later, pv_mode=context is also supported,
pv_mode="none+ppgtt+context", so it's 0.2.0.
But if pv_mode=ppgtt is later removed, leaving pv_mode="none+context",
how should its version be named? "none+ppgtt" (0.1.0) is not compatible
with "none+context", but "none+ppgtt+context" (0.2.0) is compatible with
"none+context".
If pv_mode=ppgtt is removed, then the compatible versions would be
0.0.0 or 1.0.0, ie. the major version would be incremented due to
feature removal.
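
To make the scheme concrete, here is a rough sketch of how a
<major>.<minor>[.bugfix] check might look. The thread does not spell out
the exact rule, so this assumes a common convention: the major versions
must match and the destination minor must be at least the source minor,
with the bugfix field ignored. The extra_compatible parameter is a
hypothetical way for a destination to list additional source versions it
accepts, such as 0.0.0 being accepted by 1.0.0 in the example above.

```python
from typing import Iterable, Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse "<major>.<minor>[.bugfix]"; a missing bugfix defaults to 0."""
    parts = [int(p) for p in text.split(".")]
    if not 2 <= len(parts) <= 3:
        raise ValueError(f"expected <major>.<minor>[.bugfix], got {text!r}")
    while len(parts) < 3:
        parts.append(0)
    return (parts[0], parts[1], parts[2])


def can_migrate(src: str, dst: str,
                extra_compatible: Iterable[str] = ()) -> bool:
    """Assumed rule: same major, and the destination minor >= source minor.

    extra_compatible is a hypothetical allowlist of source versions that
    the destination accepts even across a major bump.
    """
    s, d = parse_version(src), parse_version(dst)
    if s in {parse_version(v) for v in extra_compatible}:
        return True
    return s[0] == d[0] and s[1] <= d[1]


if __name__ == "__main__":
    print(can_migrate("0.0.0", "0.1.0"))  # none -> none+ppgtt
    print(can_migrate("0.1.0", "0.0.0"))  # the reverse direction
    print(can_migrate("0.1.0", "1.0.0"))  # ppgtt removed, major bumped
    print(can_migrate("0.0.0", "1.0.0", extra_compatible=["0.0.0"]))
```

The point of the sketch is that the management tool only compares a few
integers; which feature changes bump which field stays a vendor driver
decision.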
...
Maintaining such a scheme is painful for the vendor driver.
Migration compatibility is painful, there's no way around that.  I
think the version scheme is an attempt to push some of that low level
burden on the vendor driver, otherwise the management tools need to
work on an ever growing matrix of vendor specific features which is
going to become unwieldy and is largely meaningless outside of the
vendor driver.  Instead, the vendor driver can make strategic decisions
about where to continue to maintain a support burden and make explicit
decisions to maintain or break compatibility.  The version scheme is a
simplification and abstraction of vendor driver features in order to
create a small, logical compatibility matrix.  Compromises necessarily
need to be made for that to occur.
ok. got it.
...
...
...
...
...
COMPATIBLE:
  device_type=pci
  device_id=8086591d
  mdev_type=i915-GVTg_V5_{val1:int:1,2,4,8}    
this mixed notation will be hard to parse so i would avoid that.
Some background, Intel has been proposing aggregation as a solution to
how we scale mdev devices when hardware exposes large numbers of
assignable objects that can be composed in essentially arbitrary ways.
So for instance, if we have a workqueue (wq), we might have an mdev
type for 1wq, 2wq, 3wq,... Nwq.  It's not really practical to expose a
discrete mdev type for each of those, so they want to define a base
type which is composable to other types via this aggregation.  This is
what this substitution and tagging is attempting to accomplish.  So
imagine this set of values for cases where it's not practical to unroll
the values for N discrete types.
...
...
aggregator={val1}/2
So the {val1} above would be substituted here, though an aggregation
factor of 1/2 is a head scratcher...
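
As an illustration of the substitution described above, the following
sketch unrolls a template using the {name:type:v1,v2,...} notation from
the proposal into concrete attribute sets. The function expand() and the
regular expressions are hypothetical, the declared type is parsed but not
enforced, and the "/2" suffix is kept as literal text because its meaning
is not settled in this thread.

```python
import itertools
import re
from typing import Dict, Iterator, List

# {val1:int:1,2,4,8} declares a variable and its allowed values.
DECL = re.compile(r"\{(\w+):(int|string):([^}]*)\}")
# {val1} refers back to a variable declared in another field.
REF = re.compile(r"\{(\w+)\}")


def split_values(raw: str) -> List[str]:
    """Split a comma separated value list, dropping optional quotes."""
    return [v.strip().strip('"') for v in raw.split(",")]


def expand(template: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield one concrete attribute dict per combination of values."""
    variables: Dict[str, List[str]] = {}
    for value in template.values():
        for name, _type, raw in DECL.findall(value):
            variables[name] = split_values(raw)
    names = list(variables)
    for combo in itertools.product(*(variables[n] for n in names)):
        binding = dict(zip(names, combo))

        def substitute(match: "re.Match[str]") -> str:
            return binding[match.group(1)]

        # Replace declarations first, then plain references.
        yield {key: REF.sub(substitute, DECL.sub(substitute, value))
               for key, value in template.items()}


if __name__ == "__main__":
    template = {
        "mdev_type": "i915-GVTg_V5_{val1:int:1,2,4,8}",
        "aggregator": "{val1}/2",
    }
    for attrs in expand(template):
        print(attrs)
```

Unrolling is only practical for small value sets; for the 1wq...Nwq case
a consumer would more likely match against the template directly.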
...
...
pv_mode={val2:string:"none+ppgtt","none+context","none+ppgtt+context"}
I'm lost on this one though.  I think maybe it's indicating that it's
compatible with any of these, so do we need to list it?  Couldn't this
be handled by Sean's version proposal where the minor version
represents feature compatibility?  
yes, it's indicating that it's compatible with any of these.
Sean's version proposal may also work, but it would be painful for the
vendor driver to maintain the versions when multiple similar features
are involved.
This is something vendor drivers need to consider when adding and
removing features.
...
...
...
...
interface_version={val3:int:2,3}
What does this turn into in a few years, 2,7,12,23,75,96,...
is a range better?
I was really trying to point out that sparseness becomes an issue if
the vendor driver is largely disconnected from how their feature
addition and deprecation affects migration support.  Thanks,
ok. we'll use the x.y.z scheme then.
Thanks
Yan
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK