* Yan Zhao (yan.y.zhao@intel.com) wrote:
Yes, including a device_api field is better. For mdev, it would be "device_type=vfio-mdev", is that right?
No, vfio-mdev is not a device API, it's the driver that attaches to the mdev bus device to expose it through vfio. The device_api exposes the actual interface of the vfio device, it's also vfio-pci for typical mdev devices found on x86, but may be vfio-ccw, vfio-ap, etc... See VFIO_DEVICE_API_PCI_STRING and friends.
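The sketch below illustrates the point about device_api. It assumes the documented mdev sysfs layout /sys/class/mdev_bus/<parent>/mdev_supported_types/<type>/device_api. Under that assumption, a management tool could compare the device_api strings (the VFIO_DEVICE_API_*_STRING values such as "vfio-pci" or "vfio-ccw") before it looks at any vendor-specific field. The helper names are hypothetical. The parent directory is passed in as a parameter so the code can be run against a test tree instead of real sysfs.

```python
import os

def read_device_api(type_dir):
    """Return the device_api string (e.g. "vfio-pci") of one mdev type directory."""
    with open(os.path.join(type_dir, "device_api")) as f:
        return f.read().strip()

def same_device_api(src_type_dir, dst_type_dir):
    """First-pass filter: the vfio device API, not the bus driver, must match."""
    return read_device_api(src_type_dir) == read_device_api(dst_type_dir)

def list_type_apis(parent_dir):
    """Map each supported mdev type of a (hypothetical) parent dir to its device_api."""
    types_dir = os.path.join(parent_dir, "mdev_supported_types")
    return {t: read_device_api(os.path.join(types_dir, t))
            for t in sorted(os.listdir(types_dir))}
```

A string such as "vfio-mdev" would never appear here, because it names the driver rather than the interface the device exposes.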
ok. got it.
device_id=8086591d
Is device_id interpreted relative to device_type? How does this relate to mdev_type? If we have an mdev_type, doesn't that fully define the software API?
For mdev it's actually the parent PCI ID.
If we need to specify the parent PCI ID, then something is fundamentally wrong with the mdev_type. The mdev_type should define a unique, software-compatible interface, regardless of the parent device IDs. If an i915-GVTg_V5_2 means different things depending on the parent device IDs, then different mdev_types should be reported for those parent devices.
Hmm, then do we allow vendor-specific fields? Or must every vendor-specific field have a corresponding vendor attribute?
Another thing: GVT's mdev_type definition currently only describes vGPU computing capacity. For example, i915-GVTg_V5_2 is 1/2 of a gen9 IGD, and i915-GVTg_V4_2 is 1/2 of a gen8 IGD. That is too coarse-grained to determine live migration compatibility.
Can you explain why that's too coarse? Is it because it's too specific (i.e. an i915-GVTg_V4_2 could be migrated to a newer device?), or because it's too specific about the exact sizings (i.e. there may be multiple different sizes of a gen9)? Dave
Do you think we need to update GVT's definition of mdev_type? And is there any guidance on how an mdev_type should be defined?
mdev_type=i915-GVTg_V5_2
And how are non-mdev devices represented?
Non-mdev devices can omit this field or, as you said below, include a vendor signature instead.
aggregator=1 pv_mode="none+ppgtt+context"
These are meaningless vendor specific matches afaict.
Yes, pv_mode and aggregator are vendor-specific fields, but they are important for deciding whether two devices are compatible. pv_mode indicates whether a vGPU supports a guest paravirtualized API. "none+ppgtt+context" means the guest can use no PV, PPGTT-mode PV, or context-mode PV.
interface_version=3
Not much granularity here, I prefer Sean's previous <major>.<minor>[.bugfix] scheme.
Yes, the <major>.<minor>[.bugfix] scheme may be better, but I'm not sure it works for complicated scenarios. For example, for pv_mode:
(1) initially, pv_mode is not supported, so pv_mode=none, version 0.0.0;
(2) then pv_mode=ppgtt is supported, so pv_mode="none+ppgtt", version 0.1.0, indicating that pv_mode=none can migrate to pv_mode="none+ppgtt" but not vice versa;
(3) later, pv_mode=context is also supported, so pv_mode="none+ppgtt+context", version 0.2.0.
But if pv_mode=ppgtt is later removed, giving pv_mode="none+context", what version should that be? "none+ppgtt" (0.1.0) is not compatible with "none+context", but "none+ppgtt+context" (0.2.0) is compatible with "none+context".
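The pv_mode strings behave like feature sets, which is why a single linear version is awkward for them. As a minimal sketch, the code below splits each pv_mode value on "+". It assumes a migration is allowed when every mode the source offers also exists on the target, so the source set must be a subset of the target set. That subset rule is an assumption read from step (2) above, not something the thread specifies. The function names are hypothetical.

```python
def parse_pv_mode(value):
    """Split a pv_mode string such as "none+ppgtt+context" into a set of modes."""
    return frozenset(value.split("+"))

def pv_mode_can_migrate(src, dst):
    """Assumed rule: every PV mode the source offers must also exist on the target."""
    return parse_pv_mode(src) <= parse_pv_mode(dst)
```

Under this rule, "none+ppgtt" and "none+context" cannot migrate to each other in either direction, while both can migrate to "none+ppgtt+context". Two sets that are neither subset nor superset of each other are exactly what a single increasing minor number cannot express.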
If pv_mode=ppgtt is removed, then the compatible versions would be 0.0.0 or 1.0.0, i.e. the major version would be incremented due to the feature removal.
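The following is a minimal sketch of how the <major>.<minor>[.bugfix] rule might be checked, under stated assumptions. It assumes a target advertises a list of compatible versions, such as ["0.0.0", "1.0.0"] after the ppgtt removal. It also assumes the target accepts a source version when one advertised entry has the same major and an equal or greater minor, and that bugfix is ignored. These rules, and the names parse_version and accepts, are assumptions for illustration; the thread does not define the exact semantics of Sean's proposal.

```python
def parse_version(text):
    """Parse "<major>.<minor>[.bugfix]" into a tuple of ints; bugfix defaults to 0."""
    parts = [int(p) for p in text.split(".")]
    if len(parts) == 2:
        parts.append(0)
    if len(parts) != 3:
        raise ValueError(f"bad version: {text!r}")
    return tuple(parts)

def accepts(target_compatible, src_version):
    """Assumed rule: some advertised version has the same major and a minor >= the source's."""
    s_major, s_minor, _ = parse_version(src_version)
    for advertised in target_compatible:
        major, minor, _ = parse_version(advertised)
        if major == s_major and minor >= s_minor:
            return True
    return False
```

For example, accepts(["0.0.0", "1.0.0"], "0.1.0") is False, which rejects "none+ppgtt", while accepts(["0.0.0", "1.0.0"], "0.0.0") is True. Because each advertised entry implicitly covers every lower minor of its major, the list a vendor driver has to publish stays short.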
Maintaining such a scheme is painful for the vendor driver.
Migration compatibility is painful; there's no way around that. I think the version scheme is an attempt to push some of that low-level burden onto the vendor driver. Otherwise the management tools need to work with an ever-growing matrix of vendor-specific features, which is going to become unwieldy and is largely meaningless outside of the vendor driver. Instead, the vendor driver can make strategic decisions about where to continue to maintain a support burden and make explicit decisions to maintain or break compatibility. The version scheme is a simplification and abstraction of vendor driver features in order to create a small, logical compatibility matrix. Compromises necessarily need to be made for that to occur.
ok. got it.
COMPATIBLE: device_type=pci device_id=8086591d mdev_type=i915-GVTg_V5_{val1:int:1,2,4,8}
This mixed notation will be hard to parse, so I would avoid that.
Some background: Intel has been proposing aggregation as a way to scale mdev devices when hardware exposes large numbers of assignable objects that can be composed in essentially arbitrary ways. For instance, with a workqueue (wq), we might have an mdev type for 1wq, 2wq, 3wq, ... Nwq. It's not really practical to expose a discrete mdev type for each of those, so they want to define a base type that can be composed into other types via this aggregation. That is what this substitution and tagging is attempting to accomplish. So imagine this set of values for cases where it's not practical to unroll the values into N discrete types.
aggregator={val1}/2
So the {val1} above would be substituted here, though an aggregation factor of 1/2 is a head scratcher...
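The sketch below makes the substitution concrete. It expands the COMPATIBLE template into the concrete attribute sets it stands for, where {val1} is declared once and then reused in aggregator={val1}/2. The placeholder grammar is an assumption inferred from the examples in this thread. A declaration is {name:type:v1,v2,...} with type int or string, a bare {name} is a reference to it, and values are assumed never to contain commas or braces. The function name expand is hypothetical.

```python
import itertools
import re

# Assumed grammar: {name:type:v1,v2,...} declares a placeholder, {name} reuses it.
DECL = re.compile(r"\{(\w+):(int|string):([^}]*)\}")
REF = re.compile(r"\{(\w+)\}")

def expand(template):
    """Expand a dict of attribute templates into every concrete combination."""
    choices = {}  # placeholder name -> list of candidate values (as strings)
    for value in template.values():
        for name, kind, raw in DECL.findall(value):
            items = [v.strip().strip('"') for v in raw.split(",")]
            if kind == "int":
                items = [str(int(v)) for v in items]
            choices[name] = items
    names = sorted(choices)
    results = []
    for combo in itertools.product(*(choices[n] for n in names)):
        binding = dict(zip(names, combo))
        concrete = {}
        for attr, value in template.items():
            # Replace declarations first, then bare references like {val1}.
            value = DECL.sub(lambda m: binding[m.group(1)], value)
            value = REF.sub(lambda m: binding[m.group(1)], value)
            concrete[attr] = value
        results.append(concrete)
    return results
```

For {"mdev_type": "i915-GVTg_V5_{val1:int:1,2,4,8}", "aggregator": "{val1}/2"}, this yields four candidates, with aggregator values 1/2, 2/2, 4/2 and 8/2. Even this small grammar needs a declaration pass, a reference pass and a cartesian product, and it breaks as soon as a value contains a comma. That is consistent with the concern above that the mixed notation is hard to parse.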
pv_mode={val2:string:"none+ppgtt","none+context","none+ppgtt+context"}
I'm lost on this one, though. I think maybe it's indicating that it's compatible with any of these, so do we need to list it? Couldn't this be handled by Sean's version proposal, where the minor version represents feature compatibility?
Yes, it's indicating that it's compatible with any of these. Sean's version proposal may also work, but it would be painful for the vendor driver to maintain the versions when multiple similar features are involved.
This is something vendor drivers need to consider when adding and removing features.
interface_version={val3:int:2,3}
What does this turn into in a few years? 2,7,12,23,75,96,...
is a range better?
I was really trying to point out that sparseness becomes an issue if the vendor driver is largely disconnected from how its feature additions and deprecations affect migration support. Thanks,
ok. we'll use the x.y.z scheme then.
Thanks Yan
-- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK