On Tue, Jul 14, 2020 at 10:16:16AM -0600, Alex Williamson wrote:
On Tue, 14 Jul 2020 11:21:29 +0100 Daniel P. Berrangé <berrange@redhat.com> wrote:
On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote:
The string read from the migration_version attribute is defined by the device vendor driver and is completely opaque to userspace. For an Intel vGPU, the string format can be defined like "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count".
For an NVMe VF connecting to remote storage, it could be "PCI ID" + "driver version" + "configured remote storage URL".
For a QAT VF, it may be "PCI ID" + "driver version" + "supported encryption set".
(To avoid namespace conflicts between vendors, we may prefix a driver name to each migration_version string, e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1.)
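To make the i915 example concrete, here is a minimal sketch of how a vendor driver might compose such a string. The field names and the class are hypothetical; the field order follows the example string (prefix, gvt version, PCI vendor/device, mdev type, aggregator count) rather than the order listed in the prose above, and nothing here reflects actual i915/GVT-g code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GvtMigrationVersion:
    """Hypothetical fields mirroring the i915 example string in this thread."""
    driver: str       # namespace prefix, e.g. "i915"
    gvt_version: str  # version of the gvt driver, e.g. "v1"
    pci_vendor: int   # parent device PCI vendor ID, e.g. 0x8086
    pci_device: int   # parent device PCI device ID, e.g. 0x591d
    mdev_type: str    # mdev type name, e.g. "i915-GVTg_V5_8"
    aggregator: int   # aggregator count, e.g. 1

    def to_string(self) -> str:
        # Join the fields with "-" in the order used by the example string.
        return "-".join([
            self.driver,
            self.gvt_version,
            f"{self.pci_vendor:04x}",
            f"{self.pci_device:04x}",
            self.mdev_type,
            str(self.aggregator),
        ])


v = GvtMigrationVersion("i915", "v1", 0x8086, 0x591d, "i915-GVTg_V5_8", 1)
print(v.to_string())  # i915-v1-8086-591d-i915-GVTg_V5_8-1
```

Splitting the result on "-" yields seven pieces for six fields, because the mdev type name itself contains a hyphen, illustrating why userspace would be unwise to try to parse such a string.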
It's very strange to define it as opaque and then proceed to describe the contents of that opaque string. The point is that its contents are defined by the vendor driver to describe the device, driver version, and possibly metadata about the configuration of the device. One instance of a device might generate a different string from another. The string that a device produces is not necessarily the only string the vendor driver will accept; for example, the driver might support backwards-compatible migrations.
IMHO there needs to be a mechanism for the kernel to report via sysfs what versions are supported on a given device. This puts the job of reporting compatible versions directly under the responsibility of the vendor who writes the kernel driver for it. They are the ones with the best knowledge of the hardware they've built and the rules around its compatibility.
The version string discussed previously is the version string that represents a given device, possibly including driver information, configuration, etc. I think what you're asking for here is an enumeration of every possible version string that a given device could accept as an incoming migration stream. If we consider the string as opaque, that means the vendor driver needs to generate a separate string for every possible version it could accept, for every possible configuration option. That potentially becomes an excessive amount of data to either generate or manage.
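A rough sketch of the growth being described, assuming a made-up set of per-field values that a target might accept: if every combination has to be listed as its own opaque string, the list size is the product of the number of options per field. The field values below are purely illustrative. Once the flat set exists, though, checking a candidate is a plain set membership test.

```python
import itertools
import math

# Hypothetical per-field values an incoming-migration target might accept.
# None of these come from a real driver; they only illustrate the growth.
accepted = {
    "driver": ["i915"],
    "gvt_version": ["v1", "v2", "v3"],  # e.g. backwards compatible with older streams
    "pci_id": ["8086-591d", "8086-5912"],
    "mdev_type": ["i915-GVTg_V5_4", "i915-GVTg_V5_8"],
    "aggregator": [str(n) for n in range(1, 9)],
}


def enumerate_accepted(fields: dict[str, list[str]]) -> set[str]:
    """Expand every combination of field values into a flat set of strings."""
    return {"-".join(combo) for combo in itertools.product(*fields.values())}


flat = enumerate_accepted(accepted)
expected = math.prod(len(v) for v in accepted.values())
print(len(flat), expected)  # 96 96

# A placement-side check against the flat set is a single membership test.
incoming = "i915-v1-8086-591d-i915-GVTg_V5_8-1"
print(incoming in flat)  # True
```

Each additional configuration option multiplies the count again, which is the data-volume concern; the trade-off is that consumers of the list only ever need the membership test.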
Am I overestimating how vendors intend to use the version string?
If I'm interpreting your reply & the quoted text correctly, the version string isn't really a version string in any normal sense of the word "version". Instead it sounds like a string encoding a set of features in some arbitrary vendor-specific format, which they parse in order to run compatibility checks on individual pieces? One or more parts may contain a version number, but it's much more than just a version. If that's correct, then I'd prefer we didn't call it a version string, and instead called it a "capability string" to make it clear it is expressing a much more general concept, but...
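For contrast, here is a minimal sketch of what a non-opaque "capability string" with per-field rules could look like. Both the "key=value;key=value" format and the rule set (device and configuration fields must match exactly, and the incoming gvt version must not be newer than the local one) are assumptions invented for illustration; no such format has been agreed on in this thread.

```python
def parse_capabilities(s: str) -> dict[str, str]:
    """Split a hypothetical 'key=value;key=value' capability string into a dict."""
    fields = {}
    for item in s.split(";"):
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"malformed field: {item!r}")
        fields[key] = value
    return fields


# Hypothetical fields that must be identical on source and target.
EXACT_MATCH = ("driver", "pci", "type", "agg")


def can_accept(local: str, incoming: str) -> bool:
    """Hypothetical rule: identical device/config fields, and the incoming
    stream's gvt version must not be newer than the local driver's."""
    loc, inc = parse_capabilities(local), parse_capabilities(incoming)
    if any(loc.get(k) != inc.get(k) for k in EXACT_MATCH):
        return False
    if "gvt" not in loc or "gvt" not in inc:
        return False
    return int(inc["gvt"]) <= int(loc["gvt"])


target = "driver=i915;gvt=2;pci=8086:591d;type=i915-GVTg_V5_8;agg=1"
print(can_accept(target, "driver=i915;gvt=1;pci=8086:591d;type=i915-GVTg_V5_8;agg=1"))  # True
print(can_accept(target, "driver=i915;gvt=3;pci=8086:591d;type=i915-GVTg_V5_8;agg=1"))  # False
print(can_accept(target, "driver=i915;gvt=2;pci=8086:591d;type=i915-GVTg_V5_4;agg=1"))  # False
```

Rules like these have to be published and evaluated for every candidate host, whereas a pre-enumerated flat list only needs a lookup.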
We'd also need to consider devices that we could create, for instance by providing the same interface enumeration prior to creating an mdev device, to gain some confidence that the new device would be a valid target.
We defined the string as opaque to allow vendor flexibility and because defining a common format is hard. Do we need to revisit this part of the discussion to define the version string as non-opaque with parsing rules, probably with separate incoming vs outgoing interfaces? Thanks,
...even if the huge amount of flexibility is technically relevant from the POV of the hardware/drivers, we should consider whether management apps actually want, or can use, that level of flexibility. The task of picking which host to place a VM on has a lot of factors to consider, and when there are a large number of hosts, the total amount of information to check gets correspondingly large. The placement process is also fairly performance critical. Running complex algorithmic logic to check compatibility of devices based on an arbitrary set of rules is likely to be a performance challenge. A flat list of supported strings is a much simpler thing to check, as it reduces to a simple set membership test. IOW, even if there's some complex set of device type / vendor specific rules to check for compatibility, I fear apps will ignore them and just define a very simplified list of compatible strings, discarding all the extra flexibility. I'm sure OpenStack maintainers can speak to this more, as they've put a lot of work into their scheduling engine to optimize the way it places VMs, largely driven by simple structured data reported from hosts.
Regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|