device compatibility interface for live migration with assigned devices

Cornelia Huck cohuck at redhat.com
Tue Aug 4 16:35:03 UTC 2020


[sorry about not chiming in earlier]

On Wed, 29 Jul 2020 16:05:03 +0800
Yan Zhao <yan.y.zhao at intel.com> wrote:

> On Mon, Jul 27, 2020 at 04:23:21PM -0600, Alex Williamson wrote:

(...)

> > Based on the feedback we've received, the previously proposed interface
> > is not viable.  I think there's agreement that the user needs to be
> > able to parse and interpret the version information.  Using json seems
> > viable, but I don't know if it's the best option.  Is there any
> > precedent of markup strings returned via sysfs we could follow?  

I don't think encoding complex information in a sysfs file is a viable
approach. Quoting Documentation/filesystems/sysfs.rst:

"Attributes should be ASCII text files, preferably with only one value            
per file. It is noted that it may not be efficient to contain only one
value per file, so it is socially acceptable to express an array of
values of the same type.

Mixing types, expressing multiple lines of data, and doing fancy
formatting of data is heavily frowned upon."

Even though this is an older file, I think these restrictions still
apply.

> I found some examples of using formatted string under /sys, mostly under
> tracing. maybe we can do a similar implementation.
> 
> #cat /sys/kernel/debug/tracing/events/kvm/kvm_mmio/format

Note that this is *not* sysfs: /sys/kernel/debug is a debugfs mount, and
anything under debug/ follows different rules anyway!

> 
> name: kvm_mmio
> ID: 32
> format:
>         field:unsigned short common_type;       offset:0;       size:2; signed:0;
>         field:unsigned char common_flags;       offset:2;       size:1; signed:0;
>         field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
>         field:int common_pid;   offset:4;       size:4; signed:1;
> 
>         field:u32 type; offset:8;       size:4; signed:0;
>         field:u32 len;  offset:12;      size:4; signed:0;
>         field:u64 gpa;  offset:16;      size:8; signed:0;
>         field:u64 val;  offset:24;      size:8; signed:0;
> 
> print fmt: "mmio %s len %u gpa 0x%llx val 0x%llx", __print_symbolic(REC->type, { 0, "unsatisfied-read" }, { 1, "read" }, { 2, "write" }), REC->len, REC->gpa, REC->val
> 
> 
> #cat /sys/devices/pci0000:00/0000:00:02.0/uevent

'uevent' can probably be considered a special case; I would not really
want to copy it.

> DRIVER=vfio-pci
> PCI_CLASS=30000
> PCI_ID=8086:591D
> PCI_SUBSYS_ID=8086:2212
> PCI_SLOT_NAME=0000:00:02.0
> MODALIAS=pci:v00008086d0000591Dsv00008086sd00002212bc03sc00i00
> 

(...)

> what about a migration_compatible attribute under device node like
> below?
> 
> #cat /sys/bus/pci/devices/0000\:00\:02.0/UUID1/migration_compatible
> SELF:
> 	device_type=pci
> 	device_id=8086591d
> 	mdev_type=i915-GVTg_V5_2
> 	aggregator=1
> 	pv_mode="none+ppgtt+context"
> 	interface_version=3
> COMPATIBLE:
> 	device_type=pci
> 	device_id=8086591d
> 	mdev_type=i915-GVTg_V5_{val1:int:1,2,4,8}
> 	aggregator={val1}/2
> 	pv_mode={val2:string:"none+ppgtt","none+context","none+ppgtt+context"} 
> 	interface_version={val3:int:2,3}
> COMPATIBLE:
> 	device_type=pci
> 	device_id=8086591d
> 	mdev_type=i915-GVTg_V5_{val1:int:1,2,4,8}
> 	aggregator={val1}/2
> 	pv_mode=""  #"" meaning empty, could be absent in a compatible device
> 	interface_version=1

I'd consider anything of a comparable complexity to be a big no-no. If
anything, this needs to be split into individual files (with many of
them being vendor driver specific anyway.)
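
To illustrate what "individual files" could look like from the consumer
side, here is a minimal userspace sketch in Python. It assumes a
hypothetical directory in which every attribute is its own single-value
ASCII file; the function name read_attributes and any attribute names
are made up for illustration and are not an existing interface.

```python
import os


def read_attributes(directory):
    """Collect one-value-per-file attributes from a directory into a dict.

    Assumption: each regular file holds a single ASCII value, as the
    sysfs documentation recommends. Subdirectories are skipped.
    """
    attrs = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        with open(path, "r", encoding="ascii") as handle:
            # Drop the trailing newline that attribute files usually end with.
            attrs[name] = handle.read().strip()
    return attrs
```

With such a layout, userspace does not need a parser for a custom
markup; vendor driver specific attributes simply show up as additional
files that a generic tool can ignore or compare verbatim.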

I think we can list compatible versions in a range/list format, though.
Something like

cat interface_version 
2.1.3

cat interface_version_compatible
2.0.2-2.0.4,2.1.0-

(indicating that versions 2.0.{2,3,4} and all versions from 2.1.0
onwards are compatible, with any other major version, i.e. anything
below 2.x or above 2.x, considered incompatible by default)
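
A rough sketch of how userspace could evaluate such a range/list string,
in Python. This is only an illustration under some assumptions of mine:
versions are dotted integers, range bounds are inclusive, an open-ended
range like "2.1.0-" stays within its own major version, and the helper
names (parse_version, parse_compatible, is_compatible) are hypothetical.

```python
def parse_version(text):
    """Turn a dotted version such as '2.1.3' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def parse_compatible(spec):
    """Parse '2.0.2-2.0.4,2.1.0-' into a list of (low, high) tuples.

    high is None for an open-ended range; a single version yields
    low == high.
    """
    ranges = []
    for item in spec.strip().split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            low, high = item.split("-", 1)
            low_v = parse_version(low)
            high_v = parse_version(high) if high.strip() else None
        else:
            low_v = high_v = parse_version(item)
        ranges.append((low_v, high_v))
    return ranges


def is_compatible(version, spec):
    """Check whether version falls into any range listed in spec."""
    v = parse_version(version)
    for low, high in parse_compatible(spec):
        if v[0] != low[0]:
            # Assumption: a different major version is never compatible.
            continue
        if v >= low and (high is None or v <= high):
            return True
    return False
```

For example, is_compatible("2.0.3", "2.0.2-2.0.4,2.1.0-") holds, while
the same check for "2.0.5" or "3.0.0" does not.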

Possible compatibility between different mdev types feels a bit odd to
me, and should not be included by default (only if it makes sense for a
particular vendor driver.)



