[ovirt-devel] Re: device compatibility interface for live migration with assigned devices

Jason Wang jasowang at redhat.com
Mon Aug 31 03:07:53 UTC 2020


On 2020/8/21 10:52 PM, Cornelia Huck wrote:
> On Fri, 21 Aug 2020 11:14:41 +0800
> Jason Wang <jasowang at redhat.com> wrote:
>
>> On 2020/8/20 8:27 PM, Cornelia Huck wrote:
>>> On Wed, 19 Aug 2020 17:28:38 +0800
>>> Jason Wang <jasowang at redhat.com> wrote:
>>>   
>>>> On 2020/8/19 4:13 PM, Yan Zhao wrote:
>>>>> On Wed, Aug 19, 2020 at 03:39:50PM +0800, Jason Wang wrote:
>>>>>> On 2020/8/19 2:59 PM, Yan Zhao wrote:
>>>>>>> On Wed, Aug 19, 2020 at 02:57:34PM +0800, Jason Wang wrote:
>>>>>>>> On 2020/8/19 11:30 AM, Yan Zhao wrote:
>>>>>>>>> hi All,
>>>>>>>>> could we decide that sysfs is the interface that every VFIO vendor driver
>>>>>>>>> needs to provide in order to support vfio live migration, and that otherwise
>>>>>>>>> the userspace management tool would not list the device in the compatible
>>>>>>>>> list?
>>>>>>>>>
>>>>>>>>> if that's true, let's move on to standardizing the sysfs interface.
>>>>>>>>> (1) content
>>>>>>>>> common part: (must)
>>>>>>>>>         - software_version: (in major.minor.bugfix scheme)
>>>>>>>> This cannot work for devices whose features can be negotiated/advertised
>>>>>>>> independently (e.g. virtio devices).
>>> I thought the 'software_version' was supposed to describe kind of a
>>> 'protocol version' for the data we transmit? I.e., you add a new field,
>>> you bump the version number.
>>
>> Ok, but since we mandate backward compatibility of the uABI, is it really
>> worth having a version for sysfs? (Searching sysfs shows no examples
>> like this.)
> I was not thinking about the sysfs interface, but rather about the data
> that is sent over while migrating. E.g. we find out that sending some
> auxiliary data is a good idea and bump to version 1.1.0; version 1.0.0
> cannot deal with the extra data, but version 1.1.0 can deal with the
> older data stream.
>
> (...)


Well, I think what data to transmit during migration is the duty of qemu,
not the kernel. And I suspect the idea of reading opaque data (with a
version) from the kernel and transmitting it to the destination is the
best approach.
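Just to illustrate the versioning rule Cornelia describes above, a minimal
sketch in Python; the major.minor.bugfix scheme comes from Yan's proposal,
while the function name and the exact rule are only assumptions for
discussion:

    def stream_compatible(src_version, dst_version):
        # Hypothetical check: the destination must speak the same major
        # version and at least the source's minor version.
        src_major, src_minor, _ = (int(x) for x in src_version.split("."))
        dst_major, dst_minor, _ = (int(x) for x in dst_version.split("."))
        if src_major != dst_major:
            return False                  # incompatible protocol change
        return dst_minor >= src_minor     # 1.1.0 reads a 1.0.0 stream, not vice versa

    # stream_compatible("1.0.0", "1.1.0") -> True
    # stream_compatible("1.1.0", "1.0.0") -> False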


>
>>>>>>>>>         - device_api: vfio-pci or vfio-ccw ...
>>>>>>>>>         - type: mdev type for mdev device or
>>>>>>>>>                 a signature for physical device which is a counterpart for
>>>>>>>>> 	   mdev type.
>>>>>>>>>
>>>>>>>>> device api specific part: (must)
>>>>>>>>>        - pci id: pci id of mdev parent device or pci id of physical pci
>>>>>>>>>          device (when device_api is vfio-pci).
>>>>>>>> So this assumes a PCI device, which is probably not true.
>>>>>>>>      
>>>>>>> for a device_api of vfio-pci, why is it not true?
>>>>>>>
>>>>>>> for vfio-ccw, it's subchannel_type.
>>>>>> Ok, but having two different attributes for the same file is not a good
>>>>>> idea. How does mgmt know whether there will be a 3rd type?
>>>>> that's why some attributes need to be common. e.g.
>>>>> device_api: it's common because mgmt needs to know whether it's a pci device
>>>>>                or a ccw device, and the api type is already defined in vfio.h.
>>>>>     (This field was agreed on, and actually suggested, by Alex in a previous mail.)
>>>>> type: mdev_type for mdev. if mgmt does not understand it, it would not
>>>>>          be able to create a compatible mdev device.
>>>>> software_version: mgmt can compare the major and minor if it understands
>>>>>          this field.
>>>> I think it would be helpful if you could describe, step by step, how mgmt is
>>>> expected to work with the proposed sysfs API. This would help people
>>>> understand.
>>> My proposal would be:
>>> - check that device_api matches
>>> - check possible device_api specific attributes
>>> - check that type matches [I don't think the combination of mdev types
>>>     and another attribute to determine compatibility is a good idea;
>>
>> Any reason for this? Actually, if we only use the mdev type to detect
>> compatibility, it would be much easier. Otherwise, we are actually
>> re-inventing mdev types.
>>
>> E.g. can we have the same mdev type with a different device_api and other
>> attributes?
> In the end, the mdev type is represented as a string; but I'm not sure
> we can expect that two types with the same name, but a different
> device_api are related in any way.
>
> If we e.g. compare vfio-pci and vfio-ccw, they are fundamentally
> different.
>
> I was mostly concerned about the aggregation proposal, where type A +
> aggregation value b might be compatible with type B + aggregation value
> a.


Yes, that looks pretty complicated.
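A purely hypothetical illustration of the complication: once compatibility
depends on the (type, aggregation) pair, a plain string match on the type
is no longer enough, and only the vendor knows how the pairs map onto each
other. The type names and the idea that aggregation multiplies some
per-type unit below are made up:

    UNITS = {"vendor-type-A": 1, "vendor-type-B": 2}   # vendor-specific knowledge

    def resources(mdev_type, aggregation):
        # generic mgmt has no way to know this mapping
        return UNITS[mdev_type] * aggregation

    def aggregation_compatible(src, dst):
        return resources(*src) == resources(*dst)

    # aggregation_compatible(("vendor-type-A", 2), ("vendor-type-B", 1)) -> True
    # even though the type strings differ.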


>
>>
>>>     actually, the current proposal confuses me every time I look at it]
>>> - check that software_version is compatible, assuming semantic
>>>     versioning
>>> - check possible type-specific attributes
>>
>> I'm not sure if this is too complicated. And I suspect there will be
>> vendor-specific attributes:
>>
>> - for the compatibility check: I think we should either model everything
>> via the mdev type or make it totally vendor specific. Having something in
>> the middle will bring a lot of burden.
> FWIW, I'm for a strict match on mdev type, and flexibility in per-type
> attributes.


I'm not sure whether the above flexibility can work better than encoding
those attributes into the mdev type. If we really want ultra flexibility,
we need to make the compatibility check totally vendor specific.
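To make the comparison concrete, here is a minimal sketch of the
strict-match flow Cornelia describes above, in Python. The attribute names
(device_api, type, software_version) come from Yan's proposal; the
directory layout, the helper names and the exact version rule are only
assumptions for discussion:

    import os

    def read_attr(base, name):
        # base would be whatever sysfs directory ends up exposing the
        # compatibility attributes for a device (not defined yet).
        with open(os.path.join(base, name)) as f:
            return f.read().strip()

    def devices_compatible(src, dst):
        if read_attr(src, "device_api") != read_attr(dst, "device_api"):
            return False
        # strict match on the mdev type string
        if read_attr(src, "type") != read_attr(dst, "type"):
            return False
        # same rule as the stream_compatible() sketch earlier in this mail
        s_major, s_minor, _ = read_attr(src, "software_version").split(".")
        d_major, d_minor, _ = read_attr(dst, "software_version").split(".")
        return s_major == d_major and int(d_minor) >= int(s_minor)

Per-type (and possibly vendor-specific) attributes would still need an
extra step on top of this.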


>
>> - for provisioning: it's still not clear. As shown in this proposal, for
>> NVMe we may need to set remote_url, but unless there will be a subclass
>> (NVMe) in mdev (which I guess there won't be), we can't prevent a vendor
>> from using another attribute name; in that case, tricks like iterating
>> attributes in some sub directory won't work. So even if we had some
>> common API for the compatibility check, the provisioning API would still
>> be vendor specific ...
> Yes, I'm not sure how to deal with the "same thing for different
> vendors" problem. We can try to make sure that in-kernel drivers play
> nicely, but not much more.


Then I guess it would actually become a subclass of mdev in the future.
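To show what I mean about provisioning: in a purely hypothetical example
(only remote_url appears in the proposal, the other names and values are
made up), two vendors could expose the same concept under different
attribute names, and iterating a sub directory does not tell mgmt what the
values mean:

    import os

    # hypothetical per-vendor attribute names for the same concept
    vendor_a_attrs = {"remote_url": "nvme-of://storage.example/ns1"}
    vendor_b_attrs = {"target_uri": "nvme-of://storage.example/ns1"}

    def provision(mdev_dir, attrs):
        # mgmt has to know the vendor-specific names and semantics in
        # advance; there is nothing generic to iterate over.
        for name, value in attrs.items():
            with open(os.path.join(mdev_dir, name), "w") as f:
                f.write(value)

So without a subclass that standardizes those names, the provisioning side
stays vendor specific even if the compatibility check is common.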

Thanks