device compatibility interface for live migration with assigned devices

Jason Wang jasowang at redhat.com
Wed Aug 19 06:48:34 UTC 2020


On 2020/8/19 1:26 PM, Parav Pandit wrote:
>
>> From: Jason Wang <jasowang at redhat.com>
>> Sent: Wednesday, August 19, 2020 8:16 AM
>
>> On 2020/8/18 5:32 PM, Parav Pandit wrote:
>>> Hi Jason,
>>>
>>> From: Jason Wang <jasowang at redhat.com>
>>> Sent: Tuesday, August 18, 2020 2:32 PM
>>>
>>>
>>> On 2020/8/18 4:55 PM, Daniel P. Berrangé wrote:
>>> On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote:
>>> On 2020/8/14 1:16 PM, Yan Zhao wrote:
>>> On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote:
>>> On 2020/8/10 3:46 PM, Yan Zhao wrote:
>>> driver is it handled by?
>>> It looks like devlink is network device specific; in devlink.h it says
>>> "include/uapi/linux/devlink.h - Network physical device Netlink interface".
>>>
>>> Actually not, I think there was some discussion last year and the
>>> conclusion was to remove this comment.
>>>
>>> [...]
>>>
>>>> Yes, but it could be hard. E.g. vDPA will choose to use devlink (there's a long
>> debate on sysfs vs devlink). So if we go with sysfs, at least two APIs need to be
>> supported ...
>>> We had an internal discussion and proposal on this topic.
>>> I wanted to wait for Eli Cohen to be back from vacation on Wed 8/19, but since
>> this is an active discussion right now, I will share the thoughts anyway.
>>> Here are the initial round of thoughts and proposal.
>>>
>>> User requirements:
>>> ---------------------------
>>> 1. User might want to create one or more vdpa devices per PCI PF/VF/SF.
>>> 2. User might want to create one or more vdpa devices of type net/blk or
>> another type.
>>> 3. User needs to look at and dump the health of the queues for debug purposes.
>>> 4. At vdpa net device creation time, the user may have to provide a MAC
>> address and/or VLAN.
>>> 5. User should be able to set/query some of the attributes for
>>> debug/compatibility checks.
>>> 6. When the user wants to create a vdpa device, they need to know which
>> device supports creation.
>>> 7. User should be able to see the queue statistics of doorbells, wqes,
>>> etc. regardless of class type.
>>
>> Note that wqes is probably not something common to all of the vendors.
> Yes. I think virtq descriptor stats are better for monitoring the virtqueues.
>
>>
>>> To address the above requirements, there is a need for a vendor-agnostic tool, so
>> that the user can create/configure/delete vdpa device(s) regardless of the vendor.
>>> Hence, we should have a tool that lets the user do it.
>>>
>>> Examples:
>>> -------------
>>> (a) List parent devices which support creating vdpa devices.
>>> It also shows which class types are supported by each parent device.
>>> In the command below, two parent devices support vdpa device creation.
>>> The first is a PCI VF whose bdf is 03.00:5.
>>> The second is a PCI SF whose name is mlx5_sf.1.
>>>
>>> $ vdpa list pd
>>
>> What does "pd" mean?
>>
> Parent device, which supports creation of one or more vdpa devices.
> In a system there can be multiple parent devices which may support vdpa creation.
> The user should be able to know which devices support it, and when the user creates a vdpa device, they tell which parent device to use for creation, as done in the vdpa dev add example below.
>>> pci/0000:03.00:5
>>>     class_supports
>>>       net vdpa
>>> virtbus/mlx5_sf.1
>>
>> So creating mlx5_sf.1 is the job of devlink?
>>
> Yes.
> But here the vdpa tool works with the parent device identifier {bus+name} instead of the devlink identifier.
>
>
>>>     class_supports
>>>       net
>>>
>>> (b) Now add a vdpa device and show the device.
>>> $ vdpa dev add pci/0000:03.00:5 type net
>>
>> So if you want to create device types other than vdpa on
>> pci/0000:03.00:5, it needs some synchronization with devlink?
> Please refer to FAQ-1: a new tool is not linked to devlink, because vdpa will evolve with time and devlink will fall short.
> So no, it doesn't need any synchronization with devlink.
> As long as the parent device exists, the user can create it.
> All synchronization will be within drivers/vdpa/vdpa.c.
> This user interface is exposed via a new netlink family by doing genl_register_family() with the new name "vdpa" in drivers/vdpa/vdpa.c.
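
A minimal sketch of what that registration could look like, based only on the
description above (this is not the actual drivers/vdpa/vdpa.c code; the command
enum and handler below are made up for illustration, only genl_register_family()
and the "vdpa" family name come from the proposal):

#include <linux/kernel.h>
#include <linux/module.h>
#include <net/genetlink.h>

/* hypothetical commands behind "vdpa dev add/del/show" */
enum {
	VDPA_CMD_UNSPEC,
	VDPA_CMD_DEV_NEW,
	VDPA_CMD_DEV_DEL,
	VDPA_CMD_DEV_GET,
};

static int vdpa_nl_cmd_dev_new(struct sk_buff *skb, struct genl_info *info)
{
	/* parse attributes, find the parent device, create the vdpa device */
	return 0;
}

static const struct genl_ops vdpa_nl_ops[] = {
	{
		.cmd = VDPA_CMD_DEV_NEW,
		.doit = vdpa_nl_cmd_dev_new,
		.flags = GENL_ADMIN_PERM,	/* creation is privileged */
	},
};

static struct genl_family vdpa_nl_family = {
	.name = "vdpa",			/* new family name from the proposal */
	.version = 1,
	.maxattr = 0,			/* attribute set omitted in this sketch */
	.module = THIS_MODULE,
	.ops = vdpa_nl_ops,
	.n_ops = ARRAY_SIZE(vdpa_nl_ops),
};

/* called from the vdpa core's init path */
static int __init vdpa_nl_init(void)
{
	return genl_register_family(&vdpa_nl_family);
}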


Just to make sure I understand here.

Consider we have virtbus/mlx5_sf.1. Process A wants to create a vDPA 
instance on top of it but Process B wants to create an IB instance. Then I 
think some synchronization is needed at least at the parent device level?


>
>>
>>> $ vdpa dev show
>>> vdpa0@pci/0000:03.00:5 type net state inactive maxqueues 8 curqueues 4
>>>
>>> (c) Show features of the vdpa device.
>>> $ vdpa dev show features vdpa0
>>> iommu platform
>>> version 1
>>>
>>> (d) dump vdpa statistics
>>> $ vdpa dev stats show vdpa0
>>> kickdoorbells 10
>>> wqes 100
>>>
>>> (e) Now delete a vdpa device previously created.
>>> $ vdpa dev del vdpa0
>>>
>>> Design overview:
>>> -----------------------
>>> 1. The above example tool runs over a netlink socket interface (a minimal
>> userspace sketch is included after this list).
>>> 2. This enables returning meaningful error strings to users in addition to error
>> codes, so that the user can be better informed.
>>> Often this is missing in ioctl()/configfs/sysfs interfaces.
>>> 3. This tool over netlink enables syzkaller tests to be more usable, like other
>> subsystems, to keep the kernel robust.
>>> 4. This provides a vendor-agnostic view of all vdpa-capable parent devices and
>> vdpa devices.
>>> 5. Each driver which supports vdpa device creation registers the parent device
>> along with supported classes.
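
For illustration, a minimal userspace sketch of how such a tool could find the
proposed "vdpa" generic netlink family (this assumes libnl-genl-3; only the
family name comes from the proposal, everything else here is hypothetical). The
meaningful error strings mentioned in point 2 would come back as netlink
extended ACKs on this same socket:

#include <stdio.h>
#include <netlink/netlink.h>
#include <netlink/genl/genl.h>
#include <netlink/genl/ctrl.h>

int main(void)
{
	struct nl_sock *sk = nl_socket_alloc();
	int family;

	if (!sk)
		return 1;
	if (genl_connect(sk)) {			/* open a NETLINK_GENERIC socket */
		nl_socket_free(sk);
		return 1;
	}
	/* ask the genl controller for the id of the "vdpa" family */
	family = genl_ctrl_resolve(sk, "vdpa");
	if (family < 0)
		fprintf(stderr, "vdpa netlink family not found\n");
	else
		printf("vdpa family id: %d\n", family);
	nl_socket_free(sk);
	return family < 0 ? 1 : 0;
}
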
>>> FAQs:
>>> --------
>>> 1. Why not use devlink?
>>> Ans: Because as the vdpa ecosystem grows, devlink will fall short in extending
>> vdpa-specific params, attributes, stats.
>>
>>
>> This should be fine, but it's still not clear to me what the difference is
>> between a vdpa netlink family and a vdpa object in devlink.
>>
> The difference is that a vdpa-specific tool works at the parent device level.
> It is likely more appropriate because it can self-contain everything needed to create/delete devices and to view/set features and stats.
> Trying to put that in devlink will fall short, as devlink doesn't have vdpa definitions.
> Typically, when a class/device subsystem grows, its own tool is wiser, like iproute2/ip, iproute2/tc, iproute2/rdma.


Ok, I see.

Thanks




