[cyborg][nova][qa]Question about Accelerator unbinding
Hi Cyborg and Nova team,

I am using Cyborg from the "Wallaby" release to manage my accelerator devices. When I tried to unbind an accelerator, I found that the device was not actually detached from the virtual machine. Here are my questions:

1. What is the function of the arq unbind command in Cyborg?
2. How do I unbind an accelerator that is bound to a VM? Does Nova or Cyborg support this now?

Here are my steps:

Step 1: openstack accelerator arq unbind ed205084-f58d-4fad-99b4-327a1398858f

+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| uuid                | ed205084-f58d-4fad-99b4-327a1398858f |
| state               | Unbound                              |
| device_profile_name | ssd                                  |
| hostname            | None                                 |
| device_rp_uuid      | None                                 |
| instance_uuid       | None                                 |
| attach_handle_type  |                                      |
| attach_handle_info  | {}                                   |
+---------------------+--------------------------------------+

Step 2: Log in to the VM and check the device; it is still there.

Step 3: Stop and then start the VM, which fails with the following error:

"nova.exception.AcceleratorRequestOpFailed: Failed to get accelerator requests: Cyborg returned no accelerator requests for instance ca77ef4e-421c-4c6c-9d76-7618a90ec921"
Hmm, the arq unbind command is meant to be used when an instance is deleted. Currently we do not support hot-plugging or unplugging devices for an instance. We only support binding devices when an instance is created and unbinding them when the instance is deleted.

Best Regards.

(In reply to Di XiaoLi, 2021-12-13 09:54.)
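The create-time binding and delete-time unbinding described above can be sketched with the regular CLI. This is only an illustration under stated assumptions: the flavor, image, network and server names are hypothetical, and only the device profile name `ssd` comes from the thread. The request is expressed through the `accel:device_profile` flavor property. Commands are printed rather than executed unless `APPLY=1` is set.

```shell
#!/bin/sh
# Dry-run by default: print each command; set APPLY=1 to really execute it.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Hypothetical names; only the device profile "ssd" comes from the thread.
FLAVOR="accel.ssd"
IMAGE="cirros"
NETWORK="private"
SERVER="vm-accel"

# Request the accelerator through the flavor extra spec.
run openstack flavor set --property "accel:device_profile=ssd" "$FLAVOR"

# Nova asks Cyborg to bind the ARQ while the server is being created.
run openstack server create --flavor "$FLAVOR" --image "$IMAGE" \
    --network "$NETWORK" "$SERVER"

# Nova asks Cyborg to unbind the ARQ when the server is deleted.
run openstack server delete "$SERVER"
```

Nothing in this flow calls `openstack accelerator arq unbind` by hand; Nova drives the ARQ lifecycle itself.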
Okay, I got it. Thanks very much for your answer.

(In reply to Alex Song (宋文平), 2021-12-13 16:29.)
Anyway, for now we should restrict this command, so that an ARQ cannot be unbound by mistake and leave the instance unable to start.

(Follow-up to the message from Alex Song (宋文平), 2021-12-13 16:27.)
Yes, Alex, this command does confuse me a bit. I hope we can support binding and unbinding accelerators in the future. I also found some errors in kolla-ansible when deploying Cyborg. How active is the Cyborg project? Should we fix the deployment bugs in kolla-ansible and backport the fixes to the released branches?

(In reply to Alex Song (宋文平), 2021-12-13 16:35.)
We discussed the bind/unbind feature at the Nova PTG [1], and it is on our task list. You are welcome to join us if you are interested. We also use kolla-ansible to deploy Cyborg and have not found any problems so far. If you hit errors, please report them on Launchpad [2] and fix them if you have time.

[1] https://etherpad.opendev.org/p/nova-xena-ptg#L375
[2] https://bugs.launchpad.net/openstack-cyborg

(In reply to Di XiaoLi, 2021-12-13 16:46.)
On Mon, 2021-12-13 at 09:54 +0800, Di XiaoLi wrote:
> Hi Cyborg and Nova team, I am using Cyborg with the "Wallaby" release to manage my accelerator devices. While trying to unbind the accelerator I found that the device was not actually unbound from the virtual machine. Here are my questions:
> 1. What is the function of the arq unbind command in Cyborg?

It has two use cases:
- It is called by Nova when Nova is deleting or moving the VM.
- It can be used by an end user who is not using Cyborg with Nova.

> 2. How do I unbind an accelerator that is bound to a VM? Does Nova or Cyborg support this now?
The only way to do that for a Nova instance is to resize it to a flavor that does not request the device via the Cyborg device profile in the extra specs. More recently we have started to support using Cyborg for Neutron NICs too. In that case the unbinding can be done with a port detach: the device should be removed from the VM and unbound in Cyborg.
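The two supported paths just described could look roughly like this. It is a sketch, not a tested procedure: the server, flavor and port names are hypothetical, and `openstack server resize confirm` assumes a reasonably recent python-openstackclient (older clients used `openstack server resize --confirm`). Commands are printed unless `APPLY=1` is set.

```shell
#!/bin/sh
# Dry-run by default: print each command; set APPLY=1 to really execute it.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Flavor-based accelerator: resize to a flavor WITHOUT accel:device_profile.
# $1 = server, $2 = target flavor
remove_flavor_accelerator() {
    run openstack server resize --flavor "$2" "$1"
    # Confirm only after checking that the resized server works as expected.
    run openstack server resize confirm "$1"
}

# Port-based accelerator (Neutron NIC backed by Cyborg): detach the port.
# $1 = server, $2 = port
detach_port_accelerator() {
    run openstack server remove port "$1" "$2"
}

# Hypothetical names, for illustration only.
remove_flavor_accelerator "vm-accel" "m1.small"
detach_port_accelerator "vm-accel" "accel-port"
```

In both cases Nova, not the user, is expected to ask Cyborg to unbind the ARQ, which is what keeps the two services consistent.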
> Here are my steps:
> Step 1: openstack accelerator arq unbind ed205084-f58d-4fad-99b4-327a1398858f
>
> +---------------------+--------------------------------------+
> | Field               | Value                                |
> +---------------------+--------------------------------------+
> | uuid                | ed205084-f58d-4fad-99b4-327a1398858f |
> | state               | Unbound                              |
> | device_profile_name | ssd                                  |
> | hostname            | None                                 |
> | device_rp_uuid      | None                                 |
> | instance_uuid       | None                                 |
> | attach_handle_type  |                                      |
> | attach_handle_info  | {}                                   |
> +---------------------+--------------------------------------+
>
> Step 2: Log in to the VM and check the device; it is still there.
> Step 3: Stop and then start the VM, which fails with the following error: "nova.exception.AcceleratorRequestOpFailed: Failed to get accelerator requests: Cyborg returned no accelerator requests for instance ca77ef4e-421c-4c6c-9d76-7618a90ec921"
On step 1: this should be treated as an admin/service-only operation when using Cyborg with Nova; more on that below.

On step 3: that is what I would expect, and it is the correct behavior, because you violated the contract between Nova and Cyborg by unbinding the ARQ.

When using Cyborg with Nova you should never use the Cyborg API directly, except to list the device profiles. You should treat it like Placement in that regard. When used with Nova today, Cyborg is an internal service to which end users should have at most read-only access, to list the profiles.

Nova does not support hot or cold attach or detach of Cyborg devices. The only way to add or remove a Cyborg device via Nova is the device profile request in the flavor, so the only supported operation for changing the attached device is a resize to a different flavor.

As I said above, if you are attaching SmartNICs using Cyborg via Neutron, then you can also attach or detach a Cyborg device by attaching or detaching the Neutron port that contains the device profile request. Note that if the device does not exist on the current host, the attach will likely fail. Detach should more or less always work, unless there is an internal error in one of the three services involved.
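The "read-only access to list the profiles" rule can be illustrated with a small client-side guard. `accel_readonly` is a hypothetical helper invented for this sketch, not part of Cyborg or python-cyborgclient, and it is only a convenience: real enforcement would have to happen on the server side, for example through the service's API policy. Commands are printed unless `APPLY=1` is set.

```shell
#!/bin/sh
# Dry-run by default: print each command; set APPLY=1 to really execute it.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Hypothetical guard: allow only "device profile list", refuse everything
# else (including "arq unbind"), which is left to Nova and admins.
accel_readonly() {
    case "$*" in
        "device profile list"|"device profile list "*)
            run openstack accelerator "$@"
            ;;
        *)
            echo "refused: 'openstack accelerator $*' is not read-only" >&2
            return 1
            ;;
    esac
}

accel_readonly device profile list
accel_readonly arq unbind ed205084-f58d-4fad-99b4-327a1398858f \
    || echo "blocked, as intended"
```

The second call is the operation from step 1 of the report; with the guard in place it is refused instead of leaving the instance with no ARQ.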
Well, I see. Thanks to Sean for the detailed answer!

(In reply to Sean Mooney, 2021-12-14 00:33.)
Nova's current use of device profiles relies on flavor extra_specs, so detaching a device can only be achieved by changing the flavor, as Sean said. To make accelerators easier to operate, we need to add a warning to the unbind ARQ API documentation of Cyborg; this will be improved later.

Thanks.
brinzhang

(In reply to Sean Mooney, 2021-12-14 00:32.)
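Since the device profile request lives in the flavor extra_specs, a quick way to tell whether a resize target would still attach an accelerator is to look for the `accel:device_profile` key in the flavor properties. The helper below is a hypothetical sketch; the two sample strings are made up for illustration and are not real command output, whose exact format depends on the client version.

```shell
#!/bin/sh
# Succeeds when the given flavor-properties text contains a device
# profile request. $1 = properties text.
flavor_requests_accelerator() {
    printf '%s\n' "$1" | grep -q 'accel:device_profile'
}

# In a real deployment the text would come from something like:
#   openstack flavor show "$FLAVOR" -f value -c properties
# The strings below are illustrative samples only.
for props in "accel:device_profile='ssd'" "hw:cpu_policy='dedicated'"; do
    if flavor_requests_accelerator "$props"; then
        echo "requests an accelerator: $props"
    else
        echo "no accelerator request: $props"
    fi
done
```

A flavor for which the check fails is a candidate resize target when the goal is to remove the device from the instance.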
participants (4)
- Alex Song (宋文平)
- Brin Zhang (张百林)
- Di XiaoLi
- Sean Mooney