[Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management
Fellow Open Stackers, I have been thinking on how to handle SmartNICs, GPUs, FPGA handling across different projects within OpenStack with Cyborg taking a leading role in it. Cyborg is important project and address accelerator devices that are part of the server and potentially switches and storage. It is address 3 different use cases and users there are all grouped into single project. 1. Application user need to program a portion of the device under management, like GPU, or SmartNIC for that app usage. Having a common way to do it across different device families and across different vendor is very important. And that has to be done every time a VM is deploy that need usage of a device. That is tied with VM scheduling. 2. Administrator need to program the whole device for specific usage. That covers the scenario when device can only support single tenant or single use case. That is done once during OpenStack deployment but may need reprogramming to configure device for different usage. May or may not require reboot of the server. 3. Administrator need to setup device for its use, like burning specific FW on it. This is typically done as part of server life-cycle event. The first 2 cases cover application life cycle of device usage. The last one covers device life cycle independently how it is used. Managing life cycle of devices is Ironic responsibility, One cannot and should not manage lifecycle of server components independently. Managing server devices outside server management violates customer service agreements with server vendors and breaks server support agreements. Nova and Neutron are getting info about all devices and their capabilities from Ironic; that they use for scheduling. We should avoid creating new project for every new component of the server and modify nova and neuron for each new device. (the same will also apply to cinder and manila if smart devices used in its data/control path on a server). Finally we want Cyborg to be able to be used in standalone capacity, say for Kubernetes. Thus, I propose that Cyborg cover use cases 1 & 2, and Ironic would cover use case 3. Thus, move all device Life-cycle code from Cyborg to Ironic. Concentrate Cyborg of fulfilling the first 2 use cases. Simplify integration with Nova and Neutron for using these accelerators to use existing Ironic mechanism for it. Create idempotent calls for use case 1 so Nova and Neutron can use it as part of VM deployment to ensure that devices are programmed for VM under scheduling need. Create idempotent call(s) for use case 2 for TripleO to setup device for single accelerator usage of a node. [Propose similar model for CNI integration.] Let the discussion start! Thanks., Arkady
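To make the idempotent-call proposal above concrete: here is a minimal sketch, in Python, of the semantics such a call could have. Every name in it is hypothetical -- this is not an existing Cyborg or Ironic API.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Device:
        address: str                           # e.g. a PCI address
        current_image_id: Optional[str] = None

        def program(self, image_id: str) -> None:
            # Placeholder for the vendor-specific programming path.
            print(f"programming {self.address} with image {image_id}")

    def ensure_device_programmed(dev: Device, image_id: str) -> None:
        """Idempotent: repeated calls with the same image are no-ops,
        so Nova/Neutron can safely call this on every deploy or retry."""
        if dev.current_image_id == image_id:
            return                             # already in the desired state
        dev.program(image_id)
        dev.current_image_id = image_id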
Hi Arkady,
Thanks for your interest in the Cyborg project :) I would like to point out that when we initiated the project, there were two specific use cases we wanted to cover: accelerators attached locally (via PCIe or another bus type) or remotely (via Ethernet or another fabric type).
For the latter, it is clear that the device's life cycle is independent of the server (like a block device managed by Cinder). For the former, however, the life cycle is not tied to the server for all kinds of accelerators either. For example, we already have PCIe-based AI accelerator cards and SmartNICs that can be powered on/off while the server stays on.
Therefore it is not a good idea to move all the life cycle management into Ironic, for the above-mentioned reasons. Independence from Ironic is also very important for the standalone usage of Cyborg for Kubernetes, Envoy (TLS acceleration) and the like.
Hope this answers your question :)
On Sat, Jan 4, 2020 at 5:23 AM <Arkady.Kanevsky@dell.com> wrote:
[trim]
--
Zhipeng (Howard) Huang
Principal Engineer
OpenStack, Kubernetes, CNCF, LF Edge, ONNX, Kubeflow, OpenSDS, Open Service Broker API, OCP, Hyperledger, ETSI, SNIA, DMTF, W3C
Zhipeng,
Thanks for the quick feedback.
Where is the accelerator device running? I am aware of 3 possibilities: servers, storage, and switches. In each of them, the device is managed as part of the server, storage box, or switch.
The core of my message is to separate device life-cycle management in the "box" where the device is placed from programming the device as needed per application (VM, container).
Thanks, Arkady

From: Zhipeng Huang <zhipengh512@gmail.com> Sent: Friday, January 3, 2020 7:53 PM To: Kanevsky, Arkady Cc: OpenStack Discuss Subject: Re: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management
[EXTERNAL EMAIL]
[trim]
Greetings Arkady,

I think your message makes a very good case and raises a point that I've been trying to type out for the past hour, just with different words.

We have multiple USER driven interactions with a similarly desired, if not the exact same, desired end result, where different paths can be taken, as we perceive use cases from "As a user, I would like a VM with a configured accelerator", "I would like any compute resource (VM or Baremetal) with a configured accelerator", to "As an administrator, I need to reallocate a baremetal node for this different use, so my user can leverage its accelerator once they know how and are ready to use it", and, as suggested, "As a user, I want baremetal with k8s and configured accelerators."

And I suspect this diversity of use patterns is where things begin to become difficult. As such I believe we, in essence, have a question of a support or compatibility matrix that definitely has gaps depending on "how" the "user" wants or needs to achieve their goals.

And I think where this entire discussion _can_ go sideways is... (from what I understand) some of these devices need to be flashed by the application user with firmware on demand to meet the user's needs, which is where lifecycle and support interactions begin to become... conflicted.

Further complicating matters is the "Metal to Tenant" use case, where the user requesting the machine is not an administrator but has some level of inherent administrative access to all Operating System accessible devices once their OS has booted. Which makes me wonder: "What if the cloud administrators WANT to block the tenant's direct ability to write/flash firmware into the accelerator/smartnic/etc.?" I suspect that if cloud administrators want to block such hardware access, vendors will want to support such a capability. Blocking such access inherently forces some actions into hardware management/maintenance workflows, and may ultimately cause some of a support matrix's use cases to be unsupportable, again ultimately depending on what exactly the user is attempting to achieve.

Going back to the suggestions in the original email: they seem logical to me in terms of the delineation and separation of responsibilities as we present a cohesive solution to the users of our software.

Greetings Zhipeng,

Is there any documentation at present that details the desired support and use cases? I think this would at least help my understanding, since everything that requires the power to be on would still need to be integrated within workflows for eventual tighter integration. Also, has Cyborg drafted any plans or proposals for integration?

-Julia

On Mon, Jan 6, 2020 at 9:14 AM <Arkady.Kanevsky@dell.com> wrote:
[trim]
Excellent points Julia.
It is hard to imagine that any production env of any customer will allow anybody but an administrator to update FW on any device at any time. The security implications are huge.
Cheers, Arkady

-----Original Message-----
From: Julia Kreger <juliaashleykreger@gmail.com> Sent: Monday, January 6, 2020 3:33 PM To: Kanevsky, Arkady Cc: Zhipeng Huang; openstack-discuss Subject: Re: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management
[EXTERNAL EMAIL]
[trim]
On 2020-01-07 23:17:25 +0000 (+0000), Arkady.Kanevsky@dell.com wrote:
It is hard to imagine that any production env of any customer will allow anybody but an administrator to update FW on any device at any time. The security implications are huge. [...]
I thought this was precisely the point of exposing FPGA hardware into server instances. Or do you not count programming those as "updating firmware?" -- Jeremy Stanley
Jeremy,
Correct. Programming devices and "updating firmware" I count as separate activities. Similar to CPU or GPU.

-----Original Message-----
From: Jeremy Stanley <fungi@yuggoth.org> Sent: Tuesday, January 7, 2020 5:52 PM To: openstack-discuss@lists.openstack.org Subject: Re: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management
[trim]
On Wed, Jan 8, 2020 at 8:38 AM <Arkady.Kanevsky@dell.com> wrote:
Jeremy, Correct. Programming devices and "updating firmware" I count as separate activities. Similar to CPU or GPU.
Which makes me really wonder, where is that line between the activities? I guess the worry, from a security standpoint, is persistent bytecode. I guess I just don't have a good enough understanding of all the facets in this area to have a sense for that. :/
[trim]
From: Julia Kreger <juliaashleykreger@gmail.com> Sent: Monday, January 6, 2020 1:33 PM To: Arkady.Kanevsky@dell.com Cc: Zhipeng Huang <zhipengh512@gmail.com>; openstack-discuss <openstack-discuss@lists.openstack.org> Subject: Re: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management
Hi Julia, Lots of good points here.
Greetings Arkady,
I think your message makes a very good case and raises a point that I've been trying to type out for the past hour, just with different words.
We have multiple USER driven interactions with a similarly desired, if not the exact same, desired end result, where different paths can be taken, as we perceive use cases from "As a user, I would like a VM with a configured accelerator", "I would like any compute resource (VM or Baremetal) with a configured accelerator", to "As an administrator, I need to reallocate a baremetal node for this different use, so my user can leverage its accelerator once they know how and are ready to use it", and, as suggested, "As a user, I want baremetal with k8s and configured accelerators."
And I suspect this diversity of use patterns is where things begin to become difficult. As such I believe we, in essence, have a question of a support or compatibility matrix that definitely has gaps depending on "how" the "user" wants or needs to achieve their goals.
Yes, there are a wide variety of deployments and use cases. There may not be a single silver bullet solution for all of them. There may be different solutions, such as Ironic standalone, Ironic with Nova, and potentially some combination with Cyborg.
And, I think where this entire discussion _can_ go sideways is... (from what I understand) some of these devices need to be flashed by the application user with firmware on demand to meet the user's needs, which is where lifecycle and support interactions begin to become... conflicted.
We are probably using different definitions of the term 'firmware.' As I said in another response in this thread, if a device configuration exposes application-specific features or schedulable features, then the term 'firmware update' may not be applicable IMHO, since it is going to be done dynamically as workloads spin up and retire. This is especially so given Arkady's stipulation that firmware updates are done as part of server configuration and as per server vendor's guidelines.
Further complicating matters is the "Metal to Tenant" use cases where the user requesting the machine is not an administrator, but has some level of inherent administrative access to all Operating System accessible devices once their OS has booted. Which makes me wonder "What if the cloud administrators WANT to block the tenant's direct ability to write/flash firmware into accelerator/smartnic/etc?"
Yes, admins may want to do that. This can be done (partly) via RBAC, by having different roles for tenants who can use devices but not reprogram them, and for tenants who can program the device with application/scheduling-relevant features (but not firmware), etc.
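As a sketch of how that role separation could be expressed with oslo.policy -- the rule names and roles below are invented for illustration, not Cyborg's actual policy defaults:

    from oslo_policy import policy

    # Hypothetical rules separating "use a device" from "program a device".
    rules = [
        policy.RuleDefault(
            name="accelerator:attach",
            check_str="role:member",
            description="Attach an already-programmed device to an instance."),
        policy.RuleDefault(
            name="accelerator:program",
            check_str="role:accel_admin",
            description="Program application/scheduling-relevant features."),
        policy.RuleDefault(
            name="accelerator:firmware_update",
            check_str="role:admin",
            description="Firmware updates stay with the infrastructure admin."),
    ]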
I suspect if cloud administrators want to block such hardware access, vendors will want to support such a capability.
Devices can and usually do offer separate mechanisms for reading from registers, writing to them, updating flash etc. each with associated access permissions. A device vendor can go a bit extra by requiring specific Linux capabilities, such as say CAP_IPC_LOCK for mmap access, in their device driver.
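For instance, a minimal sketch of that mmap path in Python -- the device node, window size and register offset are purely illustrative; the real enforcement lives in the vendor driver's open()/mmap() handlers:

    import mmap
    import os

    # The driver can reject open() or mmap() based on file permissions,
    # Linux capabilities, etc.; /dev/accel0 is a hypothetical device node.
    fd = os.open("/dev/accel0", os.O_RDWR)
    try:
        regs = mmap.mmap(fd, 4096,
                         flags=mmap.MAP_SHARED,
                         prot=mmap.PROT_READ | mmap.PROT_WRITE)
        regs[0:4] = (1).to_bytes(4, "little")   # write a 32-bit register
        regs.close()
    finally:
        os.close(fd)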
Blocking such access inherently forces some actions into hardware management/maintenance workflows, and may ultimately cause some of a support matrix's use cases to be unsupportable, again ultimately depending on what exactly the user is attempting to achieve.
Not sure if you are expressing a concern here. If the admin is using device features or RBAC to restrict access, then she is intentionally blocking some combinations in your support matrix, right? Users in such a deployment need to live with that.
Is there any documentation at present that details the desired support and use cases? I think this would at least help my understanding, since everything that requires the power to be on would still need to be integrated within workflows for eventual tighter integration.
The Cyborg spec [1] addresses the Nova/VM-based use cases. [1] https://opendev.org/openstack/cyborg-specs/src/branch/master/specs/train/app...
Also, has Cyborg drafted any plans or proposals for integration?
For Nova integration, we have a spec [2]. [2] https://review.opendev.org/#/c/684151/
-Julia
Regards, Sundar
Circling back to this now since I'm not in meetings and can actually think about this topic. :) On Sun, Jan 12, 2020 at 1:42 PM Nadathur, Sundar <sundar.nadathur@intel.com> wrote:
[trim]
Further complicating matters is the "Metal to Tenant" use cases where the user requesting the machine is not an administrator, but has some level of inherent administrative access to all Operating System accessible devices once their OS has booted. Which makes me wonder "What if the cloud administrators WANT to block the tenant's direct ability to write/flash firmware into accelerator/smartnic/etc?"
Yes, admins may want to do that. This can be done (partly) via RBAC, by having different roles for tenants who can use devices but not reprogram them, and for tenants who can program the device with application/scheduling-relevant features (but not firmware), etc.
I concur that it might be doable via RBAC for hypervisor hosts, where access is abstracted and controlled; however, the concern in the baremetal integration use case is that the tenant ultimately has full superuser access to the machine.
I suspect if cloud administrators want to block such hardware access, vendors will want to support such a capability.
Devices can and usually do offer separate mechanisms for reading from registers, writing to them, updating flash etc. each with associated access permissions. A device vendor can go a bit extra by requiring specific Linux capabilities, such as say CAP_IPC_LOCK for mmap access, in their device driver.
Going back to the prior point, for the Metal to Tenant case: those mechanisms may hold for pure users of a shared system, but with the operating model of bare metal as a service, the user has full machine access. The user could also deploy an OS where capability checking is disabled entirely.
Blocking such access inherently forces some actions into hardware management/maintenance workflows, and may ultimately cause some of a support matrix's use cases to be unsupportable, again ultimately depending on what exactly the user is attempting to achieve.
Not sure if you are expressing a concern here. If the admin is using device features or RBAC to restrict access, then she is intentionally blocking some combinations in your support matrix, right? Users in such a deployment need to live with that.
I was trying to further stress the prior concern and convey that I perceive the end result as being a matrix of use cases where some are unsupportable. I completely agree that, in the end, users would need to live with that situation. I just think that clarity will need to exist for users on what is possible, and what ultimately is not possible, in various scenarios. -Julia
Hi Arkady and all,
Good discussions and questions. First, it is good to clarify what we mean by lifecycle management. It includes:

* Discovery: We need to get more than just the PCI IDs/addresses of devices. We need their properties and features as well. This is especially the case for programmable devices, as the properties and features can change over time, though the PCI ID may not.
* Scheduling: We want to schedule the application that needs offload based on the properties/features discovered above.
* Programming/configuration and/or firmware update. More on this later.
* Health management: discover the health of a device, esp. if programming/configuration etc. fail.
* Inventory management: track the fleet of accelerators based on their properties/features.
* Other aspects that I won't dwell on here.

In short, lifecycle management is more than just firmware update.

Secondly, regarding the difference between programming and firmware updates, some key questions are:

1. What does the device configuration do?
   A. Expose properties/features relevant to scheduling: could be for a specific application or workload (e.g. apply a new AI algorithm), or expose new/premium device features (e.g. enable more memory banks).
   B. Update general features not relevant to scheduling, e.g. fix a bug in BMC firmware.

2. When/how is the device configuration done?
   A. Dynamically: as instances are provisioned/retired, based on time of day, workload demand, etc. This would be part of an OpenStack workflow.
   B. Statically: as part of the physical host configuration. This is typically done 'offline', perhaps in a maintenance window, often using external frameworks like Redfish/Ansible/Puppet/Chef/...

The combination 1A+2A is what I'd call programming, while 1B+2B is firmware update. I don't see a motivation for 1B+2A. The case 1A+2B is interesting: it basically means that a programmable device is being treated like a fixed-function accelerator for a period of time before it gets reprogrammed offline. This model is being used in the industry today, esp. by telcos. I am fine with calling this a 'firmware update' too.

There are some grey areas to consider. For example, many FPGA deployments are structured to have a 'shell', which is hardware logic that exposes some generic features like PCI and DMA, and separate user/custom logic that is application/workload-specific. Would updating the shell qualify as 'programming' or a 'firmware update'? Today, it often falls under 2B, esp. if it requires server reboots. But it could conceivably come under 1A+2A as products and use cases evolve. IOW, what is called a firmware update today could become a programming update tomorrow.

Cyborg is designed for programming, i.e. 1A+2A. It can be used with Nova (to program devices as instances are provisioned/retired) or standalone (based on time of day, traffic patterns, etc.). Other cases (1A/1B + 2B) can be classified as firmware update and are outside of Cyborg.

TL;DR
* Agree with Arkady that firmware updates should follow the server vendors' guidelines, and can/should be done as part of the server configuration.
* If the claim is that firmware updates, as defined above (i.e. 1A/1B + 2B), should be done by Ironic, I am fine with it.
* To reiterate, it is NOT enough to handle devices based only on their PCI IDs -- we should be able to use their features/properties for scheduling, inventory management, etc. This is extra true for programmable devices, where features can change dynamically while PCI IDs potentially stay constant.
* Cyborg is designed for these devices, and its stated role includes all other aspects of lifecycle management.
* I see value in having Cyborg and Ironic work together, esp. for 1A+2B, where Ironic can do the 'firmware update' and Cyborg discovers the schedulable properties of the device.
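The 1A/1B x 2A/2B taxonomy above reduces to a small decision table; here is a sketch of it in Python, following the definitions in this message (the labels are mine):

    def classify(exposes_schedulable_features: bool,
                 applied_dynamically: bool) -> str:
        """Classify a device update per questions 1 (what) and 2 (when)."""
        if exposes_schedulable_features and applied_dynamically:
            return "1A+2A: programming (Cyborg territory)"
        if exposes_schedulable_features:
            return "1A+2B: 'firmware update' (offline reprogramming, e.g. telco FPGAs)"
        if applied_dynamically:
            return "1B+2A: no known motivation"
        return "1B+2B: firmware update (server lifecycle, e.g. a BMC fix)"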
From: Arkady.Kanevsky@dell.com <Arkady.Kanevsky@dell.com>
Sent: Friday, January 3, 2020 1:19 PM
To: openstack-discuss@lists.openstack.org
Subject: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management
1. An application user needs to program a portion of the device ...
Sure.
2. An administrator needs to program the whole device for a specific usage. That covers the scenario when a device can only support a single tenant or single use case.
Why does it have to be single-tenant or single use case? For example, one could program an FPGA with an Open vSwitch implementation, which is shared by VMs from different tenants.
That is done once during OpenStack deployment but may need reprogramming to configure the device for a different usage.
If the change exposes workload-specific or schedulable properties, this would not necessarily be a one-shot thing at deployment time.
3. An administrator needs to set up a device for its use, like burning specific FW on it. This is typically done as part of a server life-cycle event.
With the definition of firmware update as above, I agree.
The first 2 cases cover the application life cycle of device usage.
Yes.
The last one covers the device life cycle independently of how it is used.
Here's where I beg to disagree. As I said, the term 'device lifecycle' is far broader than just firmware update.
Managing the life cycle of devices is Ironic's responsibility,
Disagree here. To the best of my knowledge, Ironic handles devices based on PCI IDs. Cyborg is designed to go deeper for discovering device features/properties and utilize Placement for scheduling based on these.
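To illustrate what going deeper than PCI IDs means in practice: a discovery agent can tag the device's resource provider with custom traits in Placement, and the scheduler can then match requests against those traits. A rough sketch against the Placement REST API -- the endpoint, token, provider UUID and trait name are placeholders:

    import requests

    PLACEMENT = "http://controller:8778"       # placeholder endpoint
    HEADERS = {"X-Auth-Token": "<token>",      # placeholder token
               "OpenStack-API-Version": "placement 1.6"}  # traits need >= 1.6
    RP = "<resource-provider-uuid>"
    TRAIT = "CUSTOM_FPGA_COMPRESSION_V1"       # custom traits must start with CUSTOM_

    # Ensure the trait exists, then add it to the device's resource provider.
    requests.put(f"{PLACEMENT}/traits/{TRAIT}", headers=HEADERS)
    cur = requests.get(f"{PLACEMENT}/resource_providers/{RP}/traits",
                       headers=HEADERS).json()
    requests.put(f"{PLACEMENT}/resource_providers/{RP}/traits",
                 headers=HEADERS,
                 json={"resource_provider_generation":
                           cur["resource_provider_generation"],
                       "traits": cur["traits"] + [TRAIT]})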
One cannot and should not manage the life cycle of server components independently.
If what you meant to say is: ' do not update device firmware independently of other server components', agreed.
Managing server devices outside server management violates customer service agreements with server vendors and breaks server support agreements.
Sure.
Nova and Neutron get info about all devices and their capabilities from Ironic, which they use for scheduling
Hmm, this seems overly broad to me: not every deployment includes Ironic, and getting PCI IDs is not enough for scheduling and management.
Finally, we want Cyborg to be usable in a standalone capacity, say for Kubernetes
+1
Thus, I propose that Cyborg cover use cases 1 & 2, and Ironic cover use case 3
Use case 3 says "setup device for its use, like burning specific FW." With the definition of firmware above, I agree. Other aspects of lifecycle management, not covered by use cases 1 - 3, would come under Cyborg.
Thus, move all device life-cycle code from Cyborg to Ironic
To recap, there is more to device lifecycle than firmware update. I'd suggest the other aspects can remain in Cyborg. Regards, Sundar
TL;DR
* Agree with Arkady that firmware updates should follow the server vendors' guidelines, and can/should be done as part of the server configuration.
I'm worried there's a little bit of confusion about "which nova" and "which ironic" in this case, especially since Arkady mentioned tripleo. More on that below. However, I agree that if you're using ironic to manage the nodes that form your actual (over)cloud, then having ironic update firmware on your accelerator device in the same way that it might update firmware on a regular NIC, GPU card, or anything else makes sense. However, if you're talking about services all at the same level (i.e. nova working with ironic to provide metal as a tenant as well as VMs) then *that* ironic is not going to be managing firmware on accelerators that you're handing to your VM instances on the compute nodes.
Managing the life cycle of devices is Ironic's responsibility,
Disagree here.
Me too, but in a general sense. I would not agree with the assessment that "Managing life cycle of devices is Ironic responsibility." Specifically the wide scope of "devices" being more than just physical machines. It's true that Ironic manages the lifecycle of physical machines, which may be used in a tripleo type of environment to manage the lifecycle of things like compute nodes. I *think* you both agree with that clarification, because of the next point, but I think it's important to avoid such statements that imply "all devices."
To the best of my knowledge, Ironic handles devices based on PCI IDs. Cyborg is designed to go deeper for discovering device features/properties and utilize Placement for scheduling based on these.
What does this matter though? If you're talking about firmware for an FPGA card, that's what you need to know in order to apply the correct firmware to it, independent of whatever application-level bitstream is going to go in there right?
One cannot and should not manage the life cycle of server components independently.
If what you meant to say is: ' do not update device firmware independently of other server components', agreed.
I'm not really sure what this original point from Arkady really means. Are (either of) you saying that if there's a CVE for the firmware in some card, the firmware patch shouldn't be applied without taking the box through a full lifecycle event or something? AFAIK, Ironic can't just do this in isolation, which means that if you've got a compute node managed by ironic in a tripleo type of environment, you're looking to move workloads away from that node, destroy it, apply updates, and re-create it before you can use it again. I guess I'd be surprised if people are doing this every time Intel releases another microcode update. Am I wrong about that? Either way, I'm not sure how the firmware for accelerator cards is any different from the firmware for other devices on the system. Maybe the confusion is just that Cyborg does "programming" which seems similar to "updating firmware"?
Nova and Neutron get info about all devices and their capabilities from Ironic, which they use for scheduling
Hmm, this seems overly broad to me: not every deployment includes Ironic, and getting PCI IDs is not enough for scheduling and management.
I also don't think it's correct. Nova does not get info about devices from Ironic, and I kinda doubt Neutron does either. If Nova is using ironic to provide metal as tenants, then...sure, but in the case where nova is providing VMs with accelerator cards, Ironic is not involved.
Thus, I propose that Cyborg cover use cases 1 & 2, and Ironic cover use case 3
Use case 3 says "setup device for its use, like burning specific FW." With the definition of firmware above, I agree. Other aspects of lifecycle management, not covered by use cases 1 - 3, would come under Cyborg.
Thus, move all device life-cycle code from Cyborg to Ironic
To recap, there is more to device lifecycle than firmware update. I'd suggest the other aspects can remain in Cyborg.
Didn't you say that firmware programming (as defined here) is not something that Cyborg currently does? Thus, nothing Cyborg currently does should be moved to Ironic, AFAICT. If that is true, then I agree. I guess my summary is: firmware updates for accelerators can and should be handled the same as for other devices on the system, in whatever way the operator currently does that. Programming an application-level bitstream should not be confused with the former activity, and is fully within the domain of Cyborg's responsibilities. --Dan
On 2020-01-13 07:16:30 -0800 (-0800), Dan Smith wrote: [...]
What does this matter though? If you're talking about firmware for an FPGA card, that's what you need to know in order to apply the correct firmware to it, independent of whatever application-level bitstream is going to go in there right? [...] Either way, I'm not sure how the firmware for accelerator cards is any different from the firmware for other devices on the system. Maybe the confusion is just that Cyborg does "programming" which seems similar to "updating firmware"? [...]
FPGA configuration is a compiled binary blob written into non-volatile memory through a hardware interface. These similarities to firmware also result in many people actually calling it "firmware" even though, you're right, technically it's a mapping of gate interconnections and not really firmware in the conventional sense. In retrospect maybe I shouldn't have brought it up. I wouldn't be surprised, though, if there *are* NFV-related cases where the users of the virtual machines into which some network hardware is mapped need access to alter parts of, say, an interface controller's firmware. The Linux kernel has for years incorporated features to write or rewrite firmware and other microcode for certain devices at boot time for similar reasons, after all. -- Jeremy Stanley
FPGA configuration is a compiled binary blob written into non-volatile memory through a hardware interface. These similarities to firmware also result in many people actually calling it "firmware" even though, you're right, technically it's a mapping of gate interconnections and not really firmware in the conventional sense. In retrospect maybe I shouldn't have brought it up.
It's a super easy thing to conflate those two topics I think. Probably calling one the "firmware" and the other the "bitstream" is the most common distinction I've heard. The latter also potentially being the "application" or "function."
I wouldn't be surprised, though, if there *are* NFV-related cases where the users of the virtual machines into which some network hardware is mapped need access to alter parts of, say, an interface controller's firmware. The Linux kernel has for years incorporated features to write or rewrite firmware and other microcode for certain devices at boot time for similar reasons, after all.
Yeah, I'm not sure because I don't have a lot of experience with these devices. I guess I kinda expected that they have effectively two devices on each card: one being the FPGA itself and the other being just a management device that lets you flash the FPGA. If the FPGA is connected to the bus as well, I'd expect it to be able to define its own interaction (i.e. be like a NIC or be like a compression accelerator), and the actual "firmware" being purely a function of the management device. Either way, I think my point is that ironic's ability to manage the firmware part, regardless of how often you need it to change, is limited (currently, AFAIK) to the cleaning/prep phase of the lifecycle, and only really applies to a compute node when it is a workload on top of the undercloud. For people that don't use ironic to provision their compute nodes, ironic wouldn't even have the opportunity to manage the firmware of those devices. I'm not saying Cyborg should fill the firmware gap, just not saying we should expect that Ironic will. --Dan
-----Original Message----- From: Jeremy Stanley <fungi@yuggoth.org> Sent: Monday, January 13, 2020 8:54 AM To: openstack-discuss@lists.openstack.org Subject: Re: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management
On 2020-01-13 07:16:30 -0800 (-0800), Dan Smith wrote: [...]
What does this matter though? If you're talking about firmware for an FPGA card, that's what you need to know in order to apply the correct firmware to it, independent of whatever application-level bitstream is going to go in there right? [...] Either way, I'm not sure how the firmware for accelerator cards is any different from the firmware for other devices on the system. Maybe the confusion is just that Cyborg does "programming" which seems similar to "updating firmware"? [...]
FPGA configuration is a compiled binary blob written into non-volatile memory through a hardware interface. These similarities to firmware also result in many people actually calling it "firmware" even though, you're right, technically it's a mapping of gate interconnections and not really firmware in the conventional sense.
+1
I wouldn't be surprised, though, if there *are* NFV-related cases where the users of the virtual machines into which some network hardware is mapped need access to alter parts of, say, an interface controller's firmware. The Linux kernel has for years incorporated features to write or rewrite firmware and other microcode for certain devices at boot time for similar reasons, after all.
This aspect does come up for discussion a lot. Generally, operators and device vendors get alarmed at the prospect of letting a user/VNF/instance program an image/bitstream into a device directly -- we wouldn't know what image it is, etc. Cyborg doesn't support that. But Cyborg could program an image/bitstream on behalf of the user/VNF. That said, the VNF or VM (in a non-networking context) can configure a device by reading from registers/DDR on the card or writing to them. They can be handled using standard access permissions, Linux capabilities, etc. For example, the VM may memory-map a region of the device's address space using the mmap system call, and that access can be controlled.
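For reference, this "program on behalf of the user" flow is what Cyborg's device profiles are meant to carry. Roughly like this -- field names follow the Train-era device profile spec, and the values are placeholders:

    # A device profile requests an FPGA carrying a specific, operator-approved
    # bitstream; Cyborg programs the device while binding the accelerator
    # request, so the user never flashes it directly.
    device_profile = {
        "name": "ipsec-offload",
        "groups": [{
            "resources:FPGA": "1",
            "trait:CUSTOM_FPGA_REGION_XYZ": "required",
            "accel:bitstream_id": "<glance-image-uuid-of-the-bitstream>",
        }],
    }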
-- Jeremy Stanley
Regards, Sundar
[trim]
On Mon, 2020-01-13 at 18:26 +0000, Nadathur, Sundar wrote:
This aspect does come up for discussion a lot. Generally, operators and device vendors get alarmed at the prospect of letting a user/VNF/instance program an image/bitstream into a device directly -- we wouldn't know what image it is, etc. Cyborg doesn't support that. But Cyborg could program an image/bitstream on behalf of the user/VNF.

To be fair, if your device supports reprogramming over PCIe, then you can enable the guest to reprogram the device using Nova's PCI passthrough feature by passing through the entire PF. Cyborg's role is to provide a managed accelerator, not an unmanaged one. If we wanted to use a pre-programmed FPGA or a fixed-function accelerator, we have been able to do that with PCI passthrough for the better part of 4 years, so I would consider unmanaged accelerators out of scope for Cyborg, at least until the integration of managed accelerators is done.

Nova already handles vGPU, vPMEM (persistent memory), generic PCI passthrough, SR-IOV for Neutron ports, and hardware-offloaded OVS VFs (e.g. SmartNIC integration). Cyborg's added value is in managing things Nova cannot provide easily.

Arguing that Ironic should manage FPGA bitstreams because it can manage firmware is, from a Nova point of view, arguing that the virt driver should manage all devices that are provided to the guest -- meaning, in the libvirt case, that it and not Cyborg should continue to be extended to manage FPGAs and any other devices directly. We could do that, but that would leave only one thing for Cyborg to manage: remote accelerators that could be provided to instances over a network fabric, making it a kind of Cinder of accelerators. That is a use case that Nova and Ironic would both be ill suited for, but it is not the direction the Cyborg project has moved in, so unless you are suggesting Cyborg should pivot, I don't think we should redesign the interaction between Nova, Ironic, Cyborg and Neutron to have Ironic manage the devices.

I do think there is merit in some integration between the ironic-python-agent and Cyborg for discovery, and perhaps programming of the FPGA on an Ironic node, assuming the actual discovery and programming logic lives in Cyborg and Ironic simply runs/deploys/configures the Cyborg agent in the IPA image or invokes the Cyborg code directly.
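For the record, the unmanaged passthrough path Sean describes is plain Nova configuration; roughly like this, with illustrative vendor/product IDs and alias name (option names as of the Train era):

    # nova.conf on the compute node (the alias is also needed on the controllers):
    [pci]
    passthrough_whitelist = { "vendor_id": "8086", "product_id": "0b30" }
    alias = { "vendor_id": "8086", "product_id": "0b30", "device_type": "type-PF", "name": "fpga" }

    # Then the device is requested via a flavor extra spec:
    #   openstack flavor set accel.large --property "pci_passthrough:alias"="fpga:1"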
On Mon, Jan 13, 2020 at 10:58 AM Sean Mooney <smooney@redhat.com> wrote:
On Mon, 2020-01-13 at 18:26 +0000, Nadathur, Sundar wrote:
[trim]
I wouldn't be surprised, though, if there *are* NFV-related cases where the users of the virtual machines into which some network hardware is mapped need access to alter parts of, say, an interface controller's firmware. The Linux kernel has for years incorporated features to write or rewrite firmware and other microcode for certain devices at boot time for similar reasons, after all.
This aspect does come up for discussion a lot. Generally, operators and device vendors get alarmed at the prospect of letting a user/VNF/instance program an image/bitstream into a device directly -- we wouldn't know what image it is, etc. Cyborg doesn't support that. But Cyborg could program an image/bitstream on behalf of the user/VNF.
To be fair, if your device supports reprogramming over PCIe, then you can enable the guest to reprogram the device using Nova's PCI passthrough feature by passing through the entire PF. Cyborg's role is to provide a managed accelerator, not an unmanaged one. If we wanted to use a pre-programmed FPGA or fixed-function accelerator, that has been possible with PCI passthrough for the better part of 4 years, so I would consider unmanaged accelerators out of scope for Cyborg, at least until the integration of managed accelerators is done.
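For readers less familiar with the mechanism Sean refers to, whole-PF passthrough is plain Nova configuration plus a flavor property. A minimal sketch, using the Train-era option names and hypothetical vendor/product IDs and alias name:

    # /etc/nova/nova.conf on the compute node (IDs are placeholders):
    [pci]
    passthrough_whitelist = { "vendor_id": "8086", "product_id": "0b30" }
    alias = { "vendor_id": "8086", "product_id": "0b30", "device_type": "type-PF", "name": "my-fpga" }

    # Request one whole PF via a flavor, then boot with that flavor:
    openstack flavor set --property "pci_passthrough:alias"="my-fpga:1" my-flavor

The alias also needs to be configured wherever nova-api runs so the request can be scheduled; the guest then owns the whole PF, which is exactly the unmanaged case Sean describes.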
Nova already handles vGPU, vPMEM (persistent memory), generic PCI passthrough, SR-IOV for Neutron ports, and hardware-offloaded OVS VFs (e.g. smart NIC integration).
Cyborg's added value is in managing things Nova cannot provide easily.
Arguing that Ironic should manage FPGA bitstreams because it can manage firmware is, from a Nova point of view, arguing that the virt driver should manage all devices provided to the guest, meaning that in the libvirt case it, and not Cyborg, should continue to be extended to manage FPGAs and any other devices directly.
I _feel_ like there would eventually be edge cases where it may be desired or required, but without a practical bare-metal-as-a-service integration to start with, it seems kind of crazy to think about it too much.
We could do that, but it would leave only one thing for Cyborg to manage: remote accelerators that could be provided to instances over a network fabric, making it a kind of Cinder of accelerators. That is a use case that both Nova and Ironic would be ill suited for, but it is not the direction the Cyborg project has moved in, so unless you are suggesting Cyborg should pivot, I don't think we should redesign the interaction between Nova, Ironic, Cyborg, and Neutron to have Ironic manage the devices.
I concur. I think the overall concern that started the discussion was still how, as a vendor, these things are supported and warranties are not inadvertently voided. From some discussions, I feel like "As a cloud user I want a managed accelerator" is distinctly different from "As a cloud user I want bare metal", and still different from "As a cloud installer, I want to install my infrastructure". No one configuration, software, or use pattern will solve all of these cases, at least until AIs are writing our code for us and the installation AI can read and understand the OEM's build sheet to know what was done at the factory.
I do think there is merit in some integration between the ironic-python-agent and Cyborg for discovery, and perhaps programming of the FPGA on an Ironic node, assuming the actual discovery and programming logic lives in Cyborg and Ironic simply runs/deploys/configures the Cyborg agent in the IPA image or invokes the Cyborg code directly.
I absolutely agree, and I suspect that from a practical operational standpoint it would be good to at least offer a "Hey, delete any bitstreams" flag between tenant deployments. The one conundrum is the mechanics of triggering and running a Cyborg agent, because these actions are typically performed on an isolated, restricted-access network with no access to, much less credentials for, the message bus. Of course, that is likely solvable.
That said, the VNF or VM (in a non-networking context) can configure a device by reading from registers/DDR on the card or writing to them. They can be handled using standard access permissions, Linux capabilities, etc. For example, the VM may memory-map a region of the device's address space using the mmap system call, and that access can be controlled.
-- Jeremy Stanley
Regards, Sundar
From: Dan Smith <dms@danplanet.com> Sent: Monday, January 13, 2020 7:17 AM To: Nadathur, Sundar <sundar.nadathur@intel.com> Cc: Arkady.Kanevsky@dell.com; openstack-discuss@lists.openstack.org Subject: Re: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management
TL;DR
* Agree with Arkady that firmware updates should follow the server vendors' guidelines, and can/should be done as part of the server configuration.
I'm worried there's a little bit of confusion about "which nova" and "which ironic" in this case, especially since Arkady mentioned tripleo. More on that below. However, I agree that if you're using ironic to manage the nodes that form your actual (over)cloud, then having ironic update firmware on your accelerator device in the same way that it might update firmware on a regular NIC, GPU card, or anything else makes sense.
However, if you're talking about services all at the same level (i.e. nova working with ironic to provide metal as a tenant as well as VMs) then *that* ironic is not going to be managing firmware on accelerators that you're handing to your VM instances on the compute nodes.
This goes back to the definition of firmware update vs. programming in my earlier post. In a Nova + Ironic + Cyborg env, I'd expect Cyborg to do programming. Firmware updates can be done by Ironic, Ansible/Redfish/... , some combination like Ironic with Redfish driver, or whatever the operator chooses.
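As one concrete illustration of the non-Ironic path Sundar mentions: any BMC that implements Redfish exposes a standard UpdateService, so operator tooling can push firmware with a single call. A minimal sketch in Python, where the BMC address, credentials, and image URI are hypothetical placeholders:

    import requests

    BMC = "https://bmc.example.com"    # hypothetical BMC address
    AUTH = ("admin", "password")       # real deployments use session auth
    IMAGE = "http://images.example.com/fw/nic-2.1.bin"  # hypothetical image

    # Redfish's standard SimpleUpdate action: the BMC fetches the image
    # itself and stages/applies it according to vendor policy.
    resp = requests.post(
        f"{BMC}/redfish/v1/UpdateService/Actions/UpdateService.SimpleUpdate",
        json={"ImageURI": IMAGE, "TransferProtocol": "HTTP"},
        auth=AUTH,
        verify=False,  # lab-only shortcut; verify certificates in production
    )
    resp.raise_for_status()
    # A 202 response normally points at a Task resource to poll.
    print("update task:", resp.headers.get("Location"))

Whether the new firmware takes effect immediately or only after a reboot remains device- and vendor-specific, which is the point made below about maintenance windows.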
To the best of my knowledge, Ironic handles devices based on PCI IDs. Cyborg is designed to go deeper for discovering device features/properties and utilize Placement for scheduling based on these.
What does this matter though? If you're talking about firmware for an FPGA card, that's what you need to know in order to apply the correct firmware to it, independent of whatever application-level bitstream is going to go in there, right?
The device properties are needed for scheduling: users are often interested in getting a VM with an accelerator that has specific properties, e.g. one that implements a specific version of gzip, or has 4 GB or more of device-local memory. Device properties are also needed for management of accelerator inventory: admins want to know how many FPGAs have a particular bitstream burnt into them, etc. Regarding programming, sometimes we may need to determine what's in a device (beyond its PCI ID) before programming it, to ensure the image being programmed and the existing device contents are compatible.
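To make the scheduling half concrete: in Cyborg's model, discovered properties surface as Placement resource classes and traits, which a user requests through a device profile. A hedged sketch with made-up profile and trait names (the accel:device_profile flavor property follows the Nova-Cyborg integration under development at the time; treat the details as illustrative):

    # A Cyborg device profile asking for one FPGA that implements a
    # specific function (names are illustrative only):
    {
        "name": "gzip-fpga",
        "groups": [
            {
                "resources:FPGA": "1",
                "trait:CUSTOM_FPGA_FUNCTION_GZIP_V2": "required"
            }
        ]
    }

    # Referenced from a flavor so Placement can schedule against it:
    openstack flavor set --property "accel:device_profile"="gzip-fpga" my-flavor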
One cannot and should not manage lifecycle of server components independently.
If what you meant to say is 'do not update device firmware independently of other server components', then agreed.
I'm not really sure what this original point from Arkady really means. Are (either of) you saying that if there's a CVE for the firmware in some card, the firmware patch shouldn't be applied without taking the box through a full lifecycle event or something?
My paraphrase of Arkady's points:
a. Updating CPU firmware/microcode should be done as per the server/CPU vendor's rules (using their specific tools, or specific mechanisms like Redfish, with auditing, ...).
b. Updating firmware for devices/accelerators should be done the same way.
By a "full lifecycle event", you presumably mean vacating the entire node. For device updates, that is not always needed: one could disconnect just the instances using that device. The server/device vendor rules must specify the 'lifecycle event' involved for a specific update.
AFAIK, Ironic can't just do this in isolation, which means that if you've got a compute node managed by ironic in a tripleo type of environment, you're looking to move workloads away from that node, destroy it, apply updates, and re-create it before you can use it again. I guess I'd be surprised if people are doing this every time intel releases another microcode update. Am I wrong about that?
Not making any official statements but, generally, if a microcode/firmware update requires a reboot, one would have to do that. The admin would declare a maintenance window and combine software/firmware/configuration updates in that window.
Either way, I'm not sure how the firmware for accelerator cards is any different from the firmware for other devices on the system.
Updates of other devices, like CPU or motherboard components, often require server reboots. Accelerator updates may or may not require them, depending on ... all kinds of things.
Maybe the confusion is just that Cyborg does "programming" which seems similar to "updating firmware"?
Yes, indeed. That is why I elaborated at length on the distinction between the two.
Nova and Neutron are getting info about all devices and their capabilities from Ironic; that they use for scheduling
Hmm, this seems overly broad to me: not every deployment includes Ironic, and getting PCI IDs is not enough for scheduling and management.
I also don't think it's correct. Nova does not get info about devices from Ironic, and I kinda doubt Neutron does either. If Nova is using ironic to provide metal as tenants, then...sure, but in the case where nova is providing VMs with accelerator cards, Ironic is not involved.
+1
Thus, move all device Life-cycle code from Cyborg to Ironic
To recap, there is more to device lifecycle than firmware update. I'd suggest the other aspects can remain in Cyborg.
Didn't you say that firmware programming (as defined here) is not something that Cyborg currently does? Thus, nothing Cyborg currently does should be moved to Ironic, AFAICT. If that is true, then I agree.
Yes ^.
I guess my summary is: firmware updates for accelerators can and should be handled the same as for other devices on the system, in whatever way the operator currently does that. Programming an application-level bitstream should not be confused with the former activity, and is fully within the domain of Cyborg's responsibilities.
Agreed.
--Dan
Regards, Sundar
This goes back to the definition of firmware update vs. programming in my earlier post. In a Nova + Ironic + Cyborg env, I'd expect Cyborg to do programming. Firmware updates can be done by Ironic, Ansible/Redfish/... , some combination like Ironic with Redfish driver, or whatever the operator chooses.
Yes, this is my point. I think we're in agreement here.
What does this matter though? If you're talking about firmware for an FPGA card, that's what you need to know in order to apply the correct firmware to it, independent of whatever application-level bitstream is going to go in there, right?
The device properties are needed for scheduling: users are often interested in getting a VM with an accelerator that has specific properties, e.g. one that implements a specific version of gzip, or has 4 GB or more of device-local memory.
Right, I'm saying I don't think Ironic needs to know anything other than the PCI ID of a card in order to update its firmware, correct? You and I are definitely in agreement that Ironic should have nothing to do with _programming_ and thus nothing to do with _scheduling_ of workloads affined to accelerators.
By a "full lifecycle event", you presumably mean vacating the entire node. For device updates, that is not always needed: one could disconnect just the instances using that device. The server/device vendor rules must specify the 'lifecycle event' involved for a specific update.
Right, I'm saying that today (AFAIK) Ironic can only do the "vacate, destroy, clean, re-image" sort of lifecycle, which is very heavyweight to just update firmware on a card.
Updates of other devices, like CPU or motherboard components, often require server reboots. Accelerator updates may or may not require them, depending on ... all kinds of things.
Yep, all of this is lighter-weight than Ironic destroying, cleaning, and re-imaging a node. I'm making the case for "sure, Ironic could do the firmware update if it's cleaning a node, but in most cases you probably want a more lightweight process like ansible and a reboot." So again, I think we're in full agreement on the classification of operation, and the subset of that which is wholly owned by Cyborg, as well as what of that *may* be owned by Ironic or any other hardware management tool.
--Dan
participants (7)
- Arkady.Kanevsky@dell.com
- Dan Smith
- Jeremy Stanley
- Julia Kreger
- Nadathur, Sundar
- Sean Mooney
- Zhipeng Huang