-----Original Message-----
From: Jeremy Stanley <fungi@yuggoth.org>
Sent: Monday, January 13, 2020 8:54 AM
To: openstack-discuss@lists.openstack.org
Subject: Re: [Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management
On 2020-01-13 07:16:30 -0800 (-0800), Dan Smith wrote: [...]
What does this matter, though? If you're talking about firmware for an FPGA card, that's what you need to know in order to apply the correct firmware to it, independent of whatever application-level bitstream is going to go in there, right?
[...]
Either way, I'm not sure how the firmware for accelerator cards is any different from the firmware for other devices on the system. Maybe the confusion is just that Cyborg does "programming" which seems similar to "updating firmware"?
[...]
FPGA configuration is a compiled binary blob written into non-volatile memory through a hardware interface. These similarities to firmware also result in many people actually calling it "firmware" even though, you're right, technically it's a mapping of gate interconnections and not really firmware in the conventional sense.
+1
I wouldn't be surprised, though, if there *are* NFV-related cases where the users of the virtual machines into which some network hardware is mapped need access to alter parts of, say, an interface controller's firmware. The Linux kernel has for years incorporated features to write or rewrite firmware and other microcode for certain devices at boot time for similar reasons, after all.
On Mon, 2020-01-13 at 18:26 +0000, Nadathur, Sundar wrote:
This aspect does come up for discussion a lot. Generally, operators and device vendors get alarmed at the prospect of letting a user/VNF/instance program an image/bitstream into a device directly -- we wouldn't know what image it is, etc. Cyborg doesn't support that. But Cyborg could program an image/bitstream on behalf of the user/VNF.

To be fair, if your device supports reprogramming over PCIe, then you can enable the guest to reprogram the device using Nova's PCI passthrough feature by passing through the entire PF. Cyborg's role is to provide a managed accelerator, not an unmanaged one. If we wanted to use a pre-programmed FPGA or fixed-function accelerator, you have been able to do that with PCI passthrough for the better part of 4 years. So I would consider unmanaged accelerators out of scope for Cyborg, at least until the integration of managed accelerators is done.

Nova already handles vGPU, vPMEM (persistent memory), generic PCI passthrough, SR-IOV for Neutron ports and hardware-offloaded OVS VFs (e.g. smart NIC integration). Cyborg's added value is in managing things Nova cannot provide easily. Arguing that Ironic should manage FPGA bitstreams because it can manage firmware is, from a Nova point of view, arguing that the virt driver should manage all devices that are provided to the guest, meaning that in the libvirt case it, and not Cyborg, should continue to be extended to manage FPGAs and any other devices directly.

We could do that, but that would leave only one thing for Cyborg to manage, which would be remote accelerators that could be provided to instances over a network fabric, making it a kind of Cinder of accelerators. That is a use case that Nova and Ironic would both be ill suited for, but it is not the direction the Cyborg project has moved in. So unless you are suggesting Cyborg should pivot, I don't think we should redesign the interaction between Nova, Ironic, Cyborg and Neutron to have Ironic manage the devices.

I do think there is merit in some integration between the Ironic Python Agent and Cyborg for discovery and perhaps programming of the FPGA on an Ironic node, assuming the actual discovery and programming logic lives in Cyborg and Ironic simply runs/deploys/configures the Cyborg agent in the IPA image or invokes the Cyborg code directly.
That said, the VNF or VM (in a non-networking context) can configure a device by reading from registers/DDR on the card or writing to them. Those accesses can be handled using standard access permissions, Linux capabilities, etc. For example, the VM may memory-map a region of the device's address space using the mmap system call, and that access can be controlled.
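To make the mmap point concrete, here is a rough sketch (not anything Cyborg ships) of how a guest process might map part of a device's address space read-only and read one 32-bit register. It assumes the device is exposed through a file that supports mmap, such as the Linux sysfs PCI resource files; the helper name read_reg32, the PCI address and the register offset are all made up for illustration. Whether the open succeeds is governed by ordinary file permissions on that path, which is the access control mentioned above.

```python
import mmap
import os
import struct


def read_reg32(path, offset, length=mmap.PAGESIZE):
    """Map `length` bytes of `path` read-only and return the
    little-endian 32-bit value found at byte `offset`.

    `path` is assumed to be an mmap-able file exposing device memory,
    e.g. /sys/bus/pci/devices/<address>/resource0 on Linux.
    """
    # O_RDONLY plus PROT_READ means this mapping can never be used to
    # write to the device; the open itself fails with PermissionError
    # if the caller lacks read permission on the file.
    fd = os.open(path, os.O_RDONLY)
    try:
        with mmap.mmap(fd, length, mmap.MAP_SHARED, mmap.PROT_READ) as region:
            return struct.unpack_from("<I", region, offset)[0]
    finally:
        os.close(fd)


# Hypothetical usage; the PCI address and offset below are placeholders:
# value = read_reg32("/sys/bus/pci/devices/0000:3b:00.0/resource0", 0x10)
```

A writable mapping would need O_RDWR and PROT_WRITE instead, so an operator can keep a guest user read-only simply by not granting write permission on the resource file.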
-- Jeremy Stanley
Regards, Sundar