On Mon, Jan 13, 2020 at 10:58 AM Sean Mooney <smooney@redhat.com> wrote:
On Mon, 2020-01-13 at 18:26 +0000, Nadathur, Sundar wrote:
[trim]
I wouldn't be surprised, though, if there *are* NFV-related cases where the users of the virtual machines into which some network hardware is mapped need access to alter parts of, say, an interface controller's firmware. The Linux kernel has for years incorporated features to write or rewrite firmware and other microcode for certain devices at boot time for similar reasons, after all.
This aspect does come up for discussion a lot. Generally, operators and device vendors get alarmed at the prospect of letting a user/VNF/instance program an image/bitstream into a device directly -- we wouldn't know what image it is, etc. Cyborg doesn't support that. But Cyborg could program an image/bitstream on behalf of the user/VNF.
To be fair, if your device supports reprogramming over PCIe, then you can enable the guest to reprogram the device using Nova's PCI passthrough feature by passing through the entire PF. Cyborg's role is to provide a managed accelerator, not an unmanaged one. If we wanted to use a pre-programmed FPGA or fixed-function accelerator, you have been able to do that with PCI passthrough for the better part of 4 years. So I would consider unmanaged accelerators out of scope for Cyborg, at least until the integration of managed accelerators is done.
Nova already handles vGPU, vPMEM (persistent memory), generic PCI passthrough, SR-IOV for Neutron ports, and hardware-offloaded OVS VFs (e.g. smart NIC integration).
Cyborg's added value is in managing things Nova cannot provide easily.
Arguing that Ironic should manage FPGA bitstreams because it can manage firmware is, from a Nova point of view, arguing that the virt driver should manage all devices that are provided to the guest, meaning that in the libvirt case it, and not Cyborg, should continue to be extended to manage FPGAs and any other devices directly.
I _feel_ like there would eventually be edge cases where it may be desired or required, but without a practical bare metal as a service integration to start with, it seems kind of crazy to think about it too much.
We could do that, but that would leave only one thing for Cyborg to manage, which would be remote accelerators that could be provided to instances over a network fabric, making it a kind of Cinder of accelerators. That is a use case that Nova and Ironic would both be ill-suited for, but it is not the direction the Cyborg project has moved in, so unless you are suggesting Cyborg should pivot, I don't think we should redesign the interaction between Nova, Ironic, Cyborg, and Neutron to have Ironic manage the devices.
I concur. I think the overall concern that started the discussion was still how, as a vendor, these things are supported and how warranties are kept from being inadvertently voided. From some discussions, I feel like the "As a cloud user I want a managed accelerator" is distinctly different from "As a cloud user I want baremetal" and still different from "As a cloud installer, I want to install my infrastructure". No one configuration, software, or use pattern will solve all of the cases, at least until AIs are writing our code for us and the installation AI can read/understand the OEM's build sheet to understand what was done at the factory.
I do think there is merit in some integration between the Ironic Python Agent and Cyborg for discovery and perhaps programming of the FPGA on an Ironic node, assuming the actual discovery and programming logic lives in Cyborg and Ironic simply runs/deploys/configures the Cyborg agent in the IPA image or invokes the Cyborg code directly.
I absolutely agree, and I suspect that, from a practical operational standpoint, it would be good to at least offer a flag of "Hey, delete any bitstreams" between tenant deployments. The one conundrum is the mechanics of triggering and running a Cyborg agent, because these actions are typically performed on an isolated, restricted-access network without actual access, much less credentials, to the message bus. Of course, likely solvable.
That said, the VNF or VM (in a non-networking context) can configure a device by reading from registers/DDR on the card or writing to them. Such accesses can be handled using standard access permissions, Linux capabilities, etc. For example, the VM may memory-map a region of the device's address space using the mmap system call, and that access can be controlled.
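To make the mmap point concrete, here is a minimal sketch in Python, not anything Cyborg ships. It assumes the device region is exposed to the guest as a file that can be memory-mapped; on Linux that could be a PCI resource file under sysfs, but here a temporary file stands in for it so the example runs anywhere. The helper names, the register offset, and the region size are all hypothetical. Read access uses a read-only mapping and write access needs a writable one, which is where ordinary file permissions come in.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical size of the register window; a real device defines its own.
REGION_SIZE = mmap.PAGESIZE


def make_fake_region():
    """Create a zero-filled temp file standing in for a device region."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\x00" * REGION_SIZE)
    return path


def read_reg32(path, offset):
    """Read a little-endian 32-bit value through a read-only mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), REGION_SIZE, access=mmap.ACCESS_READ) as m:
            return struct.unpack_from("<I", m, offset)[0]


def write_reg32(path, offset, value):
    """Write a little-endian 32-bit value through a writable mapping.

    Opening the file "r+b" fails with PermissionError if the caller
    lacks write permission, so access is gated before any mapping exists.
    """
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), REGION_SIZE, access=mmap.ACCESS_WRITE) as m:
            struct.pack_into("<I", m, offset, value)
            m.flush()


if __name__ == "__main__":
    region = make_fake_region()
    try:
        write_reg32(region, 0x10, 0xDEADBEEF)  # 0x10 is a made-up offset
        print(hex(read_reg32(region, 0x10)))
    finally:
        os.remove(region)
```

A user who can only open the file read-only can still inspect registers but cannot change them, which is the kind of control meant above.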
-- Jeremy Stanley
Regards, Sundar