[Cyborg][Ironic][Nova][Neutron][TripleO][Cinder] accelerators management

Dan Smith dms at danplanet.com
Mon Jan 13 19:15:23 UTC 2020


> This goes back to the definition of firmware update vs. programming in
> my earlier post. In a Nova + Ironic + Cyborg env, I'd expect Cyborg to
> do programming. Firmware updates can be done by Ironic,
> Ansible/Redfish/... , some combination like Ironic with Redfish
> driver, or whatever the operator chooses.

Yes, this is my point. I think we're in agreement here.

>> What does this matter though? If you're talking about firmware for an FPGA
>> card, that's what you need to know in order to apply the correct firmware to
>> it, independent of whatever application-level bitstream is going to go in there,
>> right?
>
> The device properties are needed for scheduling: users are often
> interested in getting a VM with an accelerator that has specific
> properties: e.g. implements a specific version of gzip, has 4 GB or
> more of device-local memory etc.

Right, I'm saying I don't think Ironic needs to know anything other than
the PCI ID of a card in order to update its firmware, correct? You and I
are definitely in agreement that Ironic should have nothing to do with
_programming_ and thus nothing to do with _scheduling_ of workloads
affined to accelerators.

> By a "full lifecycle event", you presumably mean vacating the entire
> node. For device updates, that is not always needed: one could
> disconnect just the instances using that device. The server/device
> vendor rules must specify the 'lifecycle event' involved for a
> specific update.

Right, I'm saying that today (AFAIK) Ironic can only do the "vacate,
destroy, clean, re-image" sort of lifecycle, which is very heavyweight
just to update the firmware on a card.

> Updates of other devices, like CPU or motherboard components, often
> require server reboots. Accelerator updates may or may not require
> them, depending on ... all kinds of things.

Yep, all of this is lighter-weight than Ironic destroying, cleaning, and
re-imaging a node. I'm making the case for "sure, Ironic could do the
firmware update if it's cleaning a node, but in most cases you probably
want a more lightweight process like ansible and a reboot."

So again, I think we're in full agreement on the classification of
operations, on the subset of those that is wholly owned by Cyborg, and
on which parts *may* be owned by Ironic or any other hardware
management tool.

--Dan


