Circling back to this now since I'm not in meetings and can actually think about this topic. :)

On Sun, Jan 12, 2020 at 1:42 PM Nadathur, Sundar <sundar.nadathur@intel.com> wrote:
[trim]
Further complicating matters are the "Metal to Tenant" use cases, where the user requesting the machine is not an administrator but has some level of inherent administrative access to all operating-system-accessible devices once their OS has booted. This makes me wonder: "What if the cloud administrators WANT to block the tenant's direct ability to write/flash firmware into accelerators, SmartNICs, etc.?"
Yes, admins may want to do that. This can be done (partly) via RBAC, by having different roles for tenants who can use devices but not reprogram them, and for tenants who can program the device with application/scheduling-relevant features (but not firmware), etc.
I concur that it might be doable via RBAC for hypervisor hosts, where access is abstracted and controlled. However, the concern in the bare metal integration use case is that the tenant ultimately has full superuser access to the machine.
I suspect if cloud administrators want to block such hardware access, vendors will want to support such a capability.
Devices can, and usually do, offer separate mechanisms for reading from registers, writing to them, updating flash, etc., each with associated access permissions. A device vendor can go a step further by requiring specific Linux capabilities in their device driver, such as, say, CAP_IPC_LOCK for mmap access.
Going back to the prior point about the Metal to Tenant case: these may hold for pure users of a shared system, but in the bare-metal-as-a-service operating model, the user has full machine access. The user could also deploy an OS where capability checking is disabled entirely.
Blocking such access inherently forces some actions into hardware management/maintenance workflows, and may cause some of a support matrix's use cases to become unsupportable, again depending on what exactly the user is attempting to achieve.
Not sure if you are expressing a concern here. If the admin is using device features or RBAC to restrict access, then she is intentionally blocking some combinations in your support matrix, right? Users in such a deployment need to live with that.
I was trying to further stress the prior concern and convey that I perceive the end result to be a matrix of use cases where some are unsupportable. I completely agree that, in the end, users would need to live with that situation. I just think users will need clarity on what is possible, and what ultimately is not possible, in various scenarios.

-Julia