On 2020-12-16 11:25:07 -0600 (-0600), Eric K. Miller wrote:
[...]
> It seems that, instead of detecting tainted "anything", it would be better to assume zero trust in the hardware after use, and instead reset/re-flash everything upon re-provisioning. I can understand that re-flashing can be hard on the flash, but now that most (all?) firmware has digital signature checks, this can be used to avoid re-flashing when the signature matches.
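As a rough illustration of that suggestion, here is a minimal Python sketch of the decision itself: only re-flash when what is currently on the chip fails to match a known-good image. It substitutes a plain SHA-256 digest comparison for real vendor signature verification, and the function names (`image_digest`, `needs_reflash`) as well as the premise that the current firmware can be dumped to bytes are assumptions for illustration, not any vendor's or project's API.

```python
import hashlib
import hmac


def image_digest(image: bytes) -> str:
    """Return the SHA-256 hex digest of a firmware image."""
    return hashlib.sha256(image).hexdigest()


def needs_reflash(dumped_image: bytes, known_good_digest: str) -> bool:
    """Decide whether a re-flash is needed.

    Returns False (skip the write, sparing the flash) only when the dumped
    image hashes to the trusted digest; anything else is treated as tainted.
    """
    current = image_digest(dumped_image)
    # compare_digest avoids leaking where the two strings differ.
    return not hmac.compare_digest(current, known_good_digest.lower())


if __name__ == "__main__":
    golden = b"vendor firmware v1.2.3"  # hypothetical trusted image
    trusted = image_digest(golden)
    print(needs_reflash(golden, trusted))            # unchanged image
    print(needs_reflash(golden + b"\x00", trusted))  # modified image
```

This only helps if the dump is read over a path the previous tenant could not have tampered with; compromised firmware reporting its own contents cannot be trusted.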
I too raised this in one discussion. The organizations involved see it as an incremental approach, one which allows them to forgo any automated recovery process for now, on the assumption that incidence of this problem will be extremely infrequent. Instead they can bill the customer for the cost of manually recovering the machine to a clean state, or even simply charge them for the hardware itself and not bother with recovery at all. "You break it, you buy it."
> However, the issue still remains that typical server hardware (I need to check OpenCompute's hardware) requires jumpers to be changed for re-flashing/resetting configs, which is a real pain. So, even if you did detect something bad, this needs to be done to fix the issue.
This article suggests OCP wants to tackle it via firmware authentication both when it's called and also when it's being rewritten:

https://www.datacenterknowledge.com/security/open-compute-project-releases-h...

But that aside, if you wire those "jumpers" back to a central header for some group of machines, you can in theory just do something like this to inexpensively remote control banks of them over your isolated management network:

https://elinux.org/RPi_GPIO_Interface_Circuits#Using_an_NPN_transistor

-- 
Jeremy Stanley
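For the transistor-switched jumper idea above, the controlling side could look something like the following Python sketch. It assumes one GPIO pin per machine, each driving an NPN transistor that closes that machine's jumper when the pin goes high. The `JumperBank` class, the machine names, and the pin numbers are all hypothetical; the pin writer is passed in as a callable so the logic can be exercised without hardware (on a Raspberry Pi it could wrap `GPIO.output()` from the RPi.GPIO library).

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterator

# write_pin(pin, high) is supplied by the caller and does the actual I/O.
WritePin = Callable[[int, bool], None]


class JumperBank:
    """Maps machine names to the GPIO pin driving each machine's transistor."""

    def __init__(self, pins: Dict[str, int], write_pin: WritePin) -> None:
        self._pins = dict(pins)
        self._write_pin = write_pin
        # Start with every transistor off, i.e. every jumper open.
        for pin in self._pins.values():
            self._write_pin(pin, False)

    def set_jumper(self, machine: str, closed: bool) -> None:
        """Drive the pin high to close the jumper, low to open it."""
        self._write_pin(self._pins[machine], closed)

    @contextmanager
    def closed(self, machine: str) -> Iterator[None]:
        """Hold a jumper closed for the duration of a block, then reopen it."""
        self.set_jumper(machine, True)
        try:
            yield
        finally:
            # Reopen even if the re-flash step raises.
            self.set_jumper(machine, False)


if __name__ == "__main__":
    log = []  # stand-in for real GPIO writes
    bank = JumperBank({"node01": 17, "node02": 27},
                      lambda pin, high: log.append((pin, high)))
    with bank.closed("node01"):
        pass  # the re-flash or config reset would run here
    print(log)
```

Using a context manager means a jumper is never left closed after a failed re-flash, which matters if a closed jumper leaves the firmware writable.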