[ironic] Securing physical hosts in hostile environments
Hi,

We have considered ironic for deploying physical hosts for our public cloud platform, but have not found any way to properly secure the hosts, or rather, how to reset a physical host back to factory defaults between uses - such as BIOS and BMC settings. Since users (bad actors) can access the BMC via SMBus, reset BIOS password(s), change firmware versions, etc., there appears to be no proper way to secure a platform.

This is especially true when resetting BIOS/BMC configurations since this typically involves shorting a jumper and power cycling a unit (physically removing power from the power supplies - not just a power down from the BMC). Manufacturers have not made this easy/possible, and we have yet to find a commercial device that can assist with this out-of-band. We have actually thought of building our own, but thought we would ask the community first.

Thanks!
Eric
Looks like I forgot to ask a question after my statements. :) What are others doing to secure their physical hosts in hostile environments?

Eric
On 2020-12-15 17:43:20 -0600 (-0600), Eric K. Miller wrote: [...]
Since users (bad actors) can access the BMC via SMBus, reset BIOS password(s), change firmware versions, etc., there appears to be no proper way to secure a platform. [...] Manufacturers have not made this easy/possible, and we have yet to find a commercial device that can assist with this out-of-band. We have actually thought of building our own, but thought we would ask the community first.
My understanding is that one of the primary reasons why https://www.opencompute.org/ formed was to collaboratively design hardware which can't be compromised in-band by its users.

The Elastic Secure Infrastructure effort happening in OpenInfra Labs is also attempting to template and document repeatable solutions for the first half of the problem (centrally detecting tainted BIOS/firmware via signature verification and attestation): https://www.bu.edu/rhcollab/projects/esi/

-- Jeremy Stanley
Thanks Jeremy! I have some reading to do.

It seems that, instead of detecting tainted "anything", it would be better to assume zero trust in the hardware after use, and instead reset/re-flash everything upon re-provisioning. I can understand that re-flashing can be hard on the flash, but now that most (all?) firmware has digital signature checks, this can be used to avoid re-flashing when the signature matches.

However, the issue still remains that typical server hardware (I need to check OpenCompute's hardware) requires jumpers to be changed for re-flashing/resetting configs, which is a real pain. So, even if you did detect something bad, this needs to be done to fix the issue.

Eric
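As a toy illustration of the "skip the re-flash when the image already matches" idea, assuming you can dump the firmware image to a file through some trusted out-of-band path (which is exactly the hard part being discussed here), the comparison against a known-good digest is the easy half. The digest value and dump path below are placeholders:

import hashlib

# Known-good SHA-256 of the vendor firmware image we expect on the device.
# Both the digest and the dump path below are hypothetical placeholders.
KNOWN_GOOD_SHA256 = 'replace-with-vendor-published-digest'


def needs_reflash(dump_path):
    """Return True if the dumped firmware image differs from the known-good image."""
    digest = hashlib.sha256()
    with open(dump_path, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            digest.update(chunk)
    return digest.hexdigest() != KNOWN_GOOD_SHA256


if needs_reflash('/var/lib/firmware-dumps/node-01-bios.bin'):
    print('image differs from known-good; schedule a re-flash')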
On 2020-12-16 11:25:07 -0600 (-0600), Eric K. Miller wrote: [...]
It seems that, instead of detecting tainted "anything", it would be better to assume zero trust in the hardware after use, and instead reset/re-flash everything upon re-provisioning. I can understand that re-flashing can be hard on the flash, but now that most (all?) firmware has digital signature checks, this can be used to avoid re-flashing when the signature matches.
I too raised this in one discussion. The organizations involved see it as an incremental approach, one which allows them to forgo any automated recovery process for now on the assumption that incidents of this kind will be extremely infrequent. Instead they can bill the customer for the cost of manually recovering the machine to a clean state, or even simply charge them for the hardware itself and not bother with recovery at all. "You break it, you buy it."
However, the issue still remains that typical server hardware (I need to check OpenCompute's hardware) requires jumpers to be changed for re-flashing/resetting configs, which is a real pain. So, even if you did detect something bad, this needs to be done to fix the issue.
This article suggests OCP wants to tackle it via firmware authentication both when it's called and also when it's being rewritten: https://www.datacenterknowledge.com/security/open-compute-project-releases-h...

But that aside, if you wire those "jumpers" back to a central header for some group of machines, you can in theory just do something like this to inexpensively remote-control banks of them over your isolated management network: https://elinux.org/RPi_GPIO_Interface_Circuits#Using_an_NPN_transistor

-- Jeremy Stanley
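For illustration only, a rough sketch of what driving such a bank of headers from a Raspberry Pi might look like, assuming the RPi.GPIO library and an NPN transistor between each GPIO pin and the corresponding clear-CMOS/BMC-reset header as in the linked article (the node names, pin numbers, and hold time are made up):

import time

import RPi.GPIO as GPIO

# BCM pin numbers wired (through NPN transistors) to each machine's
# clear-CMOS / BMC-reset header. These assignments are hypothetical.
JUMPER_PINS = {'node-01': 17, 'node-02': 27, 'node-03': 22}


def pulse_jumper(node, hold_seconds=5):
    """Momentarily 'short' a node's header by driving the transistor
    that bridges it, then release it again."""
    pin = JUMPER_PINS[node]
    GPIO.output(pin, GPIO.HIGH)   # transistor conducts, jumper shorted
    time.sleep(hold_seconds)
    GPIO.output(pin, GPIO.LOW)    # release


GPIO.setmode(GPIO.BCM)
for pin in JUMPER_PINS.values():
    GPIO.setup(pin, GPIO.OUT, initial=GPIO.LOW)

try:
    pulse_jumper('node-01')
finally:
    GPIO.cleanup()

You would still need to remove and restore power around the pulse (per the jumper-reset procedure Eric describes), so in practice this would be paired with switched PDUs or similar on the same isolated management network.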
I've attempted to secure physical hardware at a previous job. The primary tools we used were vendor relationships and extensive testing. There's no silver bullet for getting hardware safe against a "root" user.

Not trying to give an unhelpful answer, but outside of the groups that Jeremy linked, there's been very little innovation enabling you to secure your hardware, unless you work directly with a vendor (and have the buying power to make them listen).

- Jay Faulkner
Thanks Jay! I suspected as much.

It does seem that there is likely a big market for this - an out-of-band device/PCI card that can assist with initiating re-flashing, power management (outside of the switchable power supplies), and jumper changes. I was a bit shocked that it didn't exist. I thought SMC would have built something like this into their SuperBlade systems, but their chassis-level BMC reset functions simply use the network to connect to the blades' BMCs, which isn't too helpful when the user changes the IP address of the BMC… ugh.

Eric
I think in the SMC case it is kind of designed that way, to always trust the user. I think the IPMI in-band interface can be disabled on some vendors' gear, which would definitely help. However, in the SMC case, if memory serves, to reset the BMC to factory defaults you do have to move the jumper, reset power, reset the BMC password via an in-OS tool, and reset addressing via the BIOS. :\
Some operators have taken an approach of attestation and system measurement as a means to try to combat these sorts of vectors. However, if the TPM can't read the firmware to "measure" a checksum outside of the in-band firmware channel, i.e. access the flash directly rather than ask whatever (possibly malicious) code answers on that channel, then it is a little difficult to trust that mechanism. The positive is that this mainly means things like drives are the items at risk at this point. Not exactly comforting, as the first firmware proof-of-concept I can think of that spoofed the firmware check was against a SATA disk.

I know some operators have tried to drive their vendors toward an out-of-band mechanism to check and assert these things, and in the meantime they are performing in-band flashing upon each cleaning in the hope of scrubbing malicious firmware and squashing any malicious user's actions. This is an approach a number of operators have publicly stated they've taken; however, it requires creating your own custom hardware manager to align with the hardware you have and the firmware versions you want/expect.

I think this is a good topic for the baremetal SIG to discuss and push forward because, as Jay said, there is no silver bullet, and most of these patterns are highly customized interactions based upon your environment, your hardware, and the attack vectors you're concerned about.

-Julia
Some operators have taken an approach of attestation and system measurement as a means to try to combat these sorts of vectors. However, if the TPM can't read the firmware to "measure" a checksum outside of the in-band firmware channel, i.e. access the flash directly rather than ask whatever (possibly malicious) code answers on that channel, then it is a little difficult to trust that mechanism. The positive is that this mainly means things like drives are the items at risk at this point. Not exactly comforting, as the first firmware proof-of-concept I can think of that spoofed the firmware check was against a SATA disk.
We thought about that too - potential firmware corruption of NVMe drives, or the configuration of drives that support NVMe namespaces, and undoing all of that upon reprovisioning of the server. Lots of things to think about.

I'm not 100% sure how the firmware signature checks work, but it seems that this would be done within the firmware itself, and not by a separate management processor inside the device. So then we have to deal with the potential flash of an older firmware version that did not have digital signature checks, which would open a channel to install anything the attacker wanted on that device.
I know some operators have tried to drive their vendors toward an out-of-band mechanism to check and assert these things, and in the meantime they are performing in-band flashing upon each cleaning in the hope of scrubbing malicious firmware and squashing any malicious user's actions. This is an approach a number of operators have publicly stated they've taken; however, it requires creating your own custom hardware manager to align with the hardware you have and the firmware versions you want/expect.
Exactly - so quite an effort, and labor intensive.
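For reference, the skeleton of such a custom hardware manager is fairly small; the real effort is in the vendor-specific flashing tools and validation behind it. A rough sketch, assuming ironic-python-agent's HardwareManager interface (registered via the ironic_python_agent.hardware_managers entry point) and a hypothetical vendor flash CLI and image path:

import subprocess

from ironic_python_agent import hardware


class FirmwareScrubHardwareManager(hardware.HardwareManager):
    HARDWARE_MANAGER_NAME = 'FirmwareScrubHardwareManager'
    HARDWARE_MANAGER_VERSION = '1.0'

    def evaluate_hardware_support(self):
        # A real manager would detect the specific boards it knows how to
        # flash; this sketch simply claims to handle everything.
        return hardware.HardwareSupport.SERVICE_PROVIDER

    def get_clean_steps(self, node, ports):
        return [{
            'step': 'reflash_bmc_firmware',
            'priority': 95,              # relative ordering among clean steps
            'interface': 'deploy',
            'reboot_requested': True,
            'abortable': False,
        }]

    def reflash_bmc_firmware(self, node, ports):
        # Hypothetical vendor CLI: rewrite the BMC with a known-good image
        # on every cleaning, rather than trusting whatever version the
        # (possibly lying) firmware reports.
        subprocess.run(
            ['vendor-flash-tool', '--bmc', '/opt/firmware/bmc-known-good.bin'],
            check=True)

Keeping such a manager aligned with each hardware model and firmware revision is exactly the per-environment effort being described here.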
I think this is a good topic for the baremetal SIG to discuss and push forward because, as Jay said, there is no silver bullet, and most of these patterns are highly customized interactions based upon your environment, your hardware, and the attack vectors you're concerned about.
I think the answer is to keep the hardware as simple as possible - meaning no internal drives or other cards that could be modified.

It would actually be nice if machines had "loadable BIOS firmware" from external media, where every time the machine booted, the BIOS firmware would load from a trusted source (a locally attached drive, connected directly to the BIOS chip) - and maybe the same for BMC firmware. BIOS firmware already loads a shadow copy of the BIOS into memory, so why not just load it from external media instead somehow? Somewhat like what UEFI firmware provides for BIOS configuration data. This strategy leaves the hardware in a "bare" state with no software, so resetting the device would always return it to a clean state.

I'll have to look for the baremetal SIG and participate. Thanks for pointing it out!

Eric
On 2020-12-16 09:33:13 -0800 (-0800), Julia Kreger wrote: [...]
in the meantime they are performing in-band flashing upon each cleaning in the hope of scrubbing malicious firmware and squashing any malicious user's actions. This is an approach a number of operators have publicly stated they've taken; however, it requires creating your own custom hardware manager to align with the hardware you have and the firmware versions you want/expect. [...]
It's also worth reminding everyone this is an incomplete solution. How do you know the in-band reflashing worked? Because the (possibly backdoored) firmware says it did, of course! It's certainly not going to just claim to have reflashed with exactly the bits you supplied while actually reinjecting its persistent backdoor, right?

Of course, that's ultimately the reason we keep having this conversation over and over. ;)

-- Jeremy Stanley
participants (4)
- Eric K. Miller
- Jay Faulkner
- Jeremy Stanley
- Julia Kreger