[nova][dev] vCPU Pinning for L1/L2 cache side-channel vulnerability mitigation
rob at cleansafecloud.com
Thu Jan 10 15:49:14 UTC 2019
Hello Nova folks,
I spoke to some of you very briefly about this in Berlin (thanks again for your time), and we were resigned to turning off SMT to fully protect against future CPU cache side-channel attacks as I know many others have done. However, we have stubbornly done a bit of last-resort research and testing into using vCPU pinning on a per-tenant basis as an alternative and I’d like to lay it out in more detail for you to make sure there are no legs in the idea before abandoning it completely.
The idea is to use libvirt’s vcpupin ability to ensure that two different tenants never share the same physical CPU core, so that in theory they cannot steal each other’s data via an L1 or L2 cache side-channel. The pinning would be optimised to make use of as many logical cores as possible for any given tenant. We would also isolate other key system processes to a separate range of physical cores. After discussions in Berlin, we ran some tests with live migration, as this is key to our maintenance activities and would be a show-stopper if it didn’t work. We found that removing any pinning restrictions immediately prior to migration resulted in the pinning being completely reset on the target host, where it could then be re-optimised after the migration. Unfortunately, there would be a small window of time after a migration during which we couldn’t prevent tenants from sharing a physical core on the target host, but we think this is an acceptable risk given the nature of these attacks.
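To make the allocation idea concrete, here is a minimal Python sketch of the planning step only. It is an illustration under stated assumptions, not what we actually run: the function name plan_tenant_pinning, the input shapes, and the example topology are all hypothetical, and it assumes the host's SMT sibling layout and the per-tenant vCPU counts are already known. It hands out whole physical cores, so every hardware thread of a core belongs to at most one tenant, and it keeps a reserved set of cores back for system processes.

```python
def plan_tenant_pinning(core_siblings, reserved_cores, vcpus_by_tenant):
    """Assign whole physical cores to tenants (hypothetical helper).

    core_siblings:   {physical_core_id: [logical_cpu_id, ...]} (assumed input)
    reserved_cores:  set of physical core ids kept for system processes
    vcpus_by_tenant: {tenant_name: total vCPUs the tenant runs on this host}

    Returns {tenant_name: [logical_cpu_id, ...]}, the logical CPUs the
    tenant's vCPUs may be pinned to. No physical core appears for two tenants.
    """
    # Physical cores that are still available, in a stable order.
    free_cores = [c for c in sorted(core_siblings) if c not in reserved_cores]
    plan = {}
    for tenant in sorted(vcpus_by_tenant):
        needed = vcpus_by_tenant[tenant]
        logical_cpus = []
        # Keep taking whole cores until the tenant has enough hardware threads.
        while len(logical_cpus) < needed:
            if not free_cores:
                raise ValueError("not enough free physical cores for " + tenant)
            core = free_cores.pop(0)
            logical_cpus.extend(core_siblings[core])
        plan[tenant] = logical_cpus
    return plan


if __name__ == "__main__":
    # Hypothetical 4-core host with 2 threads per core; core 0 is reserved.
    topology = {0: [0, 4], 1: [1, 5], 2: [2, 6], 3: [3, 7]}
    print(plan_tenant_pinning(topology, {0}, {"tenant-a": 3, "tenant-b": 2}))
```

The resulting per-tenant CPU lists could then be applied per vCPU with something like "virsh vcpupin <domain> <vcpu> <cpulist>". Note that a tenant with an odd number of vCPUs leaves one sibling thread idle rather than sharing it, which is the capacity cost of this approach.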
Obviously, this approach may not be appropriate in many circumstances, such as if you have many tenants who just run single VMs with one vCPU, or if over-allocation is in use. We have also only looked at KVM and libvirt. I would love to know what people think of this approach, however. Are there any other clear issues that we may not have considered? If it seems like a reasonable idea, is it something that could fit into Nova and, if so, where in the architecture is the best place for it to sit? I know you can currently specify per-instance CPU pinning via flavor parameters, so a similar approach could be taken for this strategy. Alternatively, we can look at implementing it as an external plugin of some kind for use by those with a similar setup.