[nova][dev] vCPU Pinning for L1/L2 cache side-channel vulnerability mitigation

Stephen Finucane sfinucan at redhat.com
Thu Jan 10 17:56:54 UTC 2019


On Thu, 2019-01-10 at 11:05 -0500, Jay Pipes wrote:
> On 01/10/2019 10:49 AM, Robert Donovan wrote:
> > Hello Nova folks,
> > 
> > I spoke to some of you very briefly about this in Berlin (thanks
> > again for your time), and we were resigned to turning off SMT to
> > fully protect against future CPU cache side-channel attacks as I
> > know many others have done. However, we have stubbornly done a bit
> > of last-resort research and testing into using vCPU pinning on a
> > per-tenant basis as an alternative, and I’d like to lay it out in
> > more detail for you to make sure the idea really has no legs before
> > we abandon it completely.
> > 
> > The idea is to use libvirt’s vcpupin ability to ensure that two
> > different tenants never share the same physical CPU core, so they
> > cannot theoretically steal each other’s data via an L1 or L2 cache
> > side-channel. The pinning would be optimised to make use of as many
> > logical cores as possible for any given tenant. We would also
> > isolate other key system processes to a separate range of physical
> > cores. After discussions in Berlin, we ran some tests with live
> > migration, as this is key to our maintenance activities and would
> > be a show-stopper if it didn’t work. We found that removing any
> > pinning restrictions immediately prior to migration resulted in the
> > pinning being completely reset on the target host, where it could
> > then be re-optimised post-migration. Unfortunately, there would be
> > a small window of time where we couldn’t prevent tenants from
> > sharing a physical core on the target host after a migration, but
> > we think this is an acceptable risk given the nature of these
> > attacks.
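
As an aside, to make the mechanism concrete: the vcpupin ability
mentioned above is exposed in the libvirt Python bindings as pinVcpu().
A very rough sketch of per-tenant pinning, where the domain name and
the allowed pCPU set are placeholders you would compute yourself, might
look like:

    import libvirt

    conn = libvirt.open('qemu:///system')

    # Placeholders: the instance's libvirt domain and the set of host
    # logical CPUs this tenant is allowed to use.
    dom = conn.lookupByName('instance-00000001')
    allowed_pcpus = {2, 3, 6, 7}

    # getInfo()[2] is the number of logical CPUs on the host.
    host_cpus = conn.getInfo()[2]

    # Build a boolean cpumap and pin every guest vCPU to the allowed set.
    cpumap = tuple(i in allowed_pcpus for i in range(host_cpus))
    for vcpu in range(dom.maxVcpus()):
        dom.pinVcpu(vcpu, cpumap)

Nova's libvirt driver normally expresses pinning via <cputune>/<vcpupin>
in the domain XML rather than calling the API at runtime, so the above
is purely illustrative.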
> > 
> > Obviously, this approach may not be appropriate in many
> > circumstances, such as if you have many tenants who just run single
> > VMs with one vCPU, or if over-allocation is in use. We have also
> > only looked at KVM and libvirt. I would love to know what people
> > think of this approach, however. Are there any other clear issues
> > that you can think of which we may not have considered? If it seems
> > like a reasonable idea, is it something that could fit into Nova
> > and, if so, where in the architecture is the best place for it to
> > sit? I know you can currently specify per-instance CPU pinning via
> > flavor parameters, so a similar approach could be taken for this
> > strategy. Alternatively, we can look at implementing it as an
> > external plugin of some kind for use by those with a similar setup.
> 
> IMHO, if you're going to go through all the hassle of pinning guest vCPU 
> threads to distinct logical host processors, you might as well just use 
> dedicated CPU resources for everything. As you mention above, you can't 
> have overcommit anyway if you're concerned about this problem. Once you 
> have a 1.0 cpu_allocation_ratio, you're essentially limiting your CPU 
> resources to a dedicated host CPU -> guest CPU situation, so you might as 
> well just use CPU pinning and deal with all the headaches that brings 
> with it.

Indeed. My initial answer to this was "use CPU thread policies"
(specifically, the 'require' policy) to ensure each instance owns whole
cores, on the assumption that you were using dedicated/pinned CPUs. For
shared CPUs, I'm not sure how we could ever do something like you've
proposed in a manner that would result in less than the ~20% or so
performance degradation I usually see quoted for turning off SMT. Far
too much second-guessing of the guest's expected performance
requirements would be necessary.
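
For reference, the thread policy approach is just a pair of flavor
extra specs. A rough sketch with python-novaclient, where the session
and flavor name are placeholders, would be something like:

    from novaclient import client

    # Assumes an existing Keystone session; 'sess' is a placeholder.
    nova = client.Client('2.1', session=sess)

    # Dedicated (pinned) CPUs, plus 'require' so each instance gets the
    # thread siblings of its cores and therefore owns them outright.
    flavor = nova.flavors.find(name='pinned.medium')
    flavor.set_keys({
        'hw:cpu_policy': 'dedicated',
        'hw:cpu_thread_policy': 'require',
    })

Note that with 'require' such instances will only schedule to hosts
that actually have SMT enabled, which is presumably fine in your case.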

Stephen



