[openstack-dev] realtime kvm cpu affinities

Chris Friesen chris.friesen at windriver.com
Wed Jun 21 17:40:14 UTC 2017

On 06/21/2017 10:46 AM, Henning Schild wrote:
> On Wed, 21 Jun 2017 10:04:52 -0600,
> Chris Friesen <chris.friesen at windriver.com> wrote:

> I guess you are talking about this section from [1]:
>>>> We could use a host level tunable to just reserve a set of host
>>>> pCPUs for running emulator threads globally, instead of trying to
>>>> account for it per instance. This would work in the simple case,
>>>> but when NUMA is used, it is highly desirable to have more fine
>>>> grained config to control emulator thread placement. When real-time
>>>> or dedicated CPUs are used, it will be critical to separate
>>>> emulator threads for different KVM instances.

Yes, that's the relevant section.

> I know it has been considered, but I would like to bring the topic up
> again, because doing it that way allows for many more rt-VMs on a host
> and I am not sure I fully understood why the idea was discarded in the
> end.
> I do not really see the influence of NUMA here. Say the
> emulator_pin_set is used only for realtime VMs; we know that the
> emulators and IOs can be "slow", so crossing NUMA nodes should not be an
> issue. Or you could say the set needs to contain at least one core per
> NUMA node and schedule emulators next to their vcpus.
> As we know from our setup, and as Luiz confirmed, it is _not_ "critical
> to separate emulator threads for different KVM instances".
> They have to be separated from the vcpu-cores but not from each other.
> At least not on a "cpuset" basis; maybe via "blkio" and cgroups like that.

I'm reluctant to say conclusively that we don't need to separate emulator
threads, since I don't think we've considered all the cases.  For example, what
happens if one or more of the instances is being live-migrated?  The migration
thread for those instances will be very busy scanning for dirty pages, which
could delay the emulator threads for other instances and also cause significant
cross-NUMA traffic unless we ensure at least one core per NUMA node.
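
To make the "at least one core per NUMA node" condition concrete, here is a
minimal sketch in Python.  The topology map, the example pin set, and both
helper names are hypothetical illustrations only; nothing here reflects what
nova actually implements.

```python
from typing import Dict, Set


def nodes_without_emulator_cpu(topology: Dict[int, Set[int]],
                               emulator_pin_set: Set[int]) -> Set[int]:
    """Return the NUMA nodes that have no CPU in the emulator pin set."""
    return {node for node, cpus in topology.items()
            if not (cpus & emulator_pin_set)}


def emulator_cpus_for_instance(topology: Dict[int, Set[int]],
                               emulator_pin_set: Set[int],
                               vcpu_pins: Set[int]) -> Set[int]:
    """Pick emulator CPUs on the same NUMA node(s) as the instance's vCPUs."""
    # NUMA nodes that host at least one of the instance's pinned vCPUs.
    nodes = {node for node, cpus in topology.items() if cpus & vcpu_pins}
    local: Set[int] = set()
    for node in nodes:
        local |= topology[node] & emulator_pin_set
    # Assumption: if no emulator CPU is local, fall back to the whole set,
    # which means accepting cross-NUMA traffic.
    return local or set(emulator_pin_set)


# Hypothetical two-node host with 8 pCPUs per node.
topology = {0: set(range(0, 8)), 1: set(range(8, 16))}
print(nodes_without_emulator_cpu(topology, {0}))              # {1}
print(emulator_cpus_for_instance(topology, {0, 8}, {9, 10}))  # {8}
```

With a pin set of only {0}, node 1 has no local emulator CPU, so an instance
pinned there would have its emulator threads running cross-NUMA.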

Also, I don't think we've determined how much CPU time is needed for the 
emulator threads.  If we have ~60 CPUs available for instances split across two 
NUMA nodes, can we safely run the emulator threads of 30 instances all together 
on a single CPU?  If not, how much "emulator overcommit" is allowable?
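
As a rough sketch of the arithmetic behind that question (Python, with both
function names made up for illustration): the per-instance emulator load
figures below are placeholders, not measurements.  Getting real numbers is
exactly the open issue.

```python
def emulator_overcommit(num_instances: int, emulator_cpus: int) -> float:
    """Instances' emulator threads sharing each host CPU in the set."""
    return num_instances / emulator_cpus


def emulator_cpus_needed(num_instances: int,
                         load_pct_per_instance: int,
                         max_utilization_pct: int = 100) -> int:
    """Host CPUs needed so the emulator set stays under a utilization cap.

    load_pct_per_instance is an ASSUMED average emulator load, as a
    percentage of one CPU; it would have to be measured in practice.
    """
    total = num_instances * load_pct_per_instance
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-total // max_utilization_pct)


# 30 instances sharing a single emulator CPU:
print(emulator_overcommit(30, 1))        # 30.0

# Placeholder loads, keeping the emulator CPUs at most 50% busy:
print(emulator_cpus_needed(30, 2, 50))   # 2
print(emulator_cpus_needed(30, 10, 50))  # 6
```

This only covers average load.  A burst such as a live migration would need
its own headroom on top of whatever this estimate gives.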

