[openstack-dev] realtime kvm cpu affinities

Henning Schild henning.schild at siemens.com
Thu Jun 22 07:47:10 UTC 2017

Am Wed, 21 Jun 2017 11:40:14 -0600
schrieb Chris Friesen <chris.friesen at windriver.com>:

> On 06/21/2017 10:46 AM, Henning Schild wrote:
> > Am Wed, 21 Jun 2017 10:04:52 -0600
> > schrieb Chris Friesen <chris.friesen at windriver.com>:  
> > I guess you are talking about this section from [1]:
> >  
> >>>> We could use a host level tunable to just reserve a set of host
> >>>> pCPUs for running emulator threads globally, instead of trying to
> >>>> account for it per instance. This would work in the simple case,
> >>>> but when NUMA is used, it is highly desirable to have more fine
> >>>> grained config to control emulator thread placement. When
> >>>> real-time or dedicated CPUs are used, it will be critical to
> >>>> separate emulator threads for different KVM instances.  
> Yes, that's the relevant section.
> > I know it has been considered, but I would like to bring the topic
> > up again, because doing it that way allows for many more rt-VMs on
> > a host and I am not sure I fully understood why the idea was
> > discarded in the end.
> >
> > I do not really see the influence of NUMA here. Say the
> > emulator_pin_set is used only for realtime VMs, we know that the
> > emulators and IOs can be "slow" so crossing numa-nodes should not
> > be an issue. Or you could say the set needs to contain at least one
> > core per numa-node and schedule emulators next to their vcpus.
> >
> > As we know from our setup, and as Luiz confirmed - it is _not_
> > "critical to separate emulator threads for different KVM instances".
> > They have to be separated from the vcpu-cores but not from each
> > other. At least not on the "cpuset" basis, maybe "blkio" and
> > cgroups like that.  
> I'm reluctant to say conclusively that we don't need to separate
> emulator threads since I don't think we've considered all the cases.
> For example, what happens if one or more of the instances are being
> live-migrated?  The migration thread for those instances will be very
> busy scanning for dirty pages, which could delay the emulator threads
> for other instances and also cause significant cross-NUMA traffic
> unless we ensure at least one core per NUMA-node.

Realtime instances cannot be live-migrated. We are talking about
threads that cannot even be moved between two cores on one numa-node
without missing a deadline. But your point is a good one, because it
could mean that such an emulator_set - if defined - should not be used
for all instances.

> Also, I don't think we've determined how much CPU time is needed for
> the emulator threads.  If we have ~60 CPUs available for instances
> split across two NUMA nodes, can we safely run the emulator threads
> of 30 instances all together on a single CPU?  If not, how much
> "emulator overcommit" is allowable?

That depends on how much IO your VMs are issuing and cannot be
answered in general. Any VM can cause high load through IO/emulation;
rt-VMs are probably less likely to do so.
Say your 64-cpu compute node is used for both rt and regular VMs. To
mix, you would run two instances of nova on that machine. One gets
node0 (32 cpus) for regular VMs; the emulator-pin-set would not be
defined there (so it would equal the vcpu_pin_set, full overlap).
The other nova would get node1 and disable the hyperthreads of all rt
cores (17 cpus left). It would need at least one core for housekeeping
and io/emulation threads, so you are down to a maximum of 15 VMs
putting their IO on that one core and its hyperthread - 7.5 VMs per
cpu.
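The split described above could be sketched in the two nova configs
roughly like this. The cpu numbers are illustrative (assuming cpus 0-31
on node0 and 32-63 on node1, with 48-63 the hyperthread siblings of
32-47); vcpu_pin_set is an existing nova option, while emulator_pin_set
is the option *proposed* in this thread and does not exist in nova
as-is:

```ini
# nova.conf of instance 1: regular VMs on node0
[DEFAULT]
vcpu_pin_set = 0-31
# emulator_pin_set left unset -> equals vcpu_pin_set (full overlap)

# nova.conf of instance 2: rt VMs on node1, hyperthread siblings of
# the rt cores left unused, core 32 (cpus 32 and 48) reserved for
# housekeeping plus all emulator/IO threads
[DEFAULT]
vcpu_pin_set = 33-47
emulator_pin_set = 32,48
```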

In the same setup, [2] would give a max of 7 single-cpu VMs instead
of 15! And 15 vs. 31 if you dedicate the whole box to rt.
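Those counts follow from a simple capacity model, which could be
written down like this (a sketch of the arithmetic above, assuming
single-vcpu rt-VMs and one housekeeping core; the function names are
mine):

```python
# Capacity model for single-vcpu rt-VMs on a set of physical cores
# (hyperthread siblings of rt cores are assumed unused in both models).

def max_vms_shared_emulator(cores: int) -> int:
    """One core is reserved for housekeeping plus ALL emulator/IO
    threads; every remaining core runs one rt vCPU."""
    return cores - 1

def max_vms_per_vm_emulator(cores: int) -> int:
    """Per-instance model as in [2]: each VM needs its own emulator
    core next to its vCPU core; one core stays for housekeeping."""
    return (cores - 1) // 2

node1_cores = 16  # one NUMA node of the 64-cpu box
print(max_vms_shared_emulator(node1_cores))   # 15
print(max_vms_per_vm_emulator(node1_cores))   # 7

box_cores = 32    # whole box dedicated to rt
print(max_vms_shared_emulator(box_cores))     # 31
print(max_vms_per_vm_emulator(box_cores))     # 15
```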

> Chris
