[openstack-dev] [nova] Core pinning

Daniel P. Berrange berrange at redhat.com
Tue Nov 19 12:52:14 UTC 2013

On Wed, Nov 13, 2013 at 02:46:06PM +0200, Tuomas Paappanen wrote:
> Hi all,
> I would like to hear your thoughts about core pinning in Openstack.
> Currently nova (with qemu-kvm) supports restricting instances to a
> given set of PCPUs. I didn't find a blueprint, but I think this
> feature is meant to isolate the cpus used by the host from the cpus
> used by instances (VCPUs).
> But from a performance point of view it is better to dedicate PCPUs
> exclusively to VCPUs and the emulator. In some cases you may want to
> guarantee that only one instance (and its VCPUs) is using certain
> PCPUs. By using core pinning you can optimize instance performance
> based on e.g. cache sharing, NUMA topology, interrupt handling, and
> PCI passthrough (SR-IOV) in multi-socket hosts.
> We have already implemented a feature like this (a PoC with
> limitations) on the Nova Grizzly version and would like to hear your
> opinion about it.
> The current implementation consists of three main parts:
> - Definition of pcpu-vcpu maps for instances and instance spawning
> - (optional) Compute resource and capability advertising including
> free pcpus and NUMA topology.
> - (optional) Scheduling based on free cpus and NUMA topology.
> The implementation is quite simple:
> (additional/optional parts)
> Nova-computes advertise free pcpus and NUMA topology in the same
> manner as host capabilities. Instances are scheduled based on this
> information.
> (core pinning)
> The admin can set PCPUs for VCPUs and for the emulator process, or
> select a NUMA cell for instance vcpus, by adding key:value pairs to
> the flavor's extra specs.
> instance has 4 vcpus
> <key>:<value>
> vcpus:1,2,3,4 --> vcpu0 pinned to pcpu1, vcpu1 pinned to pcpu2...
> emulator:5 --> emulator pinned to pcpu5
> or
> numacell:0 --> all vcpus are pinned to pcpus in numa cell 0.
> In nova-compute, core pinning information is read from extra specs
> and added to the domain xml in the same way as cpu quota values
> (cputune).
> <cputune>
>       <vcpupin vcpu='0' cpuset='1'/>
>       <vcpupin vcpu='1' cpuset='2'/>
>       <vcpupin vcpu='2' cpuset='3'/>
>       <vcpupin vcpu='3' cpuset='4'/>
>       <emulatorpin cpuset='5'/>
> </cputune>
> What do you think? Implementation alternatives? Is this worth a
> blueprint? All related comments are welcome!
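
As a rough illustration of the mapping described above (not the PoC
code itself), the step from extra specs to <cputune> might look
something like the sketch below. The function name build_cputune and
the "vcpus" / "emulator" key names are assumptions taken from the
example in the quoted mail, not an existing Nova API.

```python
import xml.etree.ElementTree as ET


def build_cputune(extra_specs, vcpu_count):
    """Build a libvirt <cputune> element from flavor extra specs.

    Hypothetical keys, following the example in the quoted mail:
      "vcpus":    comma-separated pCPU ids, one per guest vCPU
      "emulator": pCPU id for the emulator process
    """
    cputune = ET.Element("cputune")

    # Split "1,2,3,4" into ["1", "2", "3", "4"], ignoring empty entries.
    raw = extra_specs.get("vcpus", "")
    pcpus = [p.strip() for p in raw.split(",") if p.strip()]

    # An explicit map only makes sense if it covers every vCPU.
    if pcpus and len(pcpus) != vcpu_count:
        raise ValueError(
            "expected %d pCPU ids, got %d" % (vcpu_count, len(pcpus)))

    # vcpu0 -> first pCPU in the list, vcpu1 -> second, and so on.
    for vcpu, pcpu in enumerate(pcpus):
        ET.SubElement(cputune, "vcpupin", vcpu=str(vcpu), cpuset=pcpu)

    emulator = extra_specs.get("emulator")
    if emulator:
        ET.SubElement(cputune, "emulatorpin", cpuset=emulator)

    return cputune


if __name__ == "__main__":
    elem = build_cputune({"vcpus": "1,2,3,4", "emulator": "5"}, 4)
    print(ET.tostring(elem, encoding="unicode"))
```

Note that nothing here checks whether the requested pCPUs exist on the
host or are already in use by another instance; that bookkeeping is
where most of the administrative burden comes from.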

I think there are several use cases mixed up in your descriptions
here which should likely be considered independently:

 - pCPU/vCPU pinning

   I don't really think this is a good idea as a general purpose
   feature in its own right. It tends to lead to fairly inefficient
   use of CPU resources when you consider that a large percentage of
   guests will be mostly idle most of the time. Maintaining explicit
   pinning carries a fairly high administrative burden too. This
   feels like a data center virt use case rather than a cloud use
   case really.

 - Dedicated CPU reservation

   This is the ability of an end user to request that their VM (or
   their group of VMs) gets assigned a dedicated host CPU set to run
   on. This is obviously something that would have to be controlled
   at a flavour level, and in a commercial deployment would carry
   a hefty pricing premium.

   I don't think you want to expose explicit pCPU/vCPU placement
   for this though. Just request the high level concept and allow
   the virt host to decide actual placement.

 - Host NUMA placement.

   By not taking NUMA into account currently the libvirt driver
   at least is badly wasting resources. Having too much cross-NUMA
   node memory access by guests just kills scalability. The virt
   driver should really figure out cpu & memory pinning within the
   scope of a NUMA node automatically. No admin config should be
   required for this.

 - Guest NUMA topology

   If the flavour memory size / cpu count exceeds the size of a
   single NUMA node, then the flavour should likely have a way to
   express that the guest should see multiple NUMA nodes. The
   virt host would then set guest NUMA topology to match the way
   it places vCPUs & memory on host NUMA nodes. Again you don't
   want explicit pcpu/vcpu mapping done by the admin for this.
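
To make the last two points a bit more concrete, here is a very rough
sketch of the kind of decision the virt driver could make on its own.
HostCell and place_guest are hypothetical names, not existing Nova or
libvirt APIs, and the even-split policy is just one assumption about
how a guest might be spread over host nodes.

```python
from dataclasses import dataclass


@dataclass
class HostCell:
    """Free resources of one host NUMA node (hypothetical structure)."""
    cell_id: int
    free_cpus: list      # ids of unallocated pCPUs in this node
    free_mem_mb: int     # unallocated memory in this node, in MB


def place_guest(cells, vcpus, mem_mb):
    """Return a list of (cell_id, vcpu_count, mem_mb), or None.

    One entry means the guest fits inside a single host NUMA node.
    Several entries mean the guest should see that many NUMA nodes,
    each one backed by the matching host node.
    """
    # Host NUMA placement: prefer a single node that fits everything.
    for cell in cells:
        if len(cell.free_cpus) >= vcpus and cell.free_mem_mb >= mem_mb:
            return [(cell.cell_id, vcpus, mem_mb)]

    # Guest NUMA topology: split evenly over as few nodes as possible.
    for n in range(2, len(cells) + 1):
        if vcpus % n or mem_mb % n:
            continue  # only consider even splits in this sketch
        per_cpus, per_mem = vcpus // n, mem_mb // n
        fitting = [c for c in cells
                   if len(c.free_cpus) >= per_cpus
                   and c.free_mem_mb >= per_mem]
        if len(fitting) >= n:
            return [(c.cell_id, per_cpus, per_mem) for c in fitting[:n]]

    return None  # this host cannot take the guest


if __name__ == "__main__":
    host = [HostCell(0, [0, 1, 2, 3], 8192),
            HostCell(1, [4, 5, 6, 7], 8192)]
    print(place_guest(host, 2, 4096))
    print(place_guest(host, 8, 16384))
```

The only input from the flavour is the vCPU count and memory size; the
admin never names a pCPU, and the result tells the driver both where
to pin and how many NUMA nodes to expose to the guest.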

|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
