[openstack-dev] [nova] Core pinning

Tuomas Paappanen tuomas.paappanen at tieto.com
Wed Nov 27 13:50:47 UTC 2013

On 19.11.2013 20:18, yunhong jiang wrote:
> On Tue, 2013-11-19 at 12:52 +0000, Daniel P. Berrange wrote:
>> On Wed, Nov 13, 2013 at 02:46:06PM +0200, Tuomas Paappanen wrote:
>>> Hi all,
>>> I would like to hear your thoughts about core pinning in OpenStack.
>>> Currently nova (with qemu-kvm) supports restricting instances to a
>>> configured set of PCPUs. I didn't find a blueprint, but I think
>>> this feature is meant to isolate the cpus used by the host from the
>>> cpus used by instances (VCPUs).
>>> But from a performance point of view it is better to exclusively
>>> dedicate PCPUs to VCPUs and the emulator. In some cases you may want
>>> to guarantee that only one instance (and its VCPUs) is using certain
>>> PCPUs. By using core pinning you can optimize instance performance
>>> based on e.g. cache sharing, NUMA topology, interrupt handling, and
>>> PCI passthrough (SR-IOV) in multi-socket hosts.
>>> We have already implemented a feature like this (a PoC with
>>> limitations) on the Nova Grizzly version and would like to hear your
>>> opinion about it.
>>> The current implementation consists of three main parts:
>>> - Definition of pcpu-vcpu maps for instances and instance spawning
>>> - (optional) Compute resource and capability advertising including
>>> free pcpus and NUMA topology.
>>> - (optional) Scheduling based on free cpus and NUMA topology.
>>> The implementation is quite simple:
>>> (additional/optional parts)
>>> Nova-computes advertise free pcpus and NUMA topology in the same
>>> manner as host capabilities. Instances are scheduled based on this
>>> information.
>>> (core pinning)
>>> admin can set PCPUs for VCPUs and for emulator process, or select
>>> NUMA cell for instance vcpus, by adding key:value pairs to flavor's
>>> extra specs.
>>> instance has 4 vcpus
>>> <key>:<value>
>>> vcpus:1,2,3,4 --> vcpu0 pinned to pcpu1, vcpu1 pinned to pcpu2...
>>> emulator:5 --> emulator pinned to pcpu5
>>> or
>>> numacell:0 --> all vcpus are pinned to pcpus in numa cell 0.
>>> In nova-compute, core pinning information is read from extra specs
>>> and added to the domain xml the same way as cpu quota values (cputune).
>>> <cputune>
>>>        <vcpupin vcpu='0' cpuset='1'/>
>>>        <vcpupin vcpu='1' cpuset='2'/>
>>>        <vcpupin vcpu='2' cpuset='3'/>
>>>        <vcpupin vcpu='3' cpuset='4'/>
>>>        <emulatorpin cpuset='5'/>
>>> </cputune>
>>> What do you think? Implementation alternatives? Is this worth a
>>> blueprint? All related comments are welcome!
>> I think there are several use cases mixed up in your descriptions
>> here which should likely be considered independently:
>>   - pCPU/vCPU pinning
>>     I don't really think this is a good idea as a general purpose
>>     feature in its own right. It tends to lead to fairly inefficient
>>     use of CPU resources when you consider that a large % of guests
>>     will be mostly idle most of the time. It has a fairly high
>>     administrative burden to maintain explicit pinning too. This
>>     feels like a data center virt use case rather than cloud use
>>     case really.
>>   - Dedicated CPU reservation
>>     The ability of an end user to request that their VM (or their
>>     group of VMs) gets assigned a dedicated host CPU set to run on.
>>     This is obviously something that would have to be controlled
>>     at a flavour level, and in a commercial deployment would carry
>>     a hefty pricing premium.
>>     I don't think you want to expose explicit pCPU/vCPU placement
>>     for this though. Just request the high level concept and allow
>>     the virt host to decide actual placement
I think pcpu/vcpu pinning could be considered an extension of the
dedicated cpu reservation feature. I agree that exclusively
dedicating pcpus to VMs is inefficient from a cloud point of view, but
in some cases an end user may want to be sure (and be ready to pay) that
their VMs have resources available, e.g. for sudden load peaks.

So, here is my proposal for how dedicated cpu reservation would function
at a high level:

When an end user wants a VM with nn vcpus running on a dedicated
host cpu set, the admin could enable it by setting a new "dedicate_pcpu"
parameter in a flavor (e.g. as an optional flavor parameter). By default,
the number of pcpus and vcpus could be the same. As an option, explicit
vcpu/pcpu pinning could be done by defining vcpu/pcpu relations in the
flavor's extra specs (vcpupin:0 0...).

In the virt driver there are two alternatives for how to share the pcpus:
1. all dedicated pcpus are shared by all vcpus (default case), or
2. each vcpu has its own dedicated pcpu (vcpu 0 is pinned to the first
pcpu in the cpu set, vcpu 1 to the second pcpu, and so on). The vcpu/pcpu
pinning option could be used to extend the latter case.

In any case, before a VM with or without dedicated pcpus is launched, the
virt driver must make sure that the dedicated pcpus are excluded from
existing VMs and from new VMs, and that there are enough free pcpus for
placement. I also think the minimum number of pcpus for VMs without
dedicated pcpus must be configurable somewhere.
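
The bookkeeping described above could look roughly like the following
Python sketch. Everything here is an assumption for illustration: the
function allocate_dedicated, its arguments, and the min_shared value
(standing in for the configurable minimum) do not exist in nova.

```python
def allocate_dedicated(host_pcpus, dedicated_in_use, requested, min_shared):
    """Pick pcpus to dedicate to a new VM, or raise if they do not fit.

    Hypothetical helper, for illustration only.
    host_pcpus:       all pcpus usable by instances on this host
    dedicated_in_use: pcpus already dedicated to other VMs
    requested:        number of pcpus the new VM wants dedicated
    min_shared:       minimum pcpus that must remain for VMs without
                      dedicated pcpus (the configurable value)
    Returns (chosen, shared_pool); VMs without dedicated pcpus would then
    be restricted to shared_pool.
    """
    free = sorted(set(host_pcpus) - set(dedicated_in_use))
    if len(free) - requested < min_shared:
        raise ValueError("not enough free pcpus for placement")
    chosen = free[:requested]
    shared_pool = free[requested:]
    return chosen, shared_pool
```

This ignores NUMA topology entirely; a real implementation would probably
want to choose the dedicated pcpus from a single NUMA cell where possible.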


Br, Tuomas

>>   - Host NUMA placement.
>>     By not taking NUMA into account currently the libvirt driver
>>     at least is badly wasting resources. Having too much cross-numa
>>     node memory access by guests just kills scalability. The virt
>>     driver should really figure out cpu & memory pinning
>>     within the scope of a NUMA node automatically. No admin config
>>     should be required for this.
>>   - Guest NUMA topology
>>     If the flavour memory size / cpu count exceeds the size of a
>>     single NUMA node, then the flavour should likely have a way to
>>     express that the guest should see multiple NUMA nodes. The
>>     virt host would then set guest NUMA topology to match the way
>>     it places vCPUs & memory on host NUMA nodes. Again you don't
>>     want explicit pcpu/vcpu mapping done by the admin for this.
>> Regards,
>> Daniel
> Quite clear splitting and +1 for P/V pin option.
> --jyh
