[openstack-dev] [nova] Core pinning
yunhong jiang
yunhong.jiang at linux.intel.com
Tue Nov 19 18:18:14 UTC 2013
On Tue, 2013-11-19 at 12:52 +0000, Daniel P. Berrange wrote:
> On Wed, Nov 13, 2013 at 02:46:06PM +0200, Tuomas Paappanen wrote:
> > Hi all,
> >
> > I would like to hear your thoughts about core pinning in Openstack.
> > Currently nova (with qemu-kvm) supports the use of a CPU set of pCPUs
> > that can be used by instances. I didn't find a blueprint, but I think
> > this feature is meant to isolate the CPUs used by the host from the
> > CPUs used by instances (vCPUs).
> >
> > But, from a performance point of view, it is better to exclusively
> > dedicate pCPUs to vCPUs and the emulator. In some cases you may want
> > to guarantee that only one instance (and its vCPUs) is using certain
> > pCPUs. By using core pinning you can optimize instance performance
> > based on e.g. cache sharing, NUMA topology, interrupt handling, or
> > PCI passthrough (SR-IOV) in multi-socket hosts.
> >
> > We have already implemented a feature like this (a PoC with
> > limitations) on top of the Nova Grizzly release and would like to
> > hear your opinion about it.
> >
> > The current implementation consists of three main parts:
> > - Definition of pcpu-vcpu maps for instances and instance spawning
> > - (optional) Compute resource and capability advertising including
> > free pcpus and NUMA topology.
> > - (optional) Scheduling based on free cpus and NUMA topology.
> >
> > The implementation is quite simple:
> >
> > (additional/optional parts)
> > Nova-computes advertise free pCPUs and the NUMA topology in the same
> > manner as host capabilities. Instances are scheduled based on this
> > information.
> >
> > (core pinning)
> > The admin can set pCPUs for vCPUs and for the emulator process, or
> > select a NUMA cell for the instance vCPUs, by adding key:value pairs
> > to the flavor's extra specs.
> >
> > EXAMPLE:
> > instance has 4 vcpus
> > <key>:<value>
> > vcpus:1,2,3,4 --> vcpu0 pinned to pcpu1, vcpu1 pinned to pcpu2...
> > emulator:5 --> emulator pinned to pcpu5
> > or
> > numacell:0 --> all vcpus are pinned to pcpus in numa cell 0.
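For illustration only: assuming the PoC keeps the key names from the
example above, these extra specs could presumably be set with the
standard python-novaclient flavor-key command ("pinned.medium" is a
made-up flavor name):

  nova flavor-key pinned.medium set vcpus=1,2,3,4 emulator=5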
> >
> > In nova-compute, the core pinning information is read from the extra
> > specs and added to the domain XML in the same way as the CPU quota
> > values (cputune):
> >
> > <cputune>
> > <vcpupin vcpu='0' cpuset='1'/>
> > <vcpupin vcpu='1' cpuset='2'/>
> > <vcpupin vcpu='2' cpuset='3'/>
> > <vcpupin vcpu='3' cpuset='4'/>
> > <emulatorpin cpuset='5'/>
> > </cputune>
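To make the translation concrete, here is a minimal, illustrative Python
sketch of how such extra-spec keys could be turned into the <cputune>
element shown above. This is not the PoC's actual code; the key names
("vcpus", "emulator") and their formats are assumptions taken from the
example earlier in this thread.

# Illustrative sketch only -- not the PoC's actual code. It shows how the
# "vcpus" and "emulator" extra-spec keys described above could be turned
# into the <cputune> element of a libvirt domain XML.
import xml.etree.ElementTree as ET


def cputune_from_extra_specs(extra_specs):
    """Build a <cputune> element from flavor extra specs.

    extra_specs: dict such as {"vcpus": "1,2,3,4", "emulator": "5"},
    where the Nth entry of "vcpus" is the pCPU for vCPU N-1.
    """
    cputune = ET.Element("cputune")

    pcpus = extra_specs.get("vcpus")
    if pcpus:
        for vcpu_id, pcpu in enumerate(pcpus.split(",")):
            ET.SubElement(cputune, "vcpupin",
                          vcpu=str(vcpu_id), cpuset=pcpu.strip())

    emulator = extra_specs.get("emulator")
    if emulator:
        ET.SubElement(cputune, "emulatorpin", cpuset=emulator.strip())

    return cputune


if __name__ == "__main__":
    specs = {"vcpus": "1,2,3,4", "emulator": "5"}
    print(ET.tostring(cputune_from_extra_specs(specs), encoding="unicode"))

Run as-is, this prints the same vcpupin/emulatorpin layout as the XML
quoted above.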
> >
> > What do you think? Are there implementation alternatives? Is this
> > worth a blueprint? All related comments are welcome!
>
> I think there are several use cases mixed up in your description
> here which should likely be considered independently
>
> - pCPU/vCPU pinning
>
> I don't really think this is a good idea as a general purpose
> feature in its own right. It tends to lead to fairly inefficient
> use of CPU resources when you consider that a large % of guests
> will be mostly idle most of the time. Maintaining explicit pinning
> also carries a fairly high administrative burden. This feels like
> a data center virt use case rather than a cloud use case really.
>
> - Dedicated CPU reservation
>
> The ability of an end user to request that their VM (or their
> group of VMs) gets assigned a dedicated host CPU set to run on.
> This is obviously something that would have to be controlled
> at a flavour level, and in a commercial deployment would carry
> a hefty pricing premium.
>
> I don't think you want to expose explicit pCPU/vCPU placement
> for this though. Just request the high level concept and allow
> the virt host to decide the actual placement.
>
> - Host NUMA placement.
>
> By not currently taking NUMA into account, the libvirt driver
> at least is badly wasting resources. Too much cross-NUMA-node
> memory access by guests just kills scalability. The virt
> driver should really figure out CPU & memory pinning within
> the scope of a NUMA node automatically. No admin config
> should be required for this.
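To illustrate what such automatic placement could produce (purely an
assumed example: a host where NUMA node 0 owns pCPUs 0-7), the resulting
domain XML might confine both vCPUs and memory to one node:

<vcpu placement='static' cpuset='0-7'>4</vcpu>
<numatune>
  <memory mode='strict' nodeset='0'/>
</numatune>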
>
> - Guest NUMA topology
>
> If the flavour memory size / cpu count exceeds the size of a
> single NUMA node, then the flavour should likely have a way to
> express that the guest should see multiple NUMA nodes. The
> virt host would then set the guest NUMA topology to match the way
> it places vCPUs & memory on host NUMA nodes. Again you don't
> want explicit pCPU/vCPU mapping done by the admin for this.
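For reference, libvirt can already describe a guest NUMA topology along
these lines (the cell sizes below are invented; memory is in KiB):

<cpu>
  <numa>
    <cell cpus='0-3' memory='4194304'/>
    <cell cpus='4-7' memory='4194304'/>
  </numa>
</cpu>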
>
>
>
> Regards,
> Daniel
Quite a clear split, and +1 for the pCPU/vCPU pin option.
--jyh