[openstack-dev] realtime kvm cpu affinities

Henning Schild henning.schild at siemens.com
Tue Jun 27 15:36:03 UTC 2017


On Tue, 27 Jun 2017 09:28:34 -0600,
Chris Friesen <chris.friesen at windriver.com> wrote:

> On 06/27/2017 01:45 AM, Sahid Orentino Ferdjaoui wrote:
> > On Mon, Jun 26, 2017 at 12:12:49PM -0600, Chris Friesen wrote:  
> >> On 06/25/2017 02:09 AM, Sahid Orentino Ferdjaoui wrote:  
> >>> On Fri, Jun 23, 2017 at 10:34:26AM -0600, Chris Friesen wrote:  
> >>>> On 06/23/2017 09:35 AM, Henning Schild wrote:  
> >>>>> On Fri, 23 Jun 2017 11:11:10 +0200,
> >>>>> Sahid Orentino Ferdjaoui <sferdjao at redhat.com> wrote:  
> >>>>  
> >>>>>> In the Linux RT context, and as you mentioned, the non-RT vCPU
> >>>>>> can acquire some guest kernel lock and then be pre-empted by the
> >>>>>> emulator thread while holding this lock. This situation blocks
> >>>>>> the RT vCPUs from doing their work. That is why we have
> >>>>>> implemented [2]. For DPDK I don't think we have such problems
> >>>>>> because it runs in userland.
> >>>>>>
> >>>>>> So in the DPDK context I think we could have a mask like we have
> >>>>>> for RT, basically considering vCPU0 to handle the best-effort
> >>>>>> work (emulator threads, SSH...). I think that is the current
> >>>>>> pattern used by DPDK users.  
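> >>>>>>
> >>>>>> As an illustration only (the flavor name is made up), that
> >>>>>> pattern of keeping vCPU0 for best-effort work can already be
> >>>>>> expressed with the existing realtime extra specs along these
> >>>>>> lines:
> >>>>>>
> >>>>>>     openstack flavor set dpdk-guest \
> >>>>>>       --property hw:cpu_policy=dedicated \
> >>>>>>       --property hw:cpu_realtime=yes \
> >>>>>>       --property hw:cpu_realtime_mask=^0
> >>>>>>
> >>>>>> so that vCPU0 stays best-effort and the remaining vCPUs are
> >>>>>> realtime.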
> >>>>>
> >>>>> DPDK is just a library, and one can imagine an application that
> >>>>> has cross-core communication/synchronisation needs where the
> >>>>> emulator slowing down vcpu0 will also slow down vcpu1. Your DPDK
> >>>>> application would have to know which of its cores did not get a
> >>>>> full pCPU.
> >>>>>
> >>>>> I am not sure what the DPDK example is doing in this
> >>>>> discussion; would that not just be cpu_policy=dedicated? I
> >>>>> guess the normal behaviour of dedicated is that emulators and I/O
> >>>>> happily share pCPUs with vCPUs, and you are looking for a way to
> >>>>> restrict emulators/I/O to a subset of pCPUs because you can live
> >>>>> with some of them being not 100%.  
> >>>>
> >>>> Yes.  A typical DPDK-using VM might look something like this:
> >>>>
> >>>> vCPU0: non-realtime, housekeeping and I/O, handles all virtual
> >>>>        interrupts and "normal" linux stuff, emulator runs on same pCPU
> >>>> vCPU1: realtime, runs in tight loop in userspace processing packets
> >>>> vCPU2: realtime, runs in tight loop in userspace processing packets
> >>>> vCPU3: realtime, runs in tight loop in userspace processing packets
> >>>>
> >>>> In this context, vCPUs 1-3 don't really ever enter the kernel,
> >>>> and we've offloaded as much kernel work as possible from them
> >>>> onto vCPU0.  This works pretty well with the current system.
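> >>>>
> >>>> As a sketch only (application, core numbers and EAL arguments are
> >>>> illustrative), such a guest would typically keep the DPDK
> >>>> master/housekeeping lcore on vCPU0 and run the polling lcores on
> >>>> the realtime vCPUs, e.g.:
> >>>>
> >>>>     testpmd -l 0-3 --master-lcore 0 -n 4 -- -i
> >>>>
> >>>> so that only lcores 1-3 spin in the packet-processing loops.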
> >>>>  
> >>>>>> For RT we have to isolate the emulator threads to an
> >>>>>> additional pCPU per guest or, as you are suggesting, to a set
> >>>>>> of pCPUs for all the guests running.
> >>>>>>
> >>>>>> I think we should introduce a new option:
> >>>>>>
> >>>>>>      - hw:cpu_emulator_threads_mask=^1
> >>>>>>
> >>>>>> If set in 'nova.conf', that mask will be applied to the set of
> >>>>>> all host CPUs (vcpu_pin_set) to basically pack the emulator
> >>>>>> threads of all VMs running there (useful for the RT context).  
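> >>>>>>
> >>>>>> As a sketch of the host-level variant (under one reading of the
> >>>>>> proposal; the option name and CPU numbers are illustrative,
> >>>>>> nothing like this exists today):
> >>>>>>
> >>>>>>     # nova.conf
> >>>>>>     vcpu_pin_set = 2-15
> >>>>>>     # proposed: subtract pCPU2 from the set above and pack the
> >>>>>>     # emulator threads of all guests onto it
> >>>>>>     cpu_emulator_threads_mask = ^2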
> >>>>>
> >>>>> That would allow modelling exactly what we need.
> >>>>> In nova.conf we are talking about absolute, known values; there
> >>>>> is no need for a mask, and a set is much easier to read. Also,
> >>>>> using the same name does not sound like a good idea.
> >>>>> And the name vcpu_pin_set clearly suggests what kind of load
> >>>>> runs there; if using a mask it should be called pin_set.  
> >>>>
> >>>> I agree with Henning.
> >>>>
> >>>> In nova.conf we should just use a set, something like
> >>>> "rt_emulator_vcpu_pin_set" which would be used for running the
> >>>> emulator/io threads of *only* realtime instances.  
> >>>
> >>> I don't agree with you: we have a set of pCPUs and we want to
> >>> subtract some of them for the emulator threads, so we need a mask.
> >>> The only set we need is the one selecting which pCPUs Nova can use
> >>> (vcpu_pin_set).
> >>>  
> >>>> We may also want to have "rt_emulator_overcommit_ratio" to
> >>>> control how many threads/instances we allow per pCPU.  
> >>>
> >>> I'm not sure I understand this point. If it is to indicate that
> >>> for an isolated pCPU we want X guest emulator threads, the same
> >>> behavior is achieved by the mask. A host for realtime is dedicated
> >>> to realtime, with no overcommitment, and the operators know the
> >>> number of host CPUs, so they can easily deduce a ratio and hence
> >>> the corresponding mask.  
> >>
> >> Suppose I have a host with 64 CPUs.  I reserve three for host
> >> overhead and networking, leaving 61 for instances.  If I have
> >> instances with one non-RT vCPU and one RT vCPU then I can run 30
> >> instances.  If instead my instances have one non-RT and 5 RT vCPUs
> >> then I can run 10 instances.  If I put all of my emulator threads
> >> on the same pCPU, it might make a difference whether I put 30 sets
> >> of emulator threads there or 10 sets.  
> >
> > Oh, I understand your point now, but I'm not sure it is going to
> > make any difference. I would say the load on the isolated cores is
> > probably going to be the same. The only overhead is the number of
> > threads handled, which will be slightly higher in your first
> > scenario. 
> >
> >> The proposed "rt_emulator_overcommit_ratio" would simply say "nova
> >> is allowed to run X instances' worth of emulator threads on each
> >> pCPU in rt_emulator_vcpu_pin_set".  If we've hit that threshold,
> >> then no more RT instances are allowed to schedule on this compute
> >> node (but non-RT instances would still be allowed).  
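> >>
> >> As a concrete (made-up) example of what I mean:
> >>
> >>     # nova.conf (proposed options, nothing that exists today)
> >>     rt_emulator_vcpu_pin_set = 1-2
> >>     rt_emulator_overcommit_ratio = 16
> >>
> >> i.e. with two pCPUs for RT emulator threads and a ratio of 16, the
> >> compute node would stop accepting new RT instances once 32 of them
> >> are running there.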
> >
> > Also, I don't think we want to schedule where the emulator threads
> > of the guests should be pinned within the isolated cores. We will
> > let them float on the isolated set of cores. If there is a
> > requirement to have them pinned, then the current implementation is
> > probably enough.  
> 
> Once you use "isolcpus" on the host, the host scheduler won't "float"
> threads between the CPUs based on load.  To get the float behaviour
> you'd have to not isolate the pCPUs that will be used for emulator
> threads, but then you run the risk of the host running other work on
> those pCPUs (unless you use cpusets or something to isolate the host
> work to a subset of non-isolcpus pCPUs).
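> 
> For illustration (CPU numbers made up), a host set up along those lines
> would typically boot with something like
> 
>     isolcpus=2-15 nohz_full=2-15 rcu_nocbs=2-15
> 
> on the kernel command line and leave CPUs 0-1 un-isolated for host work
> and floating emulator threads, with the interference trade-off
> described above.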

With OpenStack you use libvirt, and libvirt uses cgroups/cpusets to get
those threads onto those cores.
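
To illustrate (CPU numbers made up), the resulting libvirt domain XML
typically ends up with something like

    <cputune>
      <vcpupin vcpu='0' cpuset='4'/>
      <vcpupin vcpu='1' cpuset='5'/>
      <emulatorpin cpuset='2-3'/>
    </cputune>

and libvirt then creates the matching cpuset cgroups for the vCPU and
emulator threads.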

Henning

> Chris