[openstack-dev] realtime kvm cpu affinities

Henning Schild henning.schild at siemens.com
Mon Jun 26 08:19:12 UTC 2017


On Sun, 25 Jun 2017 10:09:10 +0200,
Sahid Orentino Ferdjaoui <sferdjao at redhat.com> wrote:

> On Fri, Jun 23, 2017 at 10:34:26AM -0600, Chris Friesen wrote:
> > On 06/23/2017 09:35 AM, Henning Schild wrote:  
> > > On Fri, 23 Jun 2017 11:11:10 +0200,
> > > Sahid Orentino Ferdjaoui <sferdjao at redhat.com> wrote:  
> >   
> > > > In a Linux RT context, and as you mentioned, a non-RT vCPU can
> > > > acquire some guest kernel lock and then be pre-empted by the
> > > > emulator thread while holding this lock. This blocks the RT
> > > > vCPUs from doing their work. That is why we have implemented
> > > > [2]. For DPDK I don't think we have such problems because it's
> > > > running in userland.
> > > > 
> > > > So for the DPDK context I think we could have a mask like we
> > > > have for RT, basically considering vCPU0 to handle best-effort
> > > > work (emulator threads, SSH...). I think that is the pattern
> > > > currently used by DPDK users.  
> > > 
> > > DPDK is just a library, and one can imagine an application with
> > > cross-core communication/synchronisation needs where the emulator
> > > slowing down vCPU0 will also slow down vCPU1. Your DPDK
> > > application would have to know which of its cores did not get a
> > > full pCPU.
> > > 
> > > I am not sure what the DPDK example is doing in this discussion;
> > > would that not just be cpu_policy=dedicated? I guess the normal
> > > behaviour of dedicated is that emulator and io threads happily
> > > share pCPUs with vCPUs, and you are looking for a way to restrict
> > > emulators/io to a subset of pCPUs because you can live with some
> > > of them not being at 100%.  
> > 
> > Yes.  A typical DPDK-using VM might look something like this:
> > 
> > vCPU0: non-realtime, housekeeping and I/O, handles all virtual
> > interrupts and "normal" linux stuff, emulator runs on same pCPU
> > vCPU1: realtime, runs in tight loop in userspace processing packets
> > vCPU2: realtime, runs in tight loop in userspace processing packets
> > vCPU3: realtime, runs in tight loop in userspace processing packets
> > 
> > In this context, vCPUs 1-3 don't really ever enter the kernel, and
> > we've offloaded as much kernel work as possible from them onto
> > vCPU0.  This works pretty well with the current system.
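
To make that example concrete: such a guest could be expressed roughly
like this with the hw:cpu_realtime* extra specs nova already has (just
a sketch; flavor name and sizes are made up):

  openstack flavor create dpdk.vm --vcpus 4 --ram 4096 --disk 10
  # pin all vCPUs; mark all but vCPU0 as realtime
  openstack flavor set dpdk.vm \
    --property hw:cpu_policy=dedicated \
    --property hw:cpu_realtime=yes \
    --property hw:cpu_realtime_mask=^0

vCPU0 then stays a normal pinned vCPU for housekeeping/io while vCPUs
1-3 get the realtime treatment.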
> >   
> > > > For RT we have to isolate the emulator threads to an additional
> > > > pCPU per guest or, as you are suggesting, to a set of pCPUs for
> > > > all the running guests.
> > > > 
> > > > I think we should introduce a new option:
> > > > 
> > > >    - hw:cpu_emulator_threads_mask=^1
> > > > 
> > > > If set in 'nova.conf', that mask will be applied to the set of
> > > > all host CPUs (vcpu_pin_set) to basically pack the emulator
> > > > threads of all VMs running there (useful for the RT context).  
> > > 
> > > That would allow modelling exactly what we need.
> > > In nova.conf we are talking about absolute, known values; there is
> > > no need for a mask, and a set is much easier to read. Also, using
> > > the same name in both places does not sound like a good idea.
> > > And the name vcpu_pin_set clearly suggests what kind of load runs
> > > there; if using a mask it should be called pin_set.  
> > 
> > I agree with Henning.
> > 
> > In nova.conf we should just use a set, something like
> > "rt_emulator_vcpu_pin_set" which would be used for running the
> > emulator/io threads of *only* realtime instances.  
> 
> I don't agree with you: we have a set of pCPUs and we want to
> subtract some of them for the emulator threads. We need a mask. The
> only set we need is the one selecting which pCPUs Nova can use
> (vcpu_pin_set).

At that point it does not really matter whether it is a set or a mask.
They can both express the same thing, but a set is easier to
read/configure. With the same argument you could say that vcpu_pin_set
should be a mask over the host's pCPUs.
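
To illustrate, on an 8-CPU host the following two lines select the
same pCPUs, once as a plain set and once as a range with exclusions
(nova already accepts both notations for vcpu_pin_set):

  vcpu_pin_set = 2-7
  vcpu_pin_set = 0-7,^0,^1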

As I said before: vcpu_pin_set should be renamed because all sorts of
threads are put there (pcpu_pin_set?). But that would be a bigger
change and should be discussed as a separate issue.

So far we have talked about a compute node that does realtime only. In
that case vcpu_pin_set + emulator_io_mask would work. If you want to
run regular VMs on the same host, you can run a second nova, like we
do. A sketch of that split is below.
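
Roughly like this (a sketch; emulator_io_mask is the name proposed in
this thread, not an existing option, and the CPU numbers are made up):

  # nova-rt.conf - compute service for the RT guests
  [DEFAULT]
  vcpu_pin_set = 2-15
  # reserve pCPUs 2-3 for the emulator/io threads of all RT guests
  emulator_io_mask = ^2,^3

  # nova.conf - second compute service for the regular guests
  [DEFAULT]
  vcpu_pin_set = 16-23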

We could also use vcpu_pin_set + rt_vcpu_pin_set(/mask); a sketch is
below. I think that would allow modelling all cases in just one nova.
With everything in one nova you could potentially repurpose RT CPUs
for best-effort work and back. Some day in the future ...
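
For a single nova it could look like this (a sketch; rt_vcpu_pin_set
is the hypothetical option from the paragraph above):

  [DEFAULT]
  vcpu_pin_set = 2-23       # all pCPUs nova may use at all
  rt_vcpu_pin_set = 4-15    # subset dedicated to RT vCPUs
  # 2-3 and 16-23 remain for emulator/io threads and regular vCPUs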

> > We may also want to have "rt_emulator_overcommit_ratio" to control
> > how many threads/instances we allow per pCPU.  
> 
> Not really sure I have understood this point. If it is to indicate
> that for an isolated pCPU we want X guest emulator threads, the same
> behavior is achieved by the mask. A host for realtime is dedicated to
> realtime, with no overcommitment, and the operators know the number
> of host CPUs, so they can easily deduce a ratio and the corresponding
> mask.

Agreed.
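
As a worked example (all numbers made up):

  14 pCPUs for nova         (vcpu_pin_set = 2-15)
  12 RT guests, at most 6 emulator threads per housekeeping pCPU
  -> ceil(12 / 6) = 2 pCPUs for emulator/io   (mask ^2,^3)
  -> 12 pCPUs left for RT vCPUs               (4-15)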

> > > > If set in the flavor extra specs, it will be applied to the
> > > > vCPUs dedicated to the guest (useful for the DPDK context).  
> > > 
> > > And if both are present the flavor wins and nova.conf is
> > > ignored?  
> > 
> > In the flavor I'd like to see it be a full bitmask, not an
> > exclusion mask with an implicit full set.  Thus the end-user could
> > specify "hw:cpu_emulator_threads_mask=0" and get the emulator
> > threads to run alongside vCPU0.  
> 
> Same here, I don't agree: the only set is the vCPUs of the guest.
> Then we want a mask to subtract some of them.

The current mask is fine, but using the same name in nova.conf and in
the flavor does not seem like a good idea.
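
For example (names hypothetical, just to show the distinction):

  # flavor: mask over the guest's own vCPUs
  hw:cpu_emulator_threads_mask=^1
  # nova.conf: mask over the host's vcpu_pin_set, hence another name
  rt_emulator_threads_mask=^2,^3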

Henning

> > Henning, there is no conflict, the nova.conf setting and the flavor
> > setting are used for two different things.
> > 
> > Chris