Open Stack

Fri Jun 23 16:34:26 UTC 2017

On 06/23/2017 09:35 AM, Henning Schild wrote:
> Am Fri, 23 Jun 2017 11:11:10 +0200
> schrieb Sahid Orentino Ferdjaoui <sferdjao at redhat.com>:

>> In a Linux RT context, and as you mentioned, the non-RT vCPU can acquire
>> some guest kernel lock, then be pre-empted by the emulator thread while
>> holding this lock. This situation blocks the RT vCPUs from doing their
>> work. So that is why we have implemented [2]. For DPDK I don't think
>> we have such problems because it's running in userland.
>>
>> So for the DPDK context I think we could have a mask like we have for RT,
>> basically treating vCPU0 as the one that handles best-effort work (emulator
>> threads, SSH...). I think it's the current pattern used by DPDK users.
>
> DPDK is just a library and one can imagine an application that has
> cross-core communication/synchronisation needs where the emulator
> slowing down vcpu0 will also slow down vcpu1. Your DPDK application would
> have to know which of its cores did not get a full pcpu.
>
> I am not sure what the DPDK-example is doing in this discussion, would
> that not just be cpu_policy=dedicated? I guess normal behaviour of
> dedicated is that emulators and io happily share pCPUs with vCPUs and
> you are looking for a way to restrict emulators/io to a subset of pCPUs
> because you can live with some of them not being 100%.

Yes.  A typical DPDK-using VM might look something like this:

vCPU0: non-realtime, housekeeping and I/O, handles all virtual interrupts and
"normal" Linux stuff, emulator runs on same pCPU
vCPU1: realtime, runs in tight loop in userspace processing packets
vCPU2: realtime, runs in tight loop in userspace processing packets
vCPU3: realtime, runs in tight loop in userspace processing packets

In this context, vCPUs 1-3 don't really ever enter the kernel, and we've 
offloaded as much kernel work as possible from them onto vCPU0.  This works 
pretty well with the current system.
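
As a rough illustration of the layout above, here is a minimal sketch. The helper name dpdk_layout and the pCPU numbers are made up for the example; this is not what nova does internally.

```python
# Hypothetical sketch: one 4-vCPU guest pinned 1:1 onto four host pCPUs.
# The pCPU numbers (4-7) are illustrative only.
def dpdk_layout(pcpus):
    """Pin vCPU N to pcpus[N] and place the emulator on vCPU0's pCPU."""
    vcpu_pins = {vcpu: pcpu for vcpu, pcpu in enumerate(pcpus)}
    return {
        "vcpu_pins": vcpu_pins,
        # The emulator threads share the pCPU of the housekeeping vCPU0.
        "emulator_pcpus": {vcpu_pins[0]},
        # Every other vCPU runs the tight packet-processing loop.
        "realtime_vcpus": sorted(v for v in vcpu_pins if v != 0),
    }


layout = dpdk_layout([4, 5, 6, 7])
```

Here vCPUs 1-3 keep their pCPUs to themselves, and only vCPU0's pCPU is shared with the emulator.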

>> For RT we have to isolate the emulator threads to an additional pCPU
>> per guest or, as you are suggesting, to a set of pCPUs for all the
>> guests running.
>>
>> I think we should introduce a new option:
>>
>>    - hw:cpu_emulator_threads_mask=^1
>>
>> If set in 'nova.conf', that mask will be applied to the set of all host
>> CPUs (vcpu_pin_set) to basically pack the emulator threads of all VMs
>> running here (useful for RT context).
>
> That would allow modelling exactly what we need.
> In nova.conf we are talking absolute known values, no need for a mask
> and a set is much easier to read. Also using the same name does not
> sound like a good idea.
> And the name vcpu_pin_set clearly suggests what kind of load runs here,
> if using a mask it should be called pin_set.

I agree with Henning.

In nova.conf we should just use a set, something like "rt_emulator_vcpu_pin_set" 
which would be used for running the emulator/io threads of *only* realtime 
instances.

We may also want to have "rt_emulator_overcommit_ratio" to control how many 
threads/instances we allow per pCPU.
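
To make the arithmetic concrete, here is a minimal sketch. Both option names are only proposals from this thread, not existing nova options; the helper names are hypothetical, and reading the ratio as "realtime instances per pCPU" is an assumption.

```python
# Hypothetical sketch of how the two proposed nova.conf values could combine.
def parse_cpu_set(spec):
    """Parse a set such as "2-3,8" into {2, 3, 8}.

    Simplified on purpose: ranges and single CPUs only, no "^" exclusions.
    """
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            cpus.update(range(start, end + 1))
        else:
            cpus.add(int(part))
    return cpus


def rt_emulator_capacity(pin_set, overcommit_ratio):
    """Number of realtime instances whose emulator threads fit on the set."""
    return int(len(parse_cpu_set(pin_set)) * overcommit_ratio)


# Three pCPUs reserved for emulator threads, four instances allowed per pCPU.
capacity = rt_emulator_capacity("2-3,8", 4.0)
```

With these example values the host would accept the emulator threads of twelve realtime instances before the set counts as full.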

>> If set in the flavor extra-specs, it will be applied to the vCPUs
>> dedicated to the guest (useful for DPDK context).
>
> And if both are present the flavor wins and nova.conf is ignored?

In the flavor I'd like to see it be a full bitmask, not an exclusion mask with 
an implicit full set.  Thus the end-user could specify 
"hw:cpu_emulator_threads_mask=0" and get the emulator threads to run alongside 
vCPU0.
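
The difference between the two readings can be sketched as follows. This is only an illustration: the function names are hypothetical, and it assumes the "full" form is an explicit list of guest vCPU indexes, which is how the "=0" example reads.

```python
# Hypothetical sketch contrasting the two ways to read the flavor value.
def exclusion_mask(spec, num_vcpus):
    """Start from all guest vCPUs and drop every "^N" entry."""
    vcpus = set(range(num_vcpus))
    for part in spec.split(","):
        part = part.strip()
        if part.startswith("^"):
            vcpus.discard(int(part[1:]))
    return vcpus


def explicit_set(spec, num_vcpus):
    """Use only the guest vCPUs that are listed, with no implicit full set."""
    vcpus = {int(part) for part in spec.split(",") if part.strip()}
    out_of_range = sorted(v for v in vcpus if not 0 <= v < num_vcpus)
    if out_of_range:
        raise ValueError("vCPUs outside the guest: %s" % out_of_range)
    return vcpus


# On a 4-vCPU guest:
excluded = exclusion_mask("^1", 4)   # every vCPU except vCPU1
explicit = explicit_set("0", 4)      # vCPU0 only
```

With the explicit form, the value names exactly the vCPUs whose pCPUs the emulator threads may share, so "0" needs no knowledge of how many vCPUs the flavor has.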

Henning, there is no conflict: the nova.conf setting and the flavor setting are
used for two different things.

Chris

[openstack-dev] realtime kvm cpu affinities
