[openstack-dev] realtime kvm cpu affinities
sfinucan at redhat.com
Thu Jun 29 16:59:41 UTC 2017
On Tue, 2017-06-20 at 09:48 +0200, Henning Schild wrote:
> Hi,
>
> We are using OpenStack for managing realtime guests. We modified
> it and contributed to discussions on how to model the realtime
> feature. More recent versions of OpenStack have support for realtime,
> and there are a few proposals on how to improve that further.
>
> ...
I'd put off working my way through this thread until I'd time to sit down and
read it in full. Here's what I'm seeing by way of summaries _so far_.
# Current situation
I think this tree (sans 'hw' prefixes for brevity) represents the current
situation around flavor extra specs and image meta. Pretty much everything
hangs off cpu_policy=dedicated. Correct me if I'm wrong.
cpu_policy
╞═> shared
╘═> dedicated
├─> cpu_thread_policy
│ ╞═> prefer
│ ╞═> isolate
│ ╘═> require
├─> emulator_threads_policy (*)
│ ╞═> share
│ ╘═> isolate
└─> cpu_realtime
╞═> no
╘═> yes
└─> cpu_realtime_mask
╘═> (a mask of guest cores)
(*) this one isn't configurable via images. I never really got why but meh.
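Spelled out as flavor extra specs (with the 'hw:' prefixes restored), the knobs above look something like this (values after '|' are the alternatives, mask value is just an example):

```
hw:cpu_policy=shared|dedicated
hw:cpu_thread_policy=prefer|isolate|require
hw:emulator_threads_policy=share|isolate
hw:cpu_realtime=yes|no
hw:cpu_realtime_mask=^0-1
```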
There are also some host-level configuration options
vcpu_pin_set
╘═> (a list of host cores that nova can use)
Finally, there's some configuration you can do with your choice of kernel and
kernel options (e.g. 'isolcpus').
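As an illustrative (not prescriptive) example, the host-level pieces might be wired up like so, assuming cores 0-1 are left for the host OS:

```ini
# /etc/nova/nova.conf -- illustrative values only
[DEFAULT]
# host cores nova may use for instance vCPUs
vcpu_pin_set = 2-7
```

with a matching kernel command line such as 'isolcpus=2-7' to keep the host scheduler off those cores.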
For real time workloads, the expectation would be that you would set:
cpu_policy
╘═> dedicated
├─> cpu_thread_policy
│ ╘═> isolate
├─> emulator_threads_policy
│ ╘═> isolate
└─> cpu_realtime
╘═> yes
└─> cpu_realtime_mask
╘═> (a mask of guest cores)
That would result in an instance consuming N+1 host cores (pCPUs), where N
corresponds to the number of instance vCPUs. Of the N vCPUs, those masked out
by 'cpu_realtime_mask' will be non-realtime; the remainder will be realtime.
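For illustration, here is a small sketch (not nova's actual parsing code) of how an exclusion-style mask such as '^0' splits a guest's vCPUs into realtime and non-realtime sets:

```python
# Sketch only: handles the '^'-exclusion syntax used by these options,
# e.g. '^0' or '^0-1,^3'; anything else is ignored for brevity.

def parse_realtime_mask(spec: str, vcpus: int) -> set:
    """Return the set of realtime vCPUs for an exclusion mask."""
    realtime = set(range(vcpus))
    for part in spec.split(','):
        part = part.strip()
        if not part.startswith('^'):
            continue
        rng = part[1:]
        if '-' in rng:
            lo, hi = map(int, rng.split('-'))
            realtime -= set(range(lo, hi + 1))
        else:
            realtime.discard(int(rng))
    return realtime

# A 4-vCPU guest with cpu_realtime_mask='^0': vCPU 0 stays non-realtime
# for housekeeping, vCPUs 1-3 are realtime. With
# emulator_threads_policy=isolate the instance consumes 4 + 1 host cores.
print(sorted(parse_realtime_mask('^0', 4)))  # [1, 2, 3]
```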
# The Problem(s)
I'm going to thread this to capture the arguments and counter-arguments:
## Problem 1
henning.schild suggested that the current implementation of
'emulator_threads_policy' is too resource intensive: the extra core reserved
per instance generally carries only a minimal emulator workload for the
entire guest. This can significantly limit the number of guests that can be
booted per host, particularly for guests with smaller numbers of cores.
Instead, he has implemented an 'emulator_pin_set'
host-level option, which complements 'vcpu_pin_set'. This allows us to "pool"
emulator threads, similar to how vCPU threads behave with 'cpu_policy=shared'.
He suggests this be adopted by nova.
sahid seconded this, but suggested 'emulator_pin_set' be renamed
'cpu_emulator_threads_mask' and work as a mask of 'vcpu_pin_set'. He also
suggested adding a similarly-named flavor property, which would allow the
user to dedicate one of their instance's cores to non-realtime work
henning.schild suggested a set would still be better, but that
'vcpu_pin_set' be renamed to 'pin_set', as it would no longer be for only
vCPUs
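To make the set-vs-mask distinction concrete, here is a small sketch; note that 'emulator_pin_set', 'cpu_emulator_threads_mask' and 'pin_set' are proposals from this thread, not existing nova options:

```python
# Illustrative sketch only: both spellings carve the same host cores out
# for pooled emulator threads, they just express it differently.

def expand(spec: str) -> set:
    """Expand a CPU set spec such as '2-7' or '2,3,6-7' into a set of ints."""
    cpus = set()
    for part in spec.split(','):
        if '-' in part:
            lo, hi = map(int, part.split('-'))
            cpus |= set(range(lo, hi + 1))
        else:
            cpus.add(int(part))
    return cpus

# henning.schild's proposal: two independent sets, one for vCPUs, one for
# pooled emulator threads.
vcpu_pin_set = expand('4-7')
emulator_pin_set = expand('2-3')

# sahid's proposal: one set plus a mask selecting the emulator cores out
# of it; the remainder is left for vCPUs.
pin_set = expand('2-7')
emulator_cpus = pin_set & expand('2-3')
vcpu_cpus = pin_set - emulator_cpus

print(sorted(emulator_cpus), sorted(vcpu_cpus))  # [2, 3] [4, 5, 6, 7]
```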
cfriesen seconded henning.schild's position but was not entirely
convinced that sharing emulator threads on a single pCPU is guaranteed
to be safe, for example if one instance starts seriously hammering on
I/O or undergoes a live migration. He suggested that an additional
option, 'rt_emulator_overcommit_ratio', be added to make overcommitting
explicit. In addition, he suggested making the flavor property a bitmask
sahid questioned the need for an overcommit ratio, given that these
hosts are not overcommitted. An operator could synthesize a suitable
value for 'emulator_pin_set'/'cpu_emulator_threads_mask'. He also
disagreed with the suggestion that the flavor property be a bitmask, as
the only set it could apply to is the instance's vCPUs.
cfriesen clarified his point: a few instances with many vCPUs will have
greater overhead requirements than many instances with few vCPUs. We
need to be able to fail scheduling if the emulator thread cores are
oversubscribed.
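If the overcommit route were taken, the scheduler-side check might look roughly like this sketch (the option and function names are hypothetical, following cfriesen's 'rt_emulator_overcommit_ratio' suggestion):

```python
# Hypothetical placement check: refuse an instance if the pooled
# emulator-thread cores would be oversubscribed beyond the ratio.

def can_place(existing_emulator_threads: list, emulator_cores: int,
              overcommit_ratio: float, new_threads: int) -> bool:
    """True if the new instance's emulator threads still fit the pool."""
    total = sum(existing_emulator_threads) + new_threads
    return total <= emulator_cores * overcommit_ratio

# Two cores pooled for emulator threads, ratio 4.0: up to 8 emulator
# threads may share them.
print(can_place([3, 4], 2, 4.0, 1))  # True  (8 <= 8)
print(can_place([3, 4], 2, 4.0, 2))  # False (9 > 8)
```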
## Problem 2
henning.schild suggested that hosts should be able to handle both RT and
non-RT instances. This could be achieved through multiple instances of nova.
sahid pointed out that the recommendation is to use host aggregates to
separate the two.
henning.schild stated that hosts with RT kernels can manage non-RT guests
just fine. However, if using host aggregates is the recommendation, then it
should be possible to run multiple nova instances on a host, because
dedicating an entire machine to realtime is not viable for smaller
deployments. cfriesen seconded this perspective, though not the solution.
# Solutions
Thus far, we've no clear conclusions on directions to go, so I've taken a stab
below. Henning, Sahid, Chris: does the above/below make sense, and is there
anything we need to further clarify?
# Problem 1
From the above, there are 3-4 work items:
- Add an 'emulator_pin_set' or 'cpu_emulator_threads_mask' configuration option
- If using a mask, rename 'vcpu_pin_set' to 'pin_set' (or, better,
'usable_cpus')
- Add an 'rt_emulator_overcommit_ratio', which will do for emulator threads
  what the other ratios do for vCPUs and memory
- Deprecate 'hw:emulator_threads_policy'???
# Problem 2
No clear conclusions yet?
---
Cheers,
Stephen