[openstack-dev] realtime kvm cpu affinities
Henning Schild
henning.schild at siemens.com
Thu Jul 6 15:16:03 UTC 2017
Stephen,
thanks for summing it all up! I am guessing that a blueprint, or updates
to an existing blueprint, will be next. We currently have a patch that
introduces a second pin_set to nova.conf and solves problems 1 and 2 in
Ocata, but it might be overlooking a couple of cases we do not care
about or have not come across yet.
Together with your text, that patch could serve as a basis for discussing
what will eventually be implemented.
I am happy because the two problems were acknowledged, the placement
strategy for the threads was discussed and reviewed with some input from
the kvm side, and we already talked about possible solutions.
So things are moving ;)
regards,
Henning
On Thu, 29 Jun 2017 17:59:41 +0100,
<sfinucan at redhat.com> wrote:
> On Tue, 2017-06-20 at 09:48 +0200, Henning Schild wrote:
> > Hi,
> >
> > We are using OpenStack for managing realtime guests. We modified
> > it and contributed to discussions on how to model the realtime
> > feature. More recent versions of OpenStack have support for
> > realtime, and there are a few proposals on how to improve that
> > further.
> >
> > ...
>
> I'd put off working my way through this thread until I'd time to sit
> down and read it in full. Here's what I'm seeing by way of summaries
> _so far_.
>
> # Current situation
>
> I think this tree (sans 'hw' prefixes for brevity) represents the
> current situation around flavor extra specs and image meta. Pretty
> much everything hangs off cpu_policy=dedicated. Correct me if I'm
> wrong.
>
> cpu_policy
>  ╞═> shared
>  ╘═> dedicated
>       ├─> cpu_thread_policy
>       │    ╞═> prefer
>       │    ╞═> isolate
>       │    ╘═> require
>       ├─> emulator_threads_policy (*)
>       │    ╞═> share
>       │    ╘═> isolate
>       └─> cpu_realtime
>            ╞═> no
>            ╘═> yes
>                 └─> cpu_realtime_mask
>                      ╘═> (a mask of guest cores)
>
> (*) this one isn't configurable via images. I never really got why
> but meh.
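>
> For illustration, those extra specs are set on a flavor with their
> full 'hw:' names via the client; the flavor name 'rt.small' here is
> made up:
>
>   openstack flavor set rt.small \
>     --property hw:cpu_policy=dedicated \
>     --property hw:cpu_thread_policy=isolate \
>     --property hw:emulator_threads_policy=isolate \
>     --property hw:cpu_realtime=yes \
>     --property "hw:cpu_realtime_mask=^0"
>
> where '^0' marks vCPU 0 as the (only) non-realtime guest core.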
>
> There's also some host-level configuration options
>
> vcpu_pin_set
>  ╘═> (a list of host cores that nova can use)
>
> Finally, there's some configuration you can do with your choice of
> kernel and kernel options (e.g. 'isolcpus').
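>
> Again for illustration, the host side is a nova.conf setting plus the
> kernel command line, with made-up core numbers:
>
>   # /etc/nova/nova.conf
>   [DEFAULT]
>   vcpu_pin_set = 4-15
>
>   # kernel command line
>   isolcpus=4-15
>
> i.e. nova only places guest vCPUs on cores 4-15, and the kernel keeps
> other tasks from being scheduled onto those cores.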
>
> For real time workloads, the expectation would be that you would set:
>
> cpu_policy
>  ╘═> dedicated
>       ├─> cpu_thread_policy
>       │    ╘═> isolate
>       ├─> emulator_threads_policy
>       │    ╘═> isolate
>       └─> cpu_realtime
>            ╘═> yes
>                 └─> cpu_realtime_mask
>                      ╘═> (a mask of guest cores)
>
> That would result in an instance consuming N+1 host cores, where N
> corresponds to the number of guest cores. Of those N guest cores, the
> ones masked out by 'cpu_realtime_mask' will be non-realtime; the
> remainder will be realtime. The extra core holds the isolated emulator
> threads.
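>
> As a concrete example, a 4-vCPU instance with that combination and
> 'cpu_realtime_mask=^0' would consume 5 host cores: vCPU 0 as the
> non-realtime core, vCPUs 1-3 as realtime cores, plus one core for the
> isolated emulator threads.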
>
> # The Problem(s)
>
> I'm going to thread this to capture the arguments and
> counter-arguments:
>
> ## Problem 1
>
> henning.schild suggested that the current implementation of
> 'emulator_threads_policy' is too resource intensive, as the extra core
> it dedicates per guest generally carries only a minimal emulator
> workload. This can significantly limit the number of guests that can
> be booted per host, particularly for guests with small numbers of
> cores. Instead, he has implemented an 'emulator_pin_set' host-level
> option, which complements 'vcpu_pin_set'. This allows us to "pool"
> emulator threads, similar to how vCPU threads behave with
> 'cpu_policy=shared'. He suggested this be adopted by nova.
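>
> To make that proposal concrete ('emulator_pin_set' does not exist in
> nova today, so this is only a sketch of the suggested option, with
> made-up core numbers):
>
>   # /etc/nova/nova.conf on the compute host
>   [DEFAULT]
>   vcpu_pin_set = 4-15      # dedicated cores for guest vCPUs
>   emulator_pin_set = 2-3   # shared pool for all emulator threads
>
> Every guest's emulator threads would then float over cores 2-3 instead
> of each guest reserving a full extra core.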
>
> sahid seconded this, but suggested that 'emulator_pin_set' be renamed
> to 'cpu_emulator_threads_mask' and work as a mask of 'vcpu_pin_set'.
> He also suggested adding a similarly-named flavor property that would
> allow the user to give up one of their instance's cores for
> non-realtime work.
>
> henning.schild suggested a set would still be better, but that
> 'vcpu_pin_set' should then be renamed to 'pin_set', as it would no
> longer be for vCPUs only.
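>
> The mask variant would express the same split relative to
> 'vcpu_pin_set' rather than as a separate set. The exact semantics are
> still under discussion, so purely as an illustration:
>
>   vcpu_pin_set = 2-15
>   cpu_emulator_threads_mask = 2-3   # carve cores 2-3 out of the set
>                                     # for emulator threads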
>
> cfriesen seconded henning.schild's position, but was not entirely
> convinced that sharing emulator threads on a single pCPU is guaranteed
> to be safe, for example if one instance starts seriously hammering on
> I/O or doing a live migration. He suggested that an additional option,
> 'rt_emulator_overcommit_ratio', be added to make overcommitting
> explicit. In addition, he suggested making the flavor property a
> bitmask.
>
> sahid questioned the need for an overcommit ratio, given that there is
> no overcommitting of the hosts: an operator could simply choose a
> suitable value for 'emulator_pin_set'/'cpu_emulator_threads_mask'. He
> also disagreed with making the flavor property a bitmask, since the
> only set it could apply to is the instance's own vCPUs.
>
> cfriesen clarified his point: a few instances with many vCPUs will
> have greater emulator overhead than many instances with few vCPUs, so
> we need to be able to fail scheduling if the emulator thread cores are
> oversubscribed.
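>
> As an illustration of the idea (numbers made up): with 2 host cores
> set aside for emulator threads and a hypothetical
> 'rt_emulator_overcommit_ratio = 4.0', the scheduler would pack at most
> 2 x 4.0 = 8 instances' worth of emulator threads onto those cores and
> fail placement of the ninth, instead of silently oversubscribing them.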
>
> ## Problem 2
>
> henning.schild suggested that hosts should be able to handle both RT
> and non-RT instances. This could be achieved by running multiple
> instances of nova on the same host.
>
> sahid pointed out that the recommendation is to use host aggregates to
> separate the two.
>
> henning.schild stated that hosts with RT kernels can manage non-RT
> guests just fine. However, if host aggregates are the recommended
> approach, then it should be possible to run multiple nova instances on
> a host, because dedicating an entire machine to RT is not viable for
> smaller deployments. cfriesen seconded the underlying requirement,
> though not the multiple-nova-instances solution.
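>
> For reference, the aggregate-based separation is plain operator
> tooling today, e.g. (aggregate, property and host names made up):
>
>   openstack aggregate create realtime
>   openstack aggregate set --property realtime=true realtime
>   openstack aggregate add host realtime rt-compute-01
>
> combined with 'aggregate_instance_extra_specs:realtime=true' on the RT
> flavors and the AggregateInstanceExtraSpecsFilter enabled.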
>
> # Solutions
>
> Thus far, we have no clear conclusions on the direction to go, so I've
> taken a stab below. Henning, Sahid, Chris: does the above/below make
> sense, and is there anything we need to clarify further?
>
> ## Problem 1
>
> From the above, there are 3-4 work items:
>
> - Add an 'emulator_pin_set' or 'cpu_emulator_threads_mask'
> configuration option
>
> - If using a mask, rename 'vcpu_pin_set' to 'pin_set' (or, better,
> 'usable_cpus')
>
> - Add an 'emulator_overcommit_ratio', which will do for emulator
> threads what the other ratios do for vCPUs and memory
>
> - Deprecate 'hw:emulator_threads_policy'???
>
> ## Problem 2
>
> No clear conclusions yet?
>
> ---
>
> Cheers,
> Stephen