[openstack-dev] realtime kvm cpu affinities

sfinucan at redhat.com sfinucan at redhat.com
Thu Jun 29 16:59:41 UTC 2017

On Tue, 2017-06-20 at 09:48 +0200, Henning Schild wrote:
> Hi,
> We are using OpenStack for managing realtime guests. We modified
> it and contributed to discussions on how to model the realtime
> feature. More recent versions of OpenStack have support for realtime,
> and there are a few proposals on how to improve that further.
> ...

I'd put off working my way through this thread until I'd time to sit down and
read it in full. Here's what I'm seeing by way of summaries _so far_.

# Current situation

I think this tree (sans 'hw' prefixes for brevity) represents the current
situation around flavor extra specs and image meta. Pretty much everything
hangs off cpu_policy=dedicated. Correct me if I'm wrong.

  cpu_policy
  ╞═> shared
  ╘═> dedicated
      ├─> cpu_thread_policy
      │   ╞═> prefer
      │   ╞═> isolate
      │   ╘═> require
      ├─> emulator_threads_policy (*)
      │   ╞═> share
      │   ╘═> isolate
      └─> cpu_realtime
          ╞═> no
          ╘═> yes
              └─> cpu_realtime_mask
                  ╘═> (a mask of guest cores)

(*) this one isn't configurable via images. I never really got why but meh.

There are also some host-level configuration options:

  vcpu_pin_set
  ╘═> (a list of host cores that nova can use)

Finally, there's some configuration you can do with your choice of kernel and
kernel options (e.g. 'isolcpus').

For real time workloads, the expectation would be that you would set:

  cpu_policy
  ╘═> dedicated
      ├─> cpu_thread_policy
      │   ╘═> isolate
      ├─> emulator_threads_policy
      │   ╘═> isolate
      └─> cpu_realtime
          ╘═> yes
              └─> cpu_realtime_mask
                  ╘═> (a mask of guest cores)

That would result in an instance that uses N+1 host cores, where N is the
number of instance cores and the extra core is the one reserved for emulator
threads. Of the N cores, the set masked by 'cpu_realtime_mask' will be
non-realtime. The remainder will be realtime.
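
To make the N+1 split concrete, here's a rough sketch. The helper names are
made up for illustration and this isn't nova's own parsing code. It assumes the
mask is written in the '^0-1' exclusion form and that the flavor keys carry the
'hw:' prefix I dropped from the tree above.

```python
# Flavor extra specs for the realtime case described above.
EXTRA_SPECS = {
    "hw:cpu_policy": "dedicated",
    "hw:cpu_thread_policy": "isolate",
    "hw:emulator_threads_policy": "isolate",
    "hw:cpu_realtime": "yes",
    "hw:cpu_realtime_mask": "^0-1",
}


def parse_realtime_mask(mask, vcpus):
    """Return the guest cores excluded from realtime (hypothetical helper).

    Only handles '^' exclusions such as '^0' or '^0-1,^3'.
    """
    excluded = set()
    for part in mask.split(","):
        part = part.strip()
        if not part.startswith("^"):
            raise ValueError("only '^' exclusions are handled in this sketch")
        lo, _, hi = part[1:].partition("-")
        excluded.update(range(int(lo), int(hi or lo) + 1))
    if not excluded <= set(range(vcpus)):
        raise ValueError("mask refers to cores the guest does not have")
    return excluded


def realtime_layout(vcpus, mask):
    """Split N guest cores into realtime and non-realtime sets."""
    non_rt = parse_realtime_mask(mask, vcpus)
    return {
        "realtime": sorted(set(range(vcpus)) - non_rt),
        "non_realtime": sorted(non_rt),
        # N guest cores plus the one isolated emulator thread core.
        "host_cores_used": vcpus + 1,
    }
```

For a 4-core guest, realtime_layout(4, EXTRA_SPECS["hw:cpu_realtime_mask"])
gives cores 0-1 as non-realtime, cores 2-3 as realtime, and 5 host cores used.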

# The Problem(s)

I'm going to thread this to capture the arguments and counter arguments:

## Problem 1

henning.schild suggested that the current implementation of
'emulator_threads_policy' is too resource intensive, as the one extra core it
reserves per guest generally carries only a minimal workload. This can
significantly limit the number of guests that can be booted per host,
particularly for guests with smaller numbers of cores. Instead, he has
implemented an 'emulator_pin_set' host-level option, which complements
'vcpu_pin_set'. This allows us to "pool" emulator threads, similar to how vCPU
threads behave with 'cpu_policy=shared'. He suggests this be adopted by nova.
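
A back-of-the-envelope sketch of the density argument, assuming every guest on
the host has the same number of vCPUs and ignoring NUMA and thread siblings.
The function and its parameters are invented for illustration only.

```python
def guests_per_host(host_cores, guest_vcpus, pooled_emulator_cores=None):
    """Estimate how many pinned guests fit on one host.

    With pooled_emulator_cores=None every guest reserves its own extra core
    for emulator threads (the N+1 case). Otherwise a fixed set of host cores
    is shared by all emulator threads and guests only consume N cores each.
    """
    if pooled_emulator_cores is None:
        return host_cores // (guest_vcpus + 1)
    return (host_cores - pooled_emulator_cores) // guest_vcpus


for vcpus in (1, 2, 8):
    print(vcpus, guests_per_host(16, vcpus), guests_per_host(16, vcpus, 1))
```

On a 16-core host with one pooled emulator core, single-vCPU guests go from 8
to 15 per host and 2-vCPU guests from 5 to 7, while 8-vCPU guests stay at 1.
That matches the point that small guests are hit hardest.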

  sahid seconded this, but suggests 'emulator_pin_set' be renamed
  'cpu_emulator_threads_mask' and work as a mask of 'vcpu_pin_set'. He also
  suggested making a similarly-named flavor property, that would allow the
  user to use one of their cores for non-realtime 

    henning.schild suggested a set would still be better, but that
    'vcpu_pin_set' be renamed to 'pin_set', as it would no longer be for only

      cfriesen seconded henning.schild's position but was not entirely
      convinced that sharing emulator threads on a single pCPU is guaranteed
      to be safe, for example if one instance starts seriously hammering on
      I/O or does live migration or something. He suggested that an additional
      option, 'rt_emulator_overcommit_ratio' be added to make overcommitting
      explicit. In addition, he suggested making the flavor property a bitmask

        sahid questioned the need for an overcommit ratio, given that there is
        no overcommit of the hosts. An operator could synthesize a suitable
        value for 'emulator_pin_set'/'cpu_emulator_threads_mask'. He also
        disagreed with the suggestion that the flavor property be a bitmask as
        the only set is that of the vCPUs.

          cfriesen clarified that a few instances with many vCPUs will have
          more overhead requirements than many instances with few vCPUs. We
          need to be able to fail scheduling if the emulator thread cores are
          oversubscribed.
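
For what it's worth, the "fail scheduling if oversubscribed" check could look
something like the sketch below. Everything here is an assumption: nobody in
the thread has defined how emulator demand would be measured, so I've used an
abstract per-instance demand number, and the function is not existing nova
code.

```python
def emulator_pool_has_room(pool_cores, overcommit_ratio, placed, incoming):
    """Decide whether one more instance fits on the emulator thread pool.

    pool_cores:       number of host cores in the (proposed) emulator pin set
    overcommit_ratio: the (proposed) emulator overcommit ratio
    placed:           demand values of instances already on the host
    incoming:         demand value of the instance being scheduled

    The demand unit is hypothetical, e.g. 1 per instance, or something that
    grows with the instance's vCPU count.
    """
    if pool_cores <= 0 or overcommit_ratio <= 0:
        raise ValueError("pool size and ratio must be positive")
    capacity = pool_cores * overcommit_ratio
    return sum(placed) + incoming <= capacity
```

With one pooled core and a ratio of 4, a fourth instance of demand 1 would
still fit, and a fifth would be rejected.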

## Problem 2

henning.schild suggests that hosts should be able to handle both RT and non-RT
instances. This could be achieved through multiple instances of nova

  sahid points out that the recommendation is to use host aggregates to
  separate the two.

    henning.schild states that hosts with RT kernels can manage non-RT guests
    just fine. However, if using host aggregates is the recommendation then it
    should be possible to run multiple nova instances on a host, because
    dedicating an entire machine is not viable for smaller operations. cfriesen
    seconds this perspective, though not this solution.

# Solutions

Thus far, we've no clear conclusions on directions to go, so I've taken a stab
below. Henning, Sahid, Chris: does the above/below make sense, and is there
anything we need to further clarify?

## Problem 1

From the above, there are 3-4 work items:

- Add an 'emulator_pin_set' or 'cpu_emulator_threads_mask' configuration option

  - If using a mask, rename 'vcpu_pin_set' to 'pin_set' (or, better,

- Add an 'emulator_overcommit_ratio', which will do for emulator threads what
  the other ratios do for vCPUs and memory

- Deprecate 'hw:emulator_threads_policy'???

## Problem 2

No clear conclusions yet?


