Re: [nova]Intel RDT CAT Support Missing from OpenStack

31 Jan 2025

Dear Mooney,
...
Before talking about an implementation, I think it would be better to
reflect on what exactly you're trying to enable.
That might inform a design approach, and the limitations of CAT/RDT may mean
that it will never fulfil your use case.
To solve noisy-neighbor problems using RDT/MPAM and OpenStack, we want to enable the following features:
1. Exclusive allocation of cache ways to each VM.
2. Exclusive allocation of memory bandwidth to each VM (to prevent contention).
3. A mechanism to launch, on the same compute node, VMs that do not use 1 and 2 (i.e. VMs that use shareable cache and memory bandwidth).

Below, we set out an idea of how these could be implemented, together with some of our current concerns.
We would appreciate your feedback on whether this approach would be accepted by the OpenStack community, given the concerns listed below.
...
Regarding cache allocation, it is possible to allocate cache resources exclusively to each resctrl resource group.
In resctrl, by setting a group's "mode" to "exclusive", we can assign cache resources to that group exclusively, preventing other groups from using them.
However, this requires creating the resource groups, assigning cache resources to them, and correctly assigning processes to them via resctrl.
If one of your primary concerns is that exclusive cache allocation would require OpenStack to control the hardware dynamically, exclusive cache allocation without such control is already possible, depending on the implementation.
Note that the "exclusive" mode only causes resctrl to return an error if the cache range being allocated in exclusive mode is already assigned to another resource group; it does not automatically allocate exclusive cache resources to each process.
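For reference, the manual resctrl steps described above can be sketched in Python as follows. This is a minimal sketch, assuming resctrl is mounted at /sys/fs/resctrl and that the caller already knows the L3 capacity bitmasks to use and the task IDs to move (for a VM, e.g. the QEMU vCPU thread IDs). create_exclusive_group and l3_schemata_line are hypothetical helpers, not existing OpenStack or libvirt functions.

```python
from pathlib import Path

# Assumption: resctrl is mounted at its usual location, e.g.
#   mount -t resctrl resctrl /sys/fs/resctrl
RESCTRL_ROOT = Path("/sys/fs/resctrl")


def l3_schemata_line(masks):
    """Format {cache_id: bitmask} as an L3 schemata line, e.g. 'L3:0=f0;1=f'."""
    return "L3:" + ";".join(f"{cid}={mask:x}" for cid, mask in sorted(masks.items()))


def create_exclusive_group(name, l3_masks, task_ids, root=RESCTRL_ROOT):
    """Hypothetical helper: create a resctrl resource group with exclusive L3 ways."""
    group = Path(root) / name
    # Creating a directory under the resctrl root creates a new resource group.
    group.mkdir()
    # 1. Assign cache ways to the group.
    (group / "schemata").write_text(l3_schemata_line(l3_masks) + "\n")
    # 2. Request exclusive mode; the kernel rejects this write (OSError)
    #    if these ways overlap another group's allocation.
    (group / "mode").write_text("exclusive\n")
    # 3. Move tasks into the group; the tasks file takes one ID per write.
    for tid in task_ids:
        (group / "tasks").write_text(f"{tid}\n")
    return group
```

Because the default group initially covers all cache ways, step 2 is likely to fail until the default group's schemata has been reduced, which is the same issue raised in concern 1 below.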

We think it's possible to assign caches exclusively to each vCPU through libvirt configuration.
libvirt currently supports the "cachetune" and "memorytune" elements in the domain XML [1].

For cache allocation, we can use the "cachetune" element in libvirt to specify the cache range assigned to a set of vCPUs, as follows:
  <cputune>
    <cachetune vcpus='0-3'>
      <cache id='0' level='3' type='both' size='3' unit='MiB'/>
    </cachetune>
    <cachetune vcpus='4-5'>
      <cache id='1' level='3' type='both' size='3' unit='MiB'/>
    </cachetune>
  </cputune>

It is our understanding that the caches assigned to each cachetune are mutually exclusive.
In the example above, vCPUs 4-5 would not have access to cache ID 0 (and vice versa).
Therefore, we believe that libvirt makes it possible to allocate exclusive caches to VMs within OpenStack (No. 1).
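To make the exclusivity argument concrete, the following sketch shows how a requested cachetune size could map to a number of cache ways and then to non-overlapping bitmasks. It is a hypothetical illustration, not libvirt's actual allocation code. It assumes the per-way granularity reported in libvirt's host capabilities and contiguous masks, as Intel CAT generally requires; the 20-way cache and 1536 KiB granularity are made-up numbers.

```python
def ways_for_size(size_kib, granularity_kib):
    """Number of cache ways needed for a requested size (one way = granularity_kib)."""
    if size_kib <= 0 or size_kib % granularity_kib:
        raise ValueError("size must be a positive multiple of the allocation granularity")
    return size_kib // granularity_kib


def allocate_ways(num_ways, used_mask, total_ways):
    """Return the lowest contiguous mask of num_ways ways that avoids used_mask, or None."""
    block = (1 << num_ways) - 1
    for shift in range(total_ways - num_ways + 1):
        candidate = block << shift
        if candidate & used_mask == 0:
            return candidate
    return None  # not enough contiguous free ways left on this cache


# Hypothetical cache: 20 ways of 1536 KiB each; two VMs each request 3 MiB.
used = 0
for vm in ("vm-a", "vm-b"):
    mask = allocate_ways(ways_for_size(3 * 1024, 1536), used, total_ways=20)
    used |= mask
    print(vm, hex(mask))  # prints "vm-a 0x3", then "vm-b 0xc"
```

Each request consumes whole ways, so the masks never overlap, and a request fails once no contiguous run of free ways remains.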

However, we consider it difficult to implement a mechanism that mixes VMs assigned exclusive caches with VMs assigned shareable caches (No. 3),
because the libvirt API mentioned above only provides a mechanism for assigning caches exclusively to specific vCPUs.

For memory bandwidth, we can use the "memorytune" element in libvirt to specify the bandwidth allocated to each vCPU range
as a percentage (%) of the total memory bandwidth, as follows:

  <cputune>
    <memorytune vcpus='0-3'>
      <node id='0' bandwidth='60'/>
    </memorytune>
    <memorytune vcpus='4-5'>
      <node id='0' bandwidth='30'/>
    </memorytune>
  </cputune>

If the sum of the memory bandwidth allocated on a node exceeds 100%, contention may occur.
Therefore, to ensure exclusive memory bandwidth allocation (No. 2), OpenStack needs a mechanism that prevents the total from exceeding 100%.
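The admission check described above could look roughly like the following sketch. It assumes each VM's memorytune allocation is tracked as a mapping from host node id to bandwidth percentage; can_admit and its inputs are hypothetical and do not correspond to any existing nova code.

```python
from collections import Counter


def can_admit(existing, request, limit=100):
    """Return True if adding `request` keeps every node's total bandwidth <= limit (%).

    existing: list of {node_id: percent} dicts, one per allocated memorytune group.
    request:  {node_id: percent} for the new group.
    """
    totals = Counter()
    for alloc in [*existing, request]:
        totals.update(alloc)  # adds the percentages per node id
    return all(total <= limit for total in totals.values())


# The two memorytune groups from the example above use 60% + 30% of node 0.
existing = [{0: 60}, {0: 30}]
print(can_admit(existing[:1], existing[1]))  # True  (90% in total)
print(can_admit(existing, {0: 20}))          # False (110% would exceed 100%)
```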

Other concerns are as follows.
1. Allocating cache and memory bandwidth to the default resource group
The default resource group is the group to which all processes other than OpenStack VMs are assigned.
We need to configure it so that it does not overlap with the OpenStack VM allocations.
Since the libvirt API does not allow configuring the default resource group,
we will likely need to manipulate resctrl directly, outside of libvirt.
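Concretely, the default group's L3 ways could be computed as the complement of the ways given to VMs and written to the schemata file in the resctrl root directory, which belongs to the default resource group. This is a minimal sketch, assuming resctrl is mounted at /sys/fs/resctrl; the helper names and the 20-way cache in the example are hypothetical. Because Intel CAT generally requires contiguous masks, it also assumes VM ways are taken from one end of the cache so that the remainder stays contiguous, and it rejects the result otherwise.

```python
from pathlib import Path

RESCTRL_ROOT = Path("/sys/fs/resctrl")  # assumed mount point


def is_contiguous(mask):
    """True if mask is a single non-empty run of 1 bits."""
    if mask <= 0:
        return False
    run = mask >> ((mask & -mask).bit_length() - 1)  # drop trailing zero bits
    return run & (run + 1) == 0


def default_group_masks(full_mask, vm_masks):
    """Ways left for the default group on each L3 cache id after removing VM ways."""
    masks = {cid: full_mask & ~used for cid, used in vm_masks.items()}
    for cid, mask in masks.items():
        if not is_contiguous(mask):
            raise ValueError(f"default group mask for cache {cid} is empty or non-contiguous")
    return masks


def write_default_schemata(masks, root=RESCTRL_ROOT):
    """The schemata file in the resctrl root belongs to the default resource group."""
    line = "L3:" + ";".join(f"{cid}={mask:x}" for cid, mask in sorted(masks.items()))
    (Path(root) / "schemata").write_text(line + "\n")
    return line
```

Memory bandwidth (the "MB:" line of the same schemata file) would need the same treatment.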

2. Resource usage efficiency for CPU, memory, cache ways, and memory bandwidth
Exclusively allocating cache ways and memory bandwidth to VMs can lead to inefficient resource usage.
Cache ways and memory bandwidth may be exhausted before CPU and memory resources,
leaving CPU and memory unused while preventing further VMs from being launched.
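A simple calculation illustrates this concern. The host capacity and per-VM request below are hypothetical numbers chosen for illustration, and vms_that_fit is a hypothetical helper that treats the ways of a single L3 cache as one pool.

```python
def vms_that_fit(capacity, per_vm):
    """Return how many identical VMs fit on a host and which resources run out first."""
    fits = {res: capacity[res] // per_vm[res] for res in per_vm}
    limit = min(fits.values())
    bottlenecks = [res for res, n in fits.items() if n == limit]
    return limit, bottlenecks


# Hypothetical single-socket host and flavor.
capacity = {"pcpus": 64, "memory_gib": 256, "l3_ways": 20, "mem_bw_pct": 100}
per_vm = {"pcpus": 4, "memory_gib": 16, "l3_ways": 2, "mem_bw_pct": 10}

n, limited_by = vms_that_fit(capacity, per_vm)
idle = {res: capacity[res] - n * per_vm[res] for res in per_vm}
print(n, limited_by)  # 10 ['l3_ways', 'mem_bw_pct']
print(idle)           # {'pcpus': 24, 'memory_gib': 96, 'l3_ways': 0, 'mem_bw_pct': 0}
```

In this example, 24 pCPUs and 96 GiB of memory stay idle. In practice the default group also has to keep some ways and bandwidth, so the usable count would be even lower.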

We assume that modifying the libvirt API specification for RDT/MPAM would be difficult,
as it would require agreement from vendors other than Intel (RDT), such as Arm vendors (MPAM).

Given these constraints, how likely is it that this implementation would be accepted?
We would appreciate feedback from nova core developers and other experienced developers.

[1] https://libvirt.org/formatdomain.html#cpu-tuning

Regards,
Taketani

taketani.ryo＠fujitsu.com