[nova]Intel RDT CAT Support Missing from OpenStack
To whom it may concern,

I have a question regarding the lack of an Intel RDT CAT implementation in OpenStack. We are planning to contribute Resource Monitoring functionality to OpenStack for ARM, and are exploring the implementation of Intel RDT in OpenStack. We found the following blueprint for the Intel RDT CAT implementation in nova: https://blueprints.launchpad.net/nova/+spec/cat-support However, the Intel RDT CAT feature is not currently implemented in OpenStack. Has the implementation of Intel RDT CAT functionality in OpenStack been rejected, or is it just a development delay?

Best regards, Junya Noguchi.
[I'm keeping the original sender in Cc as they're not subscribed.] On 2024-12-13 05:08:13 +0000 (+0000), Junya Noguchi (Fujitsu) wrote: [...]
Has the implementation of Intel RDT CAT functionality in OpenStack been rejected, or is it just a development delay? [...]
Following the trail of abandoned/superseded changes, specs, blueprints, PTG notes and mailing list discussions[*], it looks like the decision was to keep it as decoupled from Nova as possible but enable use through a more generalized provider configuration. The last of the patches for that functionality[**] merged a little over 4 years ago. Hope that helps!

[*] https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack....
[**] https://review.opendev.org/q/topic:bp/provider-config-file

-- Jeremy Stanley
Dear Mr. Stanley,

Thank you for your response.

I understand that I can define custom resources using Placement and Provider Config[1][2]. I've also confirmed that Intel RMD (Resource Manager Daemon) is mentioned as a use case, but it seems there's no RMD integration for OpenStack[3].

I have a further question. When defining custom resources using Placement and Provider Config, does the service responsible for resource allocation and update of allocations to instances need to be implemented independently by the creator of the custom resource? Alternatively, does a mechanism exist within OpenStack components (e.g., Nova) to support this? For example, is there a function that calls a custom hook function when creating a domain in nova-compute?

I'm struggling to decide on the implementation approach. Should the resource allocation process be handled as a plugin (separable from the OpenStack codebase), or should I modify the libvirt driver to add the processing for allocating LLC and memory bandwidth?

[1] https://docs.openstack.org/nova/latest/admin/managing-resource-providers.htm...
[2] https://docs.openstack.org/placement/latest/
[3] https://networkbuilders.intel.com/docs/networkbuilders/resource-management-d... 3.1

Best regards, Junya Noguchi.
I have a further question. When defining custom resources using Placement and Provider Config, does the service responsible for resource allocation and update of allocations to instances need to be implemented independently by the creator of the custom resource?
Provider config is the mechanism that allows you only to inject additional resource providers/traits/etc. into compute nodes, so you still need an external mechanism to add the relevant items to that config file. There was a spec to add a mechanism to make nova delegate management of a subset of host resources to RMD[1], but this was abandoned due to the provider config work and a preference to avoid implementing built-in logic to pull resources from RMD, which is quite specific to Intel's architecture.
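For reference, a provider config file is just a static YAML declaration along these lines (CUSTOM_LLC below is only an illustrative resource class name, not an existing one); it makes each compute node expose an inventory that flavors can then request with a resources:CUSTOM_LLC=... extra spec:

  meta:
    schema_version: '1.0'
  providers:
    - identification:
        uuid: $COMPUTE_NODE
      inventories:
        additional:
          - CUSTOM_LLC:
              total: 16
              reserved: 2
      traits:
        additional:
          - CUSTOM_LLC_ENABLED

Nova only reports that inventory to Placement; actually programming the LLC or memory bandwidth on the host when an instance consuming the resource lands there still has to happen somewhere else.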
Alternatively, does a mechanism exist within OpenStack components (e.g., Nova) to support this?

AFAIK, no, and a completely external mechanism may be needed.
I'm struggling to decide on the implementation approach. Should the resource allocation process be handled as a plugin (separable from the OpenStack codebase), or should I modify the libvirt driver to add the processing for allocating LLC and memory bandwidth?
If the available resources can be obtained in a more generic way (like a libvirt API or kernel API, without relying on ARM-specific software), then it would be acceptable to implement the logic to detect available resources and update resource providers within the libvirt driver.

[1] https://blueprints.launchpad.net/nova/+spec/rmd-base-enablement

In addition, there is currently no mechanism to change the domain XML according to custom resources, so you may have to implement the feature within nova (or create a modified version of the libvirt driver, which sounds non-ideal).

On 12/18/24 5:21 PM, Junya Noguchi (Fujitsu) wrote:
Dear Mr. Stanley
Thank you for your response.
I understand that I can define custom resources using Placement and Provider Config[1][2]. I've also confirmed that Intel RMD (Resource Manager Daemon) is mentioned as a use case, but it seems there's no RMD integration for OpenStack[3].
I have a further question. When defining custom resources using Placement and Provider Config, does the service responsible for resource allocation and update of allocations to instances need to be implemented independently by the creator of the custom resource? Alternatively, does a mechanism exist within OpenStack components (e.g., Nova) to support this? For example, is there a function that calls a custom hook function when creating a domain in nova-compute?
I'm struggling to decide on the implementation approach. Should the resource allocation process be handled as a plugin (separable from the OpenStack codebase), or should I modify the libvirt driver to add the processing for allocating LLC and memory bandwidth?
[1] https://docs.openstack.org/nova/latest/admin/managing-resource-providers.htm... [2] https://docs.openstack.org/placement/latest/ [3] https://networkbuilders.intel.com/docs/networkbuilders/resource-management-d... 3.1
Best regards, Junya Noguchi.
Dear Mr. Kajinami,

Thank you for your answer. I understand that the change may be acceptable if we use a more generic way to get resources.

Best regards, Junya Noguchi.
On 18/12/2024 08:21, Junya Noguchi (Fujitsu) wrote:
Dear Mr. Stanley
Thank you for your response.
I understand that I can define custom resources using Placement and Provider Config[1][2]. I've also confirmed that Intel RMD (Resource Manager Daemon) is mentioned as a use case, but it seems there's no RMD integration for OpenStack[3].
I have a further question. When defining custom resources using Placement and Provider Config, does the service responsible for resource allocation and update of allocations to instances need to be implemented independently by the creator of the custom resource? Alternatively, does a mechanism exist within OpenStack components (e.g., Nova) to support this? For example, is there a function that calls a custom hook function when creating a domain in nova-compute?
I'm struggling to decide on the implementation approach. Should the resource allocation process be handled as a plugin (separable from the OpenStack codebase), or should I modify the libvirt driver to add the processing for allocating LLC and memory bandwidth?
Nova does not allow any external process to modify the XML of an instance. So either the cache allocation needs to be done entirely externally to nova, without modifying the XML, via an external agent using something like the kernel resctrl virtual filesystem, or nova needs to be modified to statically configure it via the XML. We do not support any kind of hook point or plugin workflow to modify the domain XML. We also do not currently plan to extend the provider config feature to allow you to register a dynamic handler for a custom resource.

Ultimately I'm not convinced we should add CAT/RDT type functionality in the near term. The RDT technology is really not well designed. Intel CAT (Cache Allocation Technology) should be called CCT (Cache Confinement Technology), but that does not have the same marketing appeal. Instead of allocating cache ways to a process and the hardware preventing other processes from using that cache, you are actually confining a process to a set of cache ways, preventing it from using other cache ways. Other processes are free to use the cache ways you confined the process to as well. There is no isolation at the hardware level, so you can still get noisy-neighbour effects from unconfined processes. That means you need to first reserve cache ways for exclusive use by OpenStack (preventing all host processes from using them) and then confine individual instances to the now-free cache ways. The same applies to Intel's memory bandwidth allocation feature. At least that was the case 5-6 years ago when I last worked on this. That makes this feature quite unsuited to a cloud environment and very hard to integrate into nova's static resource allocation architecture.

That is why we proposed two versions originally: 1) a simple static allocation model via libvirt, which ultimately did not deliver enough value to warrant the complexity; and 2) an external dynamic agent like RMD. Ultimately RMD was a slightly nicer approach, even if it was very racy, in that the RMD daemon could manage the cache assignment of both the background system processes and the VMs, but then you end up with nova/libvirt and RMD separately trying to optimise the placement of the VM without knowing about each other.

I'm not saying that there is no way to support this in nova, but it would be very difficult to do so, and likely more than one cycle's work. Before talking about an implementation, I think it would be better to reflect on what you are trying to enable. That might inform a design approach, and the limitations of CAT/RDT may mean that it will never fulfill your use case.
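To make the resctrl side concrete, the interface an external agent would be driving looks roughly like this (the mask and bandwidth values are only examples, assuming a 12-way L3 cache):

  # the default group initially owns every cache way; shrink it so host
  # processes are confined to, e.g., the low 4 ways
  echo "L3:0=00f" > /sys/fs/resctrl/schemata
  # create a group for an instance and confine its qemu threads to the
  # remaining ways; note nothing prevents another group overlapping these bits
  mkdir /sys/fs/resctrl/vm-1
  echo "L3:0=ff0" > /sys/fs/resctrl/vm-1/schemata
  echo "$QEMU_TID" > /sys/fs/resctrl/vm-1/tasks   # one thread id per write
  # memory bandwidth allocation works the same way, as a per-group throttle,
  # e.g. cap this group at 60% on memory domain 0
  echo "MB:0=60" > /sys/fs/resctrl/vm-1/schemata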
[1] https://docs.openstack.org/nova/latest/admin/managing-resource-providers.htm... [2] https://docs.openstack.org/placement/latest/ [3] https://networkbuilders.intel.com/docs/networkbuilders/resource-management-d... 3.1
Best regards, Junya Noguchi.
Dear Mr. Mooney,

Thank you for your answer. We will reconsider what kind of functions are needed to deal with the noisy neighbor problem.

Best regards, Junya Noguchi.
Dear Mr. Mooney,

Our understanding of the CAT/MBA specification is as follows. Could you please verify it?
Instead of allocating cache ways to a process and the hardware preventing other processes from using that cache, you are actually confining a process to a set of cache ways, preventing it from using other cache ways. Other processes are free to use the cache ways you confined the process to as well. There is no isolation at the hardware level, so you can still get noisy-neighbour effects from unconfined processes.
That means you need to first reserve cache ways for exclusive use by OpenStack (preventing all host processes from using them) and then confine individual instances to the now-free cache ways.
Regarding cache allocation, it is possible to allocate cache resources exclusively to each resource group. In resctrl, by setting the "mode" of a resource group to "exclusive", we can assign cache resources to that group exclusively, preventing other groups from accessing them.[1] However, this requires creating resource groups, assigning cache resources, and properly assigning processes to the groups via resctrl. If one of your primary concerns is the requirement for dynamic hardware control within OpenStack to enable exclusive cache resource allocation, exclusive cache allocation is possible now without it, depending on the implementation.
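For reference, in resctrl terms that looks something like the following (the mask value is only an example):

  mkdir /sys/fs/resctrl/vm-1
  echo "L3:0=0f0" > /sys/fs/resctrl/vm-1/schemata
  # this write fails if the chosen cache ways overlap any other group,
  # including the default group; once it succeeds, no other group can be
  # given overlapping ways
  echo exclusive > /sys/fs/resctrl/vm-1/mode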
The same applies to Intel's memory bandwidth allocation feature.
Each resource group can be assigned a maximum bandwidth, expressed as a percentage of the total bandwidth or in MBps [1]. If the total assigned bandwidth exceeds 100%, contention may occur. We can exclusively allocate bandwidth by implementing a mechanism that keeps the total assigned bandwidth below 100%.

[1] https://www.kernel.org/doc/Documentation/x86/resctrl.rst

Please let me know if you have any concerns or comments based on the information above.

Regards, Taketani
Dear Mr. Mooney,
Before talking about an implementation, I think it would be better to reflect on what you are trying to enable. That might inform a design approach, and the limitations of CAT/RDT may mean that it will never fulfill your use case.
What we want to enable by using RDT/MPAM and OpenStack to solve noisy-neighbor problems are the following features:

1. Exclusive allocation of cache ways to each VM.
2. Exclusive allocation of memory bandwidth to each VM (to prevent contention).
3. A mechanism to launch VMs on the same compute node that do not use 1 and 2 (i.e. they use shareable cache and memory bandwidth).

We set out below an idea of how these could be implemented and some of our current concerns. We would appreciate your feedback on whether this approach would be accepted by the OpenStack community, considering the concerns listed below.
Regarding cache allocation, it is possible to allocate cache resources exclusively to each resource group. In resctrl, by setting the "mode" of a resource group to "exclusive", we can assign cache resources to that group exclusively, preventing other groups from accessing them. However, this requires creating resource groups, assigning cache resources, and properly assigning processes to the groups via resctrl.
If one of your primary concerns is the requirement for dynamic hardware control within OpenStack to enable exclusive cache resource allocation, exclusive cache allocation is possible now without it, depending on the implementation.
The aforementioned "exclusive" mode only returns an error if the cache range resctrl attempts to allocate in exclusive mode is already assigned to another resource group. It does not automatically allocate exclusive cache resources to each process.

We think it's possible to assign caches exclusively to each vCPU through libvirt configuration. Currently, libvirt allows specifying "cachetune" and "memorytune" elements. [1] For cache allocation, we can specify a cache range to be assigned to vCPUs by using the "cachetune" element in libvirt as follows:

  <cachetune vcpus='0-3'>
    <cache id='0' level='3' type='both' size='3' unit='MiB'/>
  </cachetune>
  <cachetune vcpus='4-5'>
    <cache id='1' level='3' type='both' size='3' unit='MiB'/>
  </cachetune>

It's our understanding that the caches assigned to each cachetune are mutually exclusive. In the example above, vCPUs 4-5 would not have access to cache ID 0 (and vice versa). Therefore, we believe that using libvirt allows for the allocation of exclusive caches to VMs within OpenStack (No. 1). However, implementing a mechanism to mix VMs assigned exclusive caches with VMs assigned shareable caches (No. 3) is considered difficult, because the libvirt API mentioned above only provides a mechanism for assigning caches exclusively to specific vCPUs.

For memory bandwidth, we can specify the allocated bandwidth per vCPU range as a percentage (%) of the total memory bandwidth by using the "memorytune" element in libvirt as follows:

  <memorytune vcpus='0-3'>
    <node id='0' bandwidth='60'/>
  </memorytune>
  <memorytune vcpus='4-5'>
    <node id='0' bandwidth='30'/>
  </memorytune>

If the sum of allocated memory bandwidth exceeds 100%, contention may occur. Therefore, a mechanism within OpenStack is required to prevent the total from exceeding 100% to ensure exclusive memory bandwidth allocation (No. 2); one possible way to model this budget in Placement is sketched after this message.

Other concerns are as follows.

1. Allocating cache and memory bandwidth to the default resource group. This is the resource control group to which all processes except OpenStack VMs are assigned. We need to configure it to avoid overlap with the OpenStack VM allocations. Since the libvirt API doesn't allow configuring the default resource group, manipulation of the resctrl interface outside of libvirt will likely be necessary.

2. Resource usage efficiency for CPU, memory, cache ways and memory bandwidth. Exclusively allocating cache ways and memory bandwidth to VMs can lead to resource inefficiency: cache ways and memory bandwidth may be exhausted before CPU and memory resources, resulting in unused CPU and memory while preventing the launch of further VMs.

We assume that modifying the specification of the libvirt API for RDT/MPAM is difficult, as it requires agreement from vendors other than Intel, such as Arm vendors. Given these conditions, what are the chances of this implementation being accepted? We would appreciate feedback from Nova core developers and other experienced developers.

[1] https://libvirt.org/formatdomain.html#cpu-tuning

Regards, Taketani
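A possible sketch of the 100% memory-bandwidth budget mentioned above, reusing the provider config mechanism discussed earlier in the thread (CUSTOM_MEMORY_BW_PCT is a made-up resource class name): each compute node reports an inventory of 100, flavors request their share as an extra spec, and Placement then refuses to schedule instances whose combined requests would exceed 100 on a node.

  meta:
    schema_version: '1.0'
  providers:
    - identification:
        uuid: $COMPUTE_NODE
      inventories:
        additional:
          - CUSTOM_MEMORY_BW_PCT:
              total: 100
              reserved: 0

  openstack flavor set --property resources:CUSTOM_MEMORY_BW_PCT=30 <flavor>

The compute host would still need an agent (or libvirt driver code) to translate each granted allocation into the corresponding memorytune/resctrl setting.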