NUMATopologyFilter and AMD Epyc Rome
Hi all,

We’re running into an issue with deploying our infrastructure to run high-throughput, low-latency workloads.

Background: we run Lenovo SR635 systems with an AMD EPYC 7502P processor. In the BIOS of this system, we can define the number of NUMA cells per socket (called NPS); we can set 1, 2 or 4. As we also run a 2x 100Gbit/s Mellanox CX5 in this system, we use the preferred-io setting in the BIOS to give preferred I/O throughput to the Mellanox CX5. To get performance as high as possible, we set NPS to 1, resulting in a single NUMA cell with 64 CPU threads available.

Next, in Nova (Train distribution), we demand huge pages. Huge pages, however, require a NUMA topology, but as this is one large NUMA cell, even with cpu=dedicated or requesting a single NUMA domain, we fail:

compute03, compute03 fails NUMA topology requirements. No host NUMA topology while the instance specified one. host_passes /usr/lib/python3/dist-packages/nova/scheduler/filters/numa_topology_filter.py:119

Any idea how to counter this? Setting NPS-2 will create two NUMA domains, but also cut our performance way down.

Thanks!

Regards,
Eyle
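For context, the huge-page and dedicated-CPU demands described above are usually expressed as Nova flavor extra specs; a minimal sketch (the flavor name is illustrative):

```shell
# Request huge pages, pinned CPUs and a single guest NUMA node via
# flavor extra specs (flavor name "hp.large" is just an example).
openstack flavor set hp.large \
  --property hw:mem_page_size=large \
  --property hw:cpu_policy=dedicated \
  --property hw:numa_nodes=1
```

Any of these properties implies a guest NUMA topology, which is why the scheduler then requires a host NUMA topology to pin against.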
On Thu, 2020-11-19 at 12:00 +0000, Eyle Brinkhuis wrote:
Hi all,
We’re running into an issue with deploying our infrastructure to run high throughput, low latency workloads.
Background:
We run Lenovo SR635 systems with an AMD Epyc 7502P processor. In the BIOS of this system, we are able to define the amount of NUMA cells per socket (called NPS). We can set 1, 2 or 4. As we run a 2x 100Gbit/s Mellanox CX5 in this system as well, we use the preferred-io setting in the BIOS to give preferred io throughput to the Mellanox CX5. To make sure we get as high performance as possible, we set the NPS setting to 1, resulting in a single numa cell with 64 CPU threads available.
Next, in Nova (train distribution), we demand huge pages. Hugepages however, demands a NUMAtopology, but as this is one large NUMA cell, even with cpu=dedicated or requesting a single numa domain, we fail:
compute03, compute03 fails NUMA topology requirements. No host NUMA topology while the instance specified one. host_passes /usr/lib/python3/dist-packages/nova/scheduler/filters/numa_topology_filter.py:119
Oh, this is interesting. This would suggest that when NPS is configured to 1, the host is presented as a UMA system and libvirt doesn't present topology information for us to parse. That seems odd and goes against how I thought newer versions of libvirt worked. What do you see when you run, e.g.:

$ virsh capabilities | xmllint --xpath '/capabilities/host/topology' -
Any idea how to counter this? Setting NPS-2 will create two NUMA domains, but also cut our performance way down.
It's worth noting that by setting NPS to 1, you're already cutting your performance. This makes it look like you've got a single NUMA node but of course, that doesn't change the physical design of the chip: there are still multiple memory controllers, some of which will be slower to access from certain cores. You're simply mixing best and worst case performance to provide an average. You said you have two SR-IOV NICs. I assume you're bonding these NICs? If not, you could set NPS to 2 and then ensure the NICs are in PCI slots that correspond to different NUMA nodes. You can validate this configuration using tools like 'lstopo' and 'numactl'.

Stephen
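Beyond lstopo and numactl, the kernel exposes each device's NUMA node directly in sysfs; a small illustrative sketch (the device name and sysfs root are assumptions for the example):

```python
import os


def nic_numa_node(dev, sysfs_root="/sys/class/net"):
    """Return the NUMA node of a NIC as reported by sysfs, or None.

    The kernel reports -1 when no NUMA affinity is known, which may be
    what an NPS=1 host presents.
    """
    path = os.path.join(sysfs_root, dev, "device", "numa_node")
    with open(path) as f:
        node = int(f.read().strip())
    return node if node >= 0 else None
```

For example, `nic_numa_node("enp65s0f0")` on a real host would tell you which node that port's PCI slot hangs off.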
Thanks!
Regards,
Eyle
On Thu, 2020-11-19 at 12:25 +0000, Stephen Finucane wrote:
Oh, this is interesting. This would suggest that when NPS is configured to 1, the host is presented as a UMA system and libvirt doesn't present topology information for us to parse. That seems odd and goes against how I thought newer versions of libvirt worked.
What do you see when you run, e.g.:
$ virsh capabilities | xmllint --xpath '/capabilities/host/topology' -
Also, what version of libvirt are you using? Past investigations [1] led me to believe that libvirt would now always present a NUMA topology for hosts, even if those hosts were in fact UMA. [1] https://github.com/openstack/nova/commit/c619c3b5847de85b21ffcbf750c10421d8b...
Any idea how to counter this? Setting NPS-2 will create two NUMA domains, but also cut our performance way down.
It's worth noting that by setting NPS to 1, you're already cutting your performance. This makes it look like you've got a single NUMA node but of course, that doesn't change the physical design of the chip: there are still multiple memory controllers, some of which will be slower to access from certain cores. You're simply mixing best and worst case performance to provide an average. You said you have two SR-IOV NICs. I assume you're bonding these NICs? If not, you could set NPS to 2 and then ensure the NICs are in PCI slots that correspond to different NUMA nodes. You can validate this configuration using tools like 'lstopo' and 'numactl'.
Stephen
Thanks!
Regards,
Eyle
Hi Stephen,

We run:
Compiled against library: libvirt 5.4.0
Using library: libvirt 5.4.0
Using API: QEMU 5.4.0
Running hypervisor: QEMU 4.0.0

ubuntu@compute02:~$ virsh capabilities | xmllint --xpath '/capabilities/host/topology' -
XPath set is empty
(On a node with NPS-1)

compute03:~$ virsh capabilities | xmllint --xpath '/capabilities/host/topology' -
<topology>
  <cells num="2">
    <cell id="0">
      <memory unit="KiB">65854792</memory>
      <pages unit="KiB" size="4">2383698</pages>
      <pages unit="KiB" size="2048">27500</pages>
      <pages unit="KiB" size="1048576">0</pages>
      <distances>
        <sibling id="0" value="10"/>
        <sibling id="1" value="12"/>
      </distances>
      <cpus num="32">
        <cpu id="0" socket_id="0" core_id="0" siblings="0,32"/>
        <cpu id="1" socket_id="0" core_id="1" siblings="1,33"/>
        <cpu id="2" socket_id="0" core_id="2" siblings="2,34"/>
        <cpu id="3" socket_id="0" core_id="3" siblings="3,35"/>
        <cpu id="4" socket_id="0" core_id="4" siblings="4,36"/>
        <cpu id="5" socket_id="0" core_id="5" siblings="5,37"/>
        <cpu id="6" socket_id="0" core_id="6" siblings="6,38"/>
        <cpu id="7" socket_id="0" core_id="7" siblings="7,39"/>
        <cpu id="8" socket_id="0" core_id="8" siblings="8,40"/>
        <cpu id="9" socket_id="0" core_id="9" siblings="9,41"/>
        <cpu id="10" socket_id="0" core_id="10" siblings="10,42"/>
        <cpu id="11" socket_id="0" core_id="11" siblings="11,43"/>
        <cpu id="12" socket_id="0" core_id="12" siblings="12,44"/>
        <cpu id="13" socket_id="0" core_id="13" siblings="13,45"/>
        <cpu id="14" socket_id="0" core_id="14" siblings="14,46"/>
        <cpu id="15" socket_id="0" core_id="15" siblings="15,47"/>
        <cpu id="32" socket_id="0" core_id="0" siblings="0,32"/>
        <cpu id="33" socket_id="0" core_id="1" siblings="1,33"/>
        <cpu id="34" socket_id="0" core_id="2" siblings="2,34"/>
        <cpu id="35" socket_id="0" core_id="3" siblings="3,35"/>
        <cpu id="36" socket_id="0" core_id="4" siblings="4,36"/>
        <cpu id="37" socket_id="0" core_id="5" siblings="5,37"/>
        <cpu id="38" socket_id="0" core_id="6" siblings="6,38"/>
        <cpu id="39" socket_id="0" core_id="7" siblings="7,39"/>
        <cpu id="40" socket_id="0" core_id="8" siblings="8,40"/>
        <cpu id="41" socket_id="0" core_id="9" siblings="9,41"/>
        <cpu id="42" socket_id="0" core_id="10" siblings="10,42"/>
        <cpu id="43" socket_id="0" core_id="11" siblings="11,43"/>
        <cpu id="44" socket_id="0" core_id="12" siblings="12,44"/>
        <cpu id="45" socket_id="0" core_id="13" siblings="13,45"/>
        <cpu id="46" socket_id="0" core_id="14" siblings="14,46"/>
        <cpu id="47" socket_id="0" core_id="15" siblings="15,47"/>
      </cpus>
    </cell>
    <cell id="1">
      <memory unit="KiB">66014072</memory>
      <pages unit="KiB" size="4">2423518</pages>
      <pages unit="KiB" size="2048">27500</pages>
      <pages unit="KiB" size="1048576">0</pages>
      <distances>
        <sibling id="0" value="12"/>
        <sibling id="1" value="10"/>
      </distances>
      <cpus num="32">
        <cpu id="16" socket_id="0" core_id="16" siblings="16,48"/>
        <cpu id="17" socket_id="0" core_id="17" siblings="17,49"/>
        <cpu id="18" socket_id="0" core_id="18" siblings="18,50"/>
        <cpu id="19" socket_id="0" core_id="19" siblings="19,51"/>
        <cpu id="20" socket_id="0" core_id="20" siblings="20,52"/>
        <cpu id="21" socket_id="0" core_id="21" siblings="21,53"/>
        <cpu id="22" socket_id="0" core_id="22" siblings="22,54"/>
        <cpu id="23" socket_id="0" core_id="23" siblings="23,55"/>
        <cpu id="24" socket_id="0" core_id="24" siblings="24,56"/>
        <cpu id="25" socket_id="0" core_id="25" siblings="25,57"/>
        <cpu id="26" socket_id="0" core_id="26" siblings="26,58"/>
        <cpu id="27" socket_id="0" core_id="27" siblings="27,59"/>
        <cpu id="28" socket_id="0" core_id="28" siblings="28,60"/>
        <cpu id="29" socket_id="0" core_id="29" siblings="29,61"/>
        <cpu id="30" socket_id="0" core_id="30" siblings="30,62"/>
        <cpu id="31" socket_id="0" core_id="31" siblings="31,63"/>
        <cpu id="48" socket_id="0" core_id="16" siblings="16,48"/>
        <cpu id="49" socket_id="0" core_id="17" siblings="17,49"/>
        <cpu id="50" socket_id="0" core_id="18" siblings="18,50"/>
        <cpu id="51" socket_id="0" core_id="19" siblings="19,51"/>
        <cpu id="52" socket_id="0" core_id="20" siblings="20,52"/>
        <cpu id="53" socket_id="0" core_id="21" siblings="21,53"/>
        <cpu id="54" socket_id="0" core_id="22" siblings="22,54"/>
        <cpu id="55" socket_id="0" core_id="23" siblings="23,55"/>
        <cpu id="56" socket_id="0" core_id="24" siblings="24,56"/>
        <cpu id="57" socket_id="0" core_id="25" siblings="25,57"/>
        <cpu id="58" socket_id="0" core_id="26" siblings="26,58"/>
        <cpu id="59" socket_id="0" core_id="27" siblings="27,59"/>
        <cpu id="60" socket_id="0" core_id="28" siblings="28,60"/>
        <cpu id="61" socket_id="0" core_id="29" siblings="29,61"/>
        <cpu id="62" socket_id="0" core_id="30" siblings="30,62"/>
        <cpu id="63" socket_id="0" core_id="31" siblings="31,63"/>
      </cpus>
    </cell>
  </cells>
</topology>
(On a node with NPS-2)
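This topology XML is what Nova's libvirt driver consumes; a minimal illustrative sketch (not Nova's actual parsing code) of extracting the per-cell layout with the standard library:

```python
import xml.etree.ElementTree as ET


def summarize_topology(xml_text):
    """Summarize a libvirt <topology> element: per-cell memory and CPU ids."""
    root = ET.fromstring(xml_text)
    cells = []
    for cell in root.findall("./cells/cell"):
        cells.append({
            "id": int(cell.get("id")),
            "memory_kib": int(cell.find("memory").text),
            "cpu_ids": sorted(int(c.get("id")) for c in cell.findall("./cpus/cpu")),
        })
    return cells
```

Feeding it the NPS-2 output above would yield two cells of 32 threads each; on the NPS-1 node there is no `<topology>` element at all, which matches the empty XPath result.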
It's worth noting that by setting NPS to 1, you're already cutting your performance. This makes it look like you've got a single NUMA node but of course, that doesn't change the physical design of the chip: there are still multiple memory controllers, some of which will be slower to access from certain cores. You're simply mixing best and worst case performance to provide an average. You said you have two SR-IOV NICs. I assume you're bonding these NICs? If not, you could set NPS to 2 and then ensure the NICs are in PCI slots that correspond to different NUMA nodes. You can validate this configuration using tools like 'lstopo' and 'numactl'.

Our setup is a little different. We don't use any OVS or SR-IOV. We use FD.io's VPP, with networking-vpp as the switch, and use VPP's RDMA capabilities to haul packets left and right. Our performance tuning sessions on these machines, without an OpenStack setup (so throughput in VPP), showed that NPS-1 is the best setting for us. We are only using one CX5 by the way, and use both ports (2x100G) in a LACP setup for redundancy.
Thanks for your quick reply! Regards, Eyle
On 19 Nov 2020, at 13:31, Stephen Finucane <stephenfin@redhat.com> wrote:
On Thu, 2020-11-19 at 12:56 +0000, Eyle Brinkhuis wrote:
Interesting. When you set NPS to 4, did you ensure you have 1 PMD per NUMA node? When using DPDK you should normally have 1 PMD per NUMA node. The other thing to note is that you can't assume that the NIC, even if attached to socket 0, will be on NUMA node 0 when you set NPS=4; we have seen it on other NUMA nodes in some tests we have done. So if you only have 1 PMD per socket enabled, you would want to ensure it's on a core in the same NUMA node as the NIC.
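The "one PMD per NUMA node" rule above can be sanity-checked mechanically; a small illustrative helper (the core-to-node mapping would come from e.g. lscpu or sysfs, and is assumed as input here):

```python
def numa_nodes_without_pmd(pmd_cores, core_to_node):
    """Return the NUMA nodes that have no poll-mode-driver core assigned.

    pmd_cores: iterable of core ids running PMD threads.
    core_to_node: mapping of every host core id to its NUMA node.
    """
    all_nodes = set(core_to_node.values())
    covered = {core_to_node[c] for c in pmd_cores}
    return sorted(all_nodes - covered)
```

An empty result means every node has at least one PMD; combine it with the NIC's own node to check Sean's second point about PMD/NIC locality.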
Thanks for your quick reply!
Regards,
Eyle
On Thu, 2020-11-19 at 12:25 +0000, Stephen Finucane wrote:
On Thu, 2020-11-19 at 12:00 +0000, Eyle Brinkhuis wrote:
Hi all,
We’re running into an issue with deploying our infrastructure to run high throughput, low latency workloads.
Background:
We run Lenovo SR635 systems with an AMD Epyc 7502P processor. In the BIOS of this system, we are able to define the amount of NUMA cells per socket (called NPS). We can set 1, 2 or 4. As we run a 2x 100Gbit/s Mellanox CX5 in this system as well, we use the preferred-io setting in the BIOS to give preferred io throughput to the Mellanox CX5. To make sure we get as high performance as possible, we set the NPS setting to 1, resulting in a single numa cell with 64 CPU threads available.
From what data I have personally seen on this topic, this will pessimise your performance, and you should be setting it to 4. If you set it to 1 and place the test application on, for example, CPUs 60-64, you will see a performance reduction in comparison to CPUs 4-8. If you enable 4 NUMA nodes per socket, the 3 that do not have the NIC will have more or less the same performance, which should be better than the performance when it was on 60-64, but the one with the NIC will have better performance, which may actually exceed the performance you see with the VM/application running on cores 4-8. The preliminary data our performance engineers have seen shows that some workloads, like small-packet network I/O, can see a performance improvement of up to 30+% (DPDK's testpmd), and an 8% improvement in less memory-sensitive workloads, when setting NPS=4. I know Mohammed Naser looked into this too for VEXXHOST in the past and was seeing a similar effect. I'm not sure if you can share your general findings, but did you end up with NPS=4 in the end?
Next, in Nova (train distribution), we demand huge pages. Hugepages however, demands a NUMAtopology, but as this is one large NUMA cell, even with cpu=dedicated or requesting a single numa domain, we fail:
compute03, compute03 fails NUMA topology requirements. No host NUMA topology while the instance specified one. host_passes /usr/lib/python3/dist-packages/nova/scheduler/filters/numa_topology_filter.py:119
Oh, this is interesting. This would suggest that when NPS is configured to 1, the host is presented as a UMA system and libvirt doesn't present topology information for us to parse. That seems odd and goes against how I thought newer versions of libvirt worked.
What do you see when you run, e.g.:
$ virsh capabilities | xmllint --xpath '/capabilities/host/topology' -
Also, what version of libvirt are you using? Past investigations [1] led me to believe that libvirt would now always present a NUMA topology for hosts, even if those hosts were in fact UMA.
[1] https://github.com/openstack/nova/commit/c619c3b5847de85b21ffcbf750c10421d8b...
libvirt was broken on AMD systems with NPS=1 due to a workaround implemented for non-x86 architectures (https://bugzilla.redhat.com/show_bug.cgi?id=1860231). That should now be addressed, but only very recently.
Any idea how to counter this? Setting NPS-2 will create two NUMA domains, but also cut our performance way down.
It's worth noting that by setting NPS to 1, you're already cutting your performance. This makes it look like you've got a single NUMA node but of course, that doesn't change the physical design of the chip: there are still multiple memory controllers, some of which will be slower to access from certain cores. You're simply mixing best and worst case performance to provide an average. You said you have two SR-IOV NICs. I assume you're bonding these NICs? If not, you could set NPS to 2 and then ensure the NICs are in PCI slots that correspond to different NUMA nodes. You can validate this configuration using tools like 'lstopo' and 'numactl'.
Yep, setting it to 1 will just disable the reporting of the real NUMA topology, basically tying all the memory controllers in the socket together to act as one, but that generally increases latency and decreases performance. It also does not provide the information needed by the kernel or OpenStack to optimise.

Actually, that (setting NPS-2) should improve performance based on most benchmarks we have seen and the work we have been doing with AMD on this topic. The data I have reviewed so far indicates that the highest memory bandwidth and lowest latency occur when you expose all the NUMA nodes on the host by setting NPS to the largest value for your given CPU.

The main issue we have right now from an OpenStack point of view is SR-IOV: we support NUMA affinity but not socket affinity or NUMA distance. Socket affinity is what you want 80% of the time; NUMA distance is much more complex and is what you actually want, but socket affinity is a very good proxy for it. To use SR-IOV with NUMA guests, such as hugepage guests, you have to disable NUMA affinity for SR-IOV devices today if you have hosts with multiple NUMA nodes per socket and want to be able to use all cores. If not all VMs will use SR-IOV, then you can still use strict/legacy affinity instead of preferred.
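The relaxed affinity Sean describes is configured on the PCI alias; a hedged sketch for nova.conf (the alias name and vendor/product IDs are illustrative, check your own VF IDs with lspci -nn):

```ini
# nova.conf on the compute node: relax NUMA affinity for SR-IOV
# passthrough devices. Alias name and IDs are examples only.
[pci]
alias = {"name": "cx5-vf", "vendor_id": "15b3", "product_id": "1018", "numa_policy": "preferred"}
```

With "preferred", the scheduler will still try to place the guest on the device's NUMA node but will no longer fail when it can't.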
Stephen
Thanks!
Regards,
Eyle
participants (3)
- Eyle Brinkhuis
- Sean Mooney
- Stephen Finucane