NUMATopologyFilter and AMD Epyc Rome
stephenfin at redhat.com
Thu Nov 19 12:31:28 UTC 2020
On Thu, 2020-11-19 at 12:25 +0000, Stephen Finucane wrote:
> On Thu, 2020-11-19 at 12:00 +0000, Eyle Brinkhuis wrote:
> > Hi all,
> > We’re running into an issue with deploying our infrastructure to
> > run high throughput, low latency workloads.
> > Background:
> > We run Lenovo SR635 systems with an AMD Epyc 7502P processor. In
> > the BIOS of this system, we are able to define the number of NUMA
> > cells per socket (called NPS). We can set 1, 2 or 4. As we run a 2x
> > 100Gbit/s Mellanox CX5 in this system as well, we use the
> > preferred-io setting in the BIOS to give preferred io throughput to
> > the Mellanox CX5.
> > To get the highest possible performance, we set the NPS
> > setting to 1, resulting in a single NUMA cell with 64 CPU threads
> > available.
> > Next, in Nova (Train release), we demand huge pages. Huge pages,
> > however, demand a NUMA topology, but as this is one large NUMA
> > cell, even with cpu=dedicated or requesting a single NUMA domain,
> > we fail:
> > compute03, compute03 fails NUMA topology requirements. No host NUMA
> > topology while the instance specified one. host_passes
> > /usr/lib/python3/dist-
> > packages/nova/scheduler/filters/numa_topology_filter.py:119
> Oh, this is interesting. This would suggest that when NPS is
> configured to 1, the host is presented as a UMA system and libvirt
> doesn't present topology information for us to parse. That seems odd
> and goes against how I thought newer versions of libvirt worked.
> What do you see when you run, e.g.:
> $ virsh capabilities | xmllint --xpath
> '/capabilities/host/topology' -
Also, what version of libvirt are you using? Past investigations 
led me to believe that libvirt would now always present a NUMA topology
for hosts, even if those hosts were in fact UMA.
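If it helps, here's a quick way to see what libvirt is reporting
without eyeballing the XML by hand. This is only a sketch: the element
names are the ones libvirt uses in its capabilities document, but the
sample XML below is illustrative, not taken from your host — on a real
system you'd feed in the actual output of `virsh capabilities`.

```python
import xml.etree.ElementTree as ET

# Illustrative libvirt capabilities document for a two-node host;
# substitute the real output of `virsh capabilities` here.
SAMPLE_CAPS = """
<capabilities>
  <host>
    <topology>
      <cells num='2'>
        <cell id='0'>
          <memory unit='KiB'>131072000</memory>
          <cpus num='2'>
            <cpu id='0' socket_id='0' core_id='0' siblings='0'/>
            <cpu id='1' socket_id='0' core_id='1' siblings='1'/>
          </cpus>
        </cell>
        <cell id='1'>
          <memory unit='KiB'>131072000</memory>
          <cpus num='2'>
            <cpu id='2' socket_id='0' core_id='2' siblings='2'/>
            <cpu id='3' socket_id='0' core_id='3' siblings='3'/>
          </cpus>
        </cell>
      </cells>
    </topology>
  </host>
</capabilities>
"""

def numa_cells(caps_xml):
    """Return {cell_id: [cpu_ids]} parsed from a capabilities doc."""
    root = ET.fromstring(caps_xml)
    cells = {}
    for cell in root.findall('./host/topology/cells/cell'):
        cpus = [int(c.get('id')) for c in cell.findall('./cpus/cpu')]
        cells[int(cell.get('id'))] = cpus
    return cells

print(numa_cells(SAMPLE_CAPS))
# -> {0: [0, 1], 1: [2, 3]}
```

A UMA-looking host would report a single cell here — or, as the filter
failure above suggests, possibly no topology element at all.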
> > Any idea how to counter this? Setting NPS-2 will create two NUMA
> > domains, but also cut our performance way down.
> It's worth noting that by setting NPS to 1, you're already cutting
> your performance. This makes it look like you've got a single NUMA
> node but of course, that doesn't change the physical design of the
> chip and there are still multiple memory controllers, some of which
> will be slower to access from certain cores. You're simply mixing
> best and worst case performance to provide an average. You said you
> have two SR-IOV NICs. I assume you're bonding these NICs? If not, you
> could set NPS to 2 and then ensure the NICs are in PCI slots that
> correspond to different NUMA nodes. You can validate this
> configuration using tools like 'lstopo' and 'numactl'.
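For the NIC side of that check, the kernel also exposes a device's NUMA
affinity in sysfs. A small sketch — the interface name 'ens1f0' is a
placeholder for your CX5 ports, and note that on an NPS=1 system the
kernel will typically report -1 (no affinity) or 0:

```python
import os

def nic_numa_node(ifname, sysfs_root='/sys/class/net'):
    """Return the NUMA node of a NIC's PCI device, or None if unknown.

    Reads /sys/class/net/<ifname>/device/numa_node; the kernel reports
    -1 when the device has no NUMA affinity.
    """
    path = os.path.join(sysfs_root, ifname, 'device', 'numa_node')
    try:
        with open(path) as f:
            return int(f.read().strip())
    except OSError:
        return None

# 'ens1f0' is a placeholder interface name; substitute your own.
print(nic_numa_node('ens1f0'))
```

With NPS=2 you'd want each port to land on a different node here, and
the guest's NUMA node pinned to match the port it uses.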
> > Thanks!
> > Regards,
> > Eyle