[openstack-dev] [nova] NUMA + SR-IOV

Nikola Đipanov ndipanov at redhat.com
Thu Mar 24 16:33:51 UTC 2016


On 03/24/2016 04:18 PM, Sergey Nikitin wrote:
> 
> Hi, folks.
> 
> I want to start a discussion about the NUMA + SR-IOV environment. I have
> a two-socket server. It has two NUMA nodes and only one SR-IOV PCI
> device. This device is associated with the first NUMA node. I booted a
> set of VMs with SR-IOV support. Each of these VMs was booted on the
> first NUMA node. As I understand it, this happened for better
> performance (a VM should be booted on the NUMA node that has the PCI
> device for this VM) [1].
> 
> But this behavior leaves my two-socket machine half-populated. What if I
> don't care about SR-IOV performance? I just want every VM on *any* of
> the NUMA nodes to be able to use this single SR-IOV PCI device.
> 
> But I can't do that because of the behavior of numa_topology_filter. In
> this filter we check whether the current host has the required PCI
> device [2]. But we look for this device *only* within a NUMA cell on
> this host. That is hardcoded here [3]. If we do *not* pass the variable
> "cells" to the method support_requests() [4], we will boot the VM on the
> current host if it has the required PCI device *on the host* (maybe not
> in the same NUMA node).
> 
> So my question is:
> Is it correct that we *always* want to boot a VM on the NUMA node
> associated with the requested PCI device, and the user has no choice?
> Or should we give the user a choice and let them boot a VM with a PCI
> device associated with another NUMA node?
> 

This has come up before, and the fact that it keeps coming up tells me
that we should probably do something about it.

Potentially it makes sense to be lax by default unless the user specifies
that they want to make sure that the device is on the same NUMA node,
but that is not backwards compatible.

It does not make sense to ask the user to specify that they don't care
IMHO: unless you know there is a problem (and users have nowhere near
enough information to tell), there is no reason for you to specify it -
it's just not sensible UI.
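
To make the two policies concrete, here is a minimal sketch in Python. It
is not nova's code: PciPool, supports_request and cell_fits are
hypothetical names, and the model assumes each pool of free devices is
tied to a single NUMA node. It only mirrors the idea from the quoted mail
that passing "cells" restricts the search to one NUMA node, while omitting
it accepts a device anywhere on the host.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PciPool:
    """Free SR-IOV devices attached to one NUMA node (hypothetical type)."""
    numa_node: int
    free: int


def supports_request(pools: List[PciPool], count: int,
                     cells: Optional[List[int]] = None) -> bool:
    """Return True if `count` devices are available.

    If `cells` is given, only pools on those NUMA nodes are counted;
    if it is None, every pool on the host is counted.
    """
    if cells is not None:
        pools = [p for p in pools if p.numa_node in cells]
    return sum(p.free for p in pools) >= count


def cell_fits(pools: List[PciPool], cell_id: int, count: int,
              strict: bool) -> bool:
    """Check whether a VM placed on `cell_id` can get its devices.

    strict=True models the current behavior (device must be on the same
    NUMA node); strict=False models the lax policy discussed above.
    """
    return supports_request(pools, count,
                            cells=[cell_id] if strict else None)


# One device pool on NUMA node 0, nothing on node 1 (the reported setup).
pools = [PciPool(numa_node=0, free=4)]

print(cell_fits(pools, 0, 1, strict=True))   # node 0, strict policy
print(cell_fits(pools, 1, 1, strict=True))   # node 1, strict policy
print(cell_fits(pools, 1, 1, strict=False))  # node 1, lax policy
```

With the strict policy only node 0 can host such a VM, which is exactly
the half-populated host described in the quoted mail; with the lax policy
node 1 qualifies too, at the cost of cross-node PCI access.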

My 2 cents.

N.

> 
> [1]
> https://specs.openstack.org/openstack/nova-specs/specs/kilo/implemented/input-output-based-numa-scheduling.html
> [2]
> https://github.com/openstack/nova/blob/master/nova/scheduler/filters/numa_topology_filter.py#L85
> [3]
> https://github.com/openstack/nova/blob/master/nova/virt/hardware.py#L1246-L1247
> [4]
> https://github.com/openstack/nova/blob/master/nova/pci/stats.py#L277



