[openstack-dev] [nova] NUMA + SR-IOV

Czesnowicz, Przemyslaw przemyslaw.czesnowicz at intel.com
Thu Mar 24 16:54:33 UTC 2016

> -----Original Message-----
> From: Nikola Đipanov [mailto:ndipanov at redhat.com]
> Sent: Thursday, March 24, 2016 4:34 PM
> To: Sergey Nikitin <snikitin at mirantis.com>; OpenStack Development Mailing
> List (not for usage questions) <openstack-dev at lists.openstack.org>
> Cc: Czesnowicz, Przemyslaw <przemyslaw.czesnowicz at intel.com>
> Subject: Re: [openstack-dev] [nova] NUMA + SR-IOV
> On 03/24/2016 04:18 PM, Sergey Nikitin wrote:
> >
> > Hi, folks.
> >
> > I want to start a discussion about NUMA + SR-IOV environment. I have a
> > two-sockets server. It has two NUMA nodes and only one SR-IOV PCI
> > device. This device is associated with the first NUMA node. I booted a
> > set of VMs with SR-IOV support. Each of these VMs was booted on the
> > first NUMA node. As I understand it happened for better performance
> > (VM should be booted in NUMA node which has PCI device for this VM) [1].
> >
> > But this behavior leaves my 2-sockets machines half-populated. What if
> > I don't care about SR-IOV performance? I just want every VM from *any*
> > of NUMA nodes to use this single SR-IOV PCI device.
> >
> > But I can't do it because of behavior of numa_topology_filter. In this
> > filter we want to know if current host has required PCI device [2].
> > But we want to have this device *only* in some numa cell on this host.
> > It is hardcoded here [3]. If we do *not* pass variable "cells" to the
> > method support_requests() [4] we will boot VM on the current host, if it has
> > required PCI device *on host* (maybe not in the same NUMA node).
> >
> > So my question is:
> > Is it correct that we *always* want to boot VM in NUMA node associated
> > with requested PCI device and user has no choice?
> > Or should we give a choice to the user and let him boot a VM with PCI
> > device, associated with another NUMA node?
> >

The rationale for choosing this behavior was that if you require a NUMA topology for your VM
and also request an SR-IOV device, then this is a high-performance application and it should be configured appropriately.

Similarly, if you request hugepages, your VM will be confined to one NUMA node (unless specified otherwise),
and if no single NUMA node has enough resources, it won't be created.
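
To make the difference concrete, here is a minimal sketch of the two checks being discussed. This is not nova's actual code: the `PciPool` type and the `supports_request` helper are hypothetical names made up for illustration, and only the idea of passing or omitting the NUMA cells comes from this thread.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class PciPool:
    """Hypothetical stand-in for a pool of free PCI devices on a host."""
    numa_node: int
    free_devices: int


def supports_request(pools: List[PciPool], count: int,
                     cells: Optional[Iterable[int]] = None) -> bool:
    """Return True if `count` devices can be allocated from `pools`.

    If `cells` is given, only pools attached to those NUMA nodes are
    considered (the strict behavior). If `cells` is None, any pool on
    the host is considered.
    """
    if cells is not None:
        allowed = set(cells)
        pools = [p for p in pools if p.numa_node in allowed]
    return sum(p.free_devices for p in pools) >= count


# Two-socket host with a single SR-IOV device pool on NUMA node 0.
host_pools = [PciPool(numa_node=0, free_devices=4)]

# Strict check for an instance placed on NUMA node 1: no local device.
strict_on_node1 = supports_request(host_pools, 1, cells=[1])

# Host-wide check: the device on node 0 satisfies the request.
host_wide = supports_request(host_pools, 1)
```

With the strict check, an instance placed on node 1 is rejected even though the host has free devices, which is the half-populated situation Sergey describes.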

> This has come up before, and the fact that it keeps coming up tells me that
> we should probably do something about it.
> Potentially it makes sense to be lax by default unless user specifies that they
> want to make sure that the device is on the same NUMA node, but that is
> not backwards compatible.
> It does not make sense to ask user to specify that they don't care IMHO, as
> unless you know there is a problem (and users have nowhere near enough
> information to tell), there is no reason for you to specify it - it's just not
> sensible UI IMHO.

Yes, this has come up a few times; having a way to specify the requirement is probably a good idea.
If it were done the way you propose, it would change the behavior for existing users. I'm not sure how big a problem that is.
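
For comparison, a rough sketch of the backward-compatible variant, where the strict behavior stays the default and the user has to opt in to the lax one. The policy names and the `effective_policy` helper are hypothetical, invented only for illustration; they are not an existing nova option.

```python
from typing import Optional

# Hypothetical policy names, invented for illustration only.
STRICT = "required"  # device must be on the instance's NUMA node
LAX = "any"          # device may be on any NUMA node of the host
VALID_POLICIES = (STRICT, LAX)


def effective_policy(requested: Optional[str]) -> str:
    """Pick the NUMA affinity policy for a PCI request.

    Defaults to STRICT when nothing is specified, which preserves the
    current behavior; LAX has to be requested explicitly.
    """
    if requested is None:
        return STRICT
    if requested not in VALID_POLICIES:
        raise ValueError("unknown PCI NUMA policy: %r" % requested)
    return requested
```

Flipping the default to `LAX`, as proposed above, would be a one-line change in this sketch, but it is exactly the change that alters behavior for existing users.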


> My 0.02 cents.


> N.
> >
> > [1] https://specs.openstack.org/openstack/nova-specs/specs/kilo/implemented/input-output-based-numa-scheduling.html
> > [2] https://github.com/openstack/nova/blob/master/nova/scheduler/filters/numa_topology_filter.py#L85
> > [3] https://github.com/openstack/nova/blob/master/nova/virt/hardware.py#L1246-L1247
> > [4] https://github.com/openstack/nova/blob/master/nova/pci/stats.py#L277
