[openstack-dev] [nova] NUMA + SR-IOV

Sergey Nikitin snikitin at mirantis.com
Fri Mar 25 06:49:01 UTC 2016


Guys, thank you for the fast response. I'm glad I'm not the only one who has
faced this problem.

2016-03-24 19:54 GMT+03:00 Czesnowicz, Przemyslaw <
przemyslaw.czesnowicz at intel.com>:

>
>
> > -----Original Message-----
> > From: Nikola Đipanov [mailto:ndipanov at redhat.com]
> > Sent: Thursday, March 24, 2016 4:34 PM
> > To: Sergey Nikitin <snikitin at mirantis.com>; OpenStack Development
> Mailing
> > List (not for usage questions) <openstack-dev at lists.openstack.org>
> > Cc: Czesnowicz, Przemyslaw <przemyslaw.czesnowicz at intel.com>
> > Subject: Re: [openstack-dev] [nova] NUMA + SR-IOV
> >
> > On 03/24/2016 04:18 PM, Sergey Nikitin wrote:
> > >
> > > Hi, folks.
> > >
> > > I want to start a discussion about NUMA + SR-IOV environments. I have a
> > > two-socket server. It has two NUMA nodes and only one SR-IOV PCI
> > > device. This device is associated with the first NUMA node. I booted a
> > > set of VMs with SR-IOV support. Each of these VMs was placed on the
> > > first NUMA node. As I understand it, this happens for better performance
> > > (a VM should be booted on the NUMA node that holds its PCI device) [1].
> > >
> > > But this behavior leaves my two-socket machines half-populated. What if
> > > I don't care about SR-IOV performance? I just want a VM on *any* of the
> > > NUMA nodes to be able to use this single SR-IOV PCI device.
> > >
> > > But I can't do this because of the behavior of numa_topology_filter. In
> > > this filter we check whether the current host has the required PCI
> > > device [2], but the device is required to be in one of the NUMA cells
> > > chosen for the instance. This is hardcoded here [3]. If we did *not*
> > > pass the "cells" argument to support_requests() [4], the VM would be
> > > booted on the current host as long as it has the required PCI device
> > > *somewhere on the host* (possibly not on the same NUMA node).
> > >
> > > So my question is:
> > > Is it correct that we *always* want to boot the VM on the NUMA node
> > > associated with the requested PCI device, with no choice for the user?
> > > Or should we give the user a choice and let them boot a VM with a PCI
> > > device associated with another NUMA node?
> > >
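For reference, here is a simplified, runnable sketch of the check described in
the quoted message (this is not the actual Nova code from [2]-[4]; names and
data structures are approximate): with the "cells" restriction applied, only
PCI pools on the instance's NUMA nodes count, while dropping it relaxes the
check to the whole host.

    # Illustrative sketch only; not the actual Nova implementation.
    def pci_pools_support(pools, requests, cells=None):
        """Return True if the host's PCI pools can satisfy every request.

        With `cells` set, only pools on those NUMA nodes are counted (the
        strict behavior discussed above); with cells=None the whole host
        is considered.
        """
        for request in requests:
            usable = [p for p in pools
                      if cells is None or p['numa_node'] in cells]
            if sum(p['count'] for p in usable) < request['count']:
                return False
        return True

    # One SR-IOV pool on NUMA node 0, instance placed on NUMA node 1.
    pools = [{'numa_node': 0, 'count': 8}]
    requests = [{'count': 1}]
    print(pci_pools_support(pools, requests, cells={1}))   # False: per-cell check
    print(pci_pools_support(pools, requests, cells=None))  # True: host-wide check
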
>
> The rationale for choosing this behavior was that if you require a NUMA
> topology for your VM and request an SR-IOV device as well, then this is a
> high performance application and it should be configured appropriately.
>
> Similarly, if you request hugepages your VM will be confined to one NUMA
> node (unless specified otherwise), and if there is no single NUMA node with
> enough resources it won't be created.
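
For concreteness, the flavor extra specs referred to here can be set roughly
as follows (a sketch using python-novaclient; the endpoint, credentials and
flavor name are placeholders):

    from keystoneauth1.identity import v3
    from keystoneauth1 import session
    from novaclient import client

    # Placeholder auth details; adjust for the target deployment.
    auth = v3.Password(auth_url='http://controller:5000/v3',
                       username='admin', password='secret',
                       project_name='admin',
                       user_domain_id='default',
                       project_domain_id='default')
    nova = client.Client('2.1', session=session.Session(auth=auth))

    flavor = nova.flavors.find(name='m1.large')
    # Requesting hugepages gives the guest an implicit one-node NUMA topology...
    flavor.set_keys({'hw:mem_page_size': 'large'})
    # ...unless a guest NUMA topology is requested explicitly:
    flavor.set_keys({'hw:numa_nodes': '2'})
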
>
>
> >
> > This has come up before, and the fact that it keeps coming up tells me
> > that we should probably do something about it.
> >
> > Potentially it makes sense to be lax by default unless the user specifies
> > that they want to make sure the device is on the same NUMA node, but that
> > is not backwards compatible.
> >
> > It does not make sense to ask the user to specify that they don't care,
> > IMHO: unless you know there is a problem (and users have nowhere near
> > enough information to tell), there is no reason for you to specify it.
> > It's just not sensible UI.
> >
>
> Yes, this did come up a few times; having a way to specify a requirement is
> probably a good idea.
> If it were done the way you propose, that would change the behavior for
> existing users. I'm not sure how big a problem this is.
>
> Przemek
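
Purely as an illustration of the kind of knob being discussed, a per-flavor
policy could map onto the cells-restricted check sketched earlier in this
thread; the extra spec name and values below are hypothetical, not an
existing Nova option:

    # Hypothetical sketch; 'hw:pci_numa_affinity' is an invented name.
    def cells_for_pci_check(instance_cells, extra_specs):
        policy = extra_specs.get('hw:pci_numa_affinity', 'required')
        if policy == 'required':
            return instance_cells   # strict: today's behavior
        return None                 # lax: allow a host-wide PCI match

    print(cells_for_pci_check({1}, {'hw:pci_numa_affinity': 'required'}))   # {1}
    print(cells_for_pci_check({1}, {'hw:pci_numa_affinity': 'preferred'}))  # None
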
>
> > My 0.02 cents.
>
> >
> > N.
> >
> > >
> > > [1] https://specs.openstack.org/openstack/nova-specs/specs/kilo/implemented/input-output-based-numa-scheduling.html
> > > [2] https://github.com/openstack/nova/blob/master/nova/scheduler/filters/numa_topology_filter.py#L85
> > > [3] https://github.com/openstack/nova/blob/master/nova/virt/hardware.py#L1246-L1247
> > > [4] https://github.com/openstack/nova/blob/master/nova/pci/stats.py#L277
>
>