[openstack-dev] [nova] NUMA + SR-IOV
Sergey Nikitin
snikitin at mirantis.com
Fri Mar 25 11:13:37 UTC 2016
FYI: I created a blueprint
https://blueprints.launchpad.net/nova/+spec/share-pci-device-between-numa-nodes
and want to discuss it in Austin.
2016-03-25 9:49 GMT+03:00 Sergey Nikitin <snikitin at mirantis.com>:
> Guys, thank you for the fast responses. I'm glad that I'm not the only
> one who has faced this problem.
>
> 2016-03-24 19:54 GMT+03:00 Czesnowicz, Przemyslaw <
> przemyslaw.czesnowicz at intel.com>:
>
>>
>>
>> > -----Original Message-----
>> > From: Nikola Đipanov [mailto:ndipanov at redhat.com]
>> > Sent: Thursday, March 24, 2016 4:34 PM
>> > To: Sergey Nikitin <snikitin at mirantis.com>; OpenStack Development
>> > Mailing List (not for usage questions) <openstack-dev at lists.openstack.org>
>> > Cc: Czesnowicz, Przemyslaw <przemyslaw.czesnowicz at intel.com>
>> > Subject: Re: [openstack-dev] [nova] NUMA + SR-IOV
>> >
>> > On 03/24/2016 04:18 PM, Sergey Nikitin wrote:
>> > >
>> > > Hi, folks.
>> > >
>> > > I want to start a discussion about NUMA + SR-IOV environments. I have
>> > > a two-socket server. It has two NUMA nodes and only one SR-IOV PCI
>> > > device, which is associated with the first NUMA node. I booted a set
>> > > of VMs with SR-IOV support, and each of them was booted on the first
>> > > NUMA node. As I understand it, this happens for better performance
>> > > (a VM should be booted on the NUMA node that holds its PCI device) [1].
>> > >
>> > > But this behavior leaves my 2-socket machines half-populated. What if
>> > > I don't care about SR-IOV performance? I just want every VM, on *any*
>> > > of the NUMA nodes, to be able to use this single SR-IOV PCI device.
>> > >
>> > > But I can't do that because of the behavior of numa_topology_filter.
>> > > In this filter we check whether the current host has the required PCI
>> > > device [2]. However, we only accept the device if it sits in one of
>> > > the instance's NUMA cells on this host; that is hardcoded here [3].
>> > > If we did *not* pass the "cells" variable to support_requests() [4],
>> > > the VM would be booted on the current host as long as the required
>> > > PCI device exists *somewhere on the host* (possibly not on the same
>> > > NUMA node).
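>> > >
>> > > To make the difference concrete, here is a toy sketch (this is *not*
>> > > the real nova code; the Cell class and names below are made up):
>> > >
>> > >     # Toy model: each NUMA cell lists the PCI devices attached to it.
>> > >     class Cell(object):
>> > >         def __init__(self, cell_id, pci_devices):
>> > >             self.id = cell_id
>> > >             self.pci_devices = set(pci_devices)
>> > >
>> > >     def cells_for_request(cells, wanted, per_cell_affinity=True):
>> > >         """Return the cells a VM may land on for a PCI request."""
>> > >         if per_cell_affinity:
>> > >             # Current behavior: only cells holding the device qualify,
>> > >             # so all SR-IOV VMs pile up on the first NUMA node.
>> > >             return [c for c in cells if wanted in c.pci_devices]
>> > >         # Relaxed behavior: if the host has the device anywhere,
>> > >         # every cell qualifies.
>> > >         if any(wanted in c.pci_devices for c in cells):
>> > >             return list(cells)
>> > >         return []
>> > >
>> > >     cells = [Cell(0, ["sriov-vf"]), Cell(1, [])]
>> > >     print([c.id for c in cells_for_request(cells, "sriov-vf")])          # [0]
>> > >     print([c.id for c in cells_for_request(cells, "sriov-vf", False)])   # [0, 1]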
>> > >
>> > > So my question is:
>> > > Is it correct that we *always* want to boot the VM on the NUMA node
>> > > associated with the requested PCI device, with no choice for the user?
>> > > Or should we give the user a choice and let them boot a VM whose PCI
>> > > device is associated with another NUMA node?
>> > >
>>
>> The rationale for choosing this behavior was that if you require a NUMA
>> topology for your VM and request an SR-IOV device as well, then this is
>> a high-performance application and it should be configured appropriately.
>>
>> Similarly, if you request hugepages your VM will be confined to one NUMA
>> node (unless specified otherwise), and if there is no single NUMA node
>> with enough resources it won't be created.
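>>
>> For example, flavor extra specs roughly like the following (the flavor
>> name is just an example) confine the guest to a single NUMA node and
>> request hugepages:
>>
>>     nova flavor-key m1.numa set hw:numa_nodes=1
>>     nova flavor-key m1.numa set hw:mem_page_size=large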
>>
>>
>> >
>> > This has come up before, and the fact that it keeps coming up tells me
>> > that we should probably do something about it.
>> >
>> > Potentially it makes sense to be lax by default unless the user
>> > specifies that they want to make sure the device is on the same NUMA
>> > node, but that is not backwards compatible.
>> >
>> > It does not make sense to ask the user to specify that they don't care,
>> > IMHO: unless you know there is a problem (and users have nowhere near
>> > enough information to tell), there is no reason for you to specify it -
>> > it's just not sensible UI.
>> >
>>
>> Yes, this did come up a few times; having a way to specify a requirement
>> is probably a good idea.
>> If it were done the way you propose, it would change the behavior for
>> existing users; I'm not sure how big a problem that is.
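>>
>> Just to illustrate what "a way to specify a requirement" could look
>> like, something along these lines (the key name below is made up, it
>> does not exist in nova):
>>
>>     nova flavor-key m1.sriov set hw:pci_numa_affinity=any
>>
>> with strict NUMA affinity staying the backwards-compatible default.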
>>
>> Przemek
>>
>> > My 0.02 cents.
>>
>> >
>> > N.
>> >
>> > >
>> > > [1] https://specs.openstack.org/openstack/nova-specs/specs/kilo/implemented/input-output-based-numa-scheduling.html
>> > > [2] https://github.com/openstack/nova/blob/master/nova/scheduler/filters/numa_topology_filter.py#L85
>> > > [3] https://github.com/openstack/nova/blob/master/nova/virt/hardware.py#L1246-L1247
>> > > [4] https://github.com/openstack/nova/blob/master/nova/pci/stats.py#L277
>>
>>
>