<div dir="ltr">Guys, thank you for the fast response. I'm glad that I'm not the only one who faces this problem.</div><div class="gmail_extra"><br><div class="gmail_quote">2016-03-24 19:54 GMT+03:00 Czesnowicz, Przemyslaw <span dir="ltr"><<a href="mailto:przemyslaw.czesnowicz@intel.com" target="_blank">przemyslaw.czesnowicz@intel.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5"><br>

<br>

> -----Original Message-----<br>

> From: Nikola Đipanov [mailto:<a href="mailto:ndipanov@redhat.com">ndipanov@redhat.com</a>]<br>

> Sent: Thursday, March 24, 2016 4:34 PM<br>

> To: Sergey Nikitin <<a href="mailto:snikitin@mirantis.com">snikitin@mirantis.com</a>>; OpenStack Development Mailing<br>

> List (not for usage questions) <<a href="mailto:openstack-dev@lists.openstack.org">openstack-dev@lists.openstack.org</a>><br>

> Cc: Czesnowicz, Przemyslaw <<a href="mailto:przemyslaw.czesnowicz@intel.com">przemyslaw.czesnowicz@intel.com</a>><br>

> Subject: Re: [openstack-dev] [nova] NUMA + SR-IOV<br>

><br>

> On 03/24/2016 04:18 PM, Sergey Nikitin wrote:<br>

> ><br>

> > Hi, folks.<br>

> ><br>

> > I want to start a discussion about NUMA + SR-IOV environments. I have a<br>

> > two-socket server. It has two NUMA nodes and only one SR-IOV PCI<br>

> > device. This device is associated with the first NUMA node. I booted a<br>

> > set of VMs with SR-IOV support. Each of these VMs was booted on the<br>

> > first NUMA node. As I understand it, this happens for better performance<br>

> > (a VM should be booted on the NUMA node that has the PCI device for this VM) [1].<br>

> ><br>

> > But this behavior leaves my two-socket machines half-populated. What if<br>

> > I don't care about SR-IOV performance? I just want every VM on *any*<br>

> > of the NUMA nodes to be able to use this single SR-IOV PCI device.<br>

> ><br>

> > But I can't do that because of the behavior of numa_topology_filter. In this<br>

> > filter we check whether the current host has the required PCI device [2].<br>

> > But we require this device to be *only* in some NUMA cell on this host.<br>

> > This is hardcoded here [3]. If we do *not* pass the "cells" variable to the<br>

> > support_requests() method [4], we will boot the VM on the current host if the<br>

> > required PCI device is present *on the host* (maybe not in the same NUMA node).<br>

> ><br>

> > So my question is:<br>

> > Is it correct that we *always* want to boot a VM on the NUMA node associated<br>

> > with the requested PCI device, and that the user has no choice?<br>

> > Or should we give the user a choice and let them boot a VM with a PCI<br>

> > device associated with another NUMA node?<br>

> ><br>

<br>

</div></div>The rationale for choosing this behavior was that if you require a NUMA topology for your VM<br>

and you request an SR-IOV device as well, then this is a high-performance application and it should be configured appropriately.<br>

<br>

Similarly, if you request hugepages, your VM will be confined to one NUMA node (unless specified otherwise),<br>

and if there is no single NUMA node with enough resources, it won't be created.<br>

<span class=""><br>

<br>

><br>

> This has come up before, and the fact that it keeps coming up tells me that<br>

> we should probably do something about it.<br>

><br>

> Potentially it makes sense to be lax by default unless the user specifies that they<br>

> want to make sure that the device is on the same NUMA node, but that is<br>

> not backwards compatible.<br>

><br>

> It does not make sense to ask the user to specify that they don't care IMHO, as<br>

> unless you know there is a problem (and users have nowhere near enough<br>

> information to tell), there is no reason for you to specify it - it's just not<br>

> sensible UI IMHO.<br>

><br>

<br>

</span>Yes, this has come up a few times; having a way to specify the requirement is probably a good idea.<br>

If it were done the way you propose, it would change the behavior for existing users; I'm not sure how big a problem that is.<br>

<br>

Przemek<br>

<div class="HOEnZb"><div class="h5"><br>

> My 0.02 cents.<br>

<br>

><br>

> N.<br>

><br>

> ><br>

> > [1]<br>

> > <a href="https://specs.openstack.org/openstack/nova-specs/specs/kilo/implemented/input-output-based-numa-scheduling.html" rel="noreferrer" target="_blank">https://specs.openstack.org/openstack/nova-specs/specs/kilo/implemented/input-output-based-numa-scheduling.html</a><br>

> > [2]<br>

> > <a href="https://github.com/openstack/nova/blob/master/nova/scheduler/filters/numa_topology_filter.py#L85" rel="noreferrer" target="_blank">https://github.com/openstack/nova/blob/master/nova/scheduler/filters/numa_topology_filter.py#L85</a><br>

> > [3]<br>

> > <a href="https://github.com/openstack/nova/blob/master/nova/virt/hardware.py#L1246-L1247" rel="noreferrer" target="_blank">https://github.com/openstack/nova/blob/master/nova/virt/hardware.py#L1246-L1247</a><br>

> > [4]<br>

> > <a href="https://github.com/openstack/nova/blob/master/nova/pci/stats.py#L277" rel="noreferrer" target="_blank">https://github.com/openstack/nova/blob/master/nova/pci/stats.py#L277</a><br>

<br>

</div></div></blockquote></div><br></div>
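<div dir="ltr">To make the difference discussed in this thread concrete, here is a minimal sketch of a strict versus lax device check. It is a toy model and not Nova's code: the <code>PciPool</code> class, the <code>supports_request</code> function and its <code>numa_cells</code> parameter are hypothetical names that only mimic the idea of passing or not passing "cells" to support_requests().</div>

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PciPool:
    """Toy stand-in for a pool of free PCI devices on a host (hypothetical)."""
    numa_node: Optional[int]  # NUMA node the devices are attached to
    count: int                # number of free devices in the pool


def supports_request(pools: List[PciPool], wanted: int,
                     numa_cells: Optional[List[int]] = None) -> bool:
    """Return True if `wanted` devices can be satisfied from `pools`.

    If numa_cells is given, only pools on those NUMA nodes are considered
    (strict affinity). If it is None, any pool on the host qualifies (lax).
    """
    if numa_cells is not None:
        pools = [p for p in pools if p.numa_node in numa_cells]
    return sum(p.count for p in pools) >= wanted


# One SR-IOV device pool, attached to NUMA node 0 of a two-node host.
host_pools = [PciPool(numa_node=0, count=4)]

strict_node0 = supports_request(host_pools, 1, numa_cells=[0])
strict_node1 = supports_request(host_pools, 1, numa_cells=[1])
lax = supports_request(host_pools, 1)
print(strict_node0, strict_node1, lax)  # True False True
```

<div dir="ltr">In this model a VM placed on NUMA node 1 is rejected under the strict check but accepted under the lax one, which is the choice the thread is about exposing to the user.</div>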