[Openstack-operators] [neutron] [os-vif] VF overcommitting and performance in SR-IOV

Maciej Kucia maciej at kucia.net
Fri Jan 26 12:19:33 UTC 2018


Appreciate the feedback. It seems the conclusion is that one can generally
safely enable a large number of VFs, with the exception of some hardware
configurations that might require reducing the number of VFs due to a BIOS
limitation.
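
For anyone who wants to check how many VFs a PF currently has enabled versus
the maximum it advertises, here is a minimal sketch. It assumes a Linux host
that exposes the standard sriov_numvfs and sriov_totalvfs sysfs attributes;
the function name and the sysfs_root parameter are just illustrative, not
part of any OpenStack tooling.

```python
from pathlib import Path


def read_vf_counts(iface, sysfs_root="/sys/class/net"):
    """Return (configured_vfs, max_vfs) for a PF network interface.

    Returns None if the interface has no SR-IOV attributes (not a PF,
    SR-IOV disabled in firmware, or the interface does not exist).
    """
    dev = Path(sysfs_root) / iface / "device"
    try:
        # sriov_numvfs: number of VFs currently enabled on the PF
        numvfs = int((dev / "sriov_numvfs").read_text().strip())
        # sriov_totalvfs: maximum number of VFs the PF supports
        totalvfs = int((dev / "sriov_totalvfs").read_text().strip())
    except (OSError, ValueError):
        return None
    return numvfs, totalvfs
```

Calling read_vf_counts("eth0") (the interface name is only an example) returns
a tuple such as the enabled and maximum VF counts, or None when the attributes
are not present.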

Thanks & Regards,
Maciej

2018-01-23 3:39 GMT+01:00 Blair Bethwaite <blair.bethwaite at gmail.com>:

> This is starting to veer into magic territory for my level of
> understanding so beware... but I believe there are (or could be
> depending on your exact hardware) PCI config space considerations.
> IIUC each SRIOV VF will have its own PCI BAR. Depending on the window
> size required (which may be determined by other hardware features such
> as flow-steering), you can potentially hit compatibility issues with
> your server BIOS not supporting mapping of addresses which surpass
> 4GB. This can then result in the device hanging on initialisation (at
> server boot) and effectively bricking the box until the device is
> removed.
>
> We have seen this first hand on a Dell R730 with a Mellanox ConnectX-4
> card (there are several other Dell 13G platforms with the same BIOS
> chipsets). We were explicitly increasing the PCI BAR size for the
> device (not upping the number of VFs) in relation to a memory
> exhaustion issue when running MPI collective communications on hosts
> with 28+ cores. We only had 16 (or maybe 32, I forget) VFs configured
> in the firmware.
>
> At the end of that support case (which resulted in a replacement NIC),
> the support engineer's summary included:
> """
> -When a BIOS limits the BAR to be contained in the 4GB address space -
> it is a BIOS limitation.
> Unfortunately, there is no way to tell - Some BIOS implementations use
> proprietary heuristics to decide when to map a specific BAR below 4GB.
>
> -When SR-IOV is enabled, and num-vfs is high, the corresponding VF BAR
> can be huge.
> In this case, the BIOS may exhaust the ~2GB address space that it has
> available below 4GB.
> In this case, the BIOS may hang – and the server won’t boot.
> """
>
> At the very least you should ask your hardware vendors some very
> specific questions before doing anything that might change your PCI
> BAR sizes.
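
To make the support engineer's point about VF BARs concrete, here is a rough
back-of-the-envelope sketch. It assumes the total VF BAR window is simply the
number of VFs times the per-VF BAR size, and compares that with the ~2GB
figure quoted above. The 32 MiB per-VF size in the usage note is a
hypothetical value for illustration only; real sizes depend on the device and
its firmware settings, and other devices also consume space below 4GB, so
this estimate is optimistic.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def vf_bar_window_bytes(num_vfs, per_vf_bar_bytes):
    """Total address space needed for all VF BARs (simple product)."""
    return num_vfs * per_vf_bar_bytes


def fits_below_4g(num_vfs, per_vf_bar_bytes, available_bytes=2 * GIB):
    """True if the VF BAR window fits in the space assumed free below 4GB.

    available_bytes defaults to the ~2GB mentioned in the support summary;
    it is an assumption, not a value read from the BIOS.
    """
    return vf_bar_window_bytes(num_vfs, per_vf_bar_bytes) <= available_bytes
```

With the hypothetical 32 MiB per VF, fits_below_4g(16, 32 * MIB) holds (512 MiB
in total), while fits_below_4g(127, 32 * MIB) does not (about 4 GiB in total).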
>
> Cheers,
>
> On 23 January 2018 at 11:44, Pedro Sousa <pgsousa at gmail.com> wrote:
> > Hi,
> >
> > I have SR-IOV in production at some customers with the maximum number of
> > VFs and didn't notice any performance issues.
> >
> > My understanding is that of course you will have a performance penalty if
> > you consume all those VFs, because you're dividing the bandwidth across
> > them, but other than that, if they are there doing nothing you won't
> > notice anything.
> >
> > But I'm just talking from my experience :)
> >
> > Regards,
> > Pedro Sousa
> >
> > On Mon, Jan 22, 2018 at 11:47 PM, Maciej Kucia <maciej at kucia.net> wrote:
> >>
> >> Thank you for the reply. I am interested in SR-IOV, and PCI whitelisting
> >> is certainly involved.
> >> I suspect that OpenStack itself can handle those numbers of devices,
> >> especially in telco applications where not much scheduling is being done.
> >> The feedback I am getting is from sysadmins who work on network
> >> virtualization but I think this is just a rumor without any proof.
> >>
> >> The question is whether the performance penalty from SR-IOV drivers or PCI
> >> itself is negligible. Should a cloud admin configure the maximum number of
> >> VFs for flexibility, or should it be manually managed and balanced
> >> depending on the application?
> >>
> >> Regards,
> >> Maciej
> >>
> >>>
> >>>
> >>> 2018-01-22 18:38 GMT+01:00 Jay Pipes <jaypipes at gmail.com>:
> >>>>
> >>>> On 01/22/2018 11:36 AM, Maciej Kucia wrote:
> >>>>>
> >>>>> Hi!
> >>>>>
> >>>>> Is there any noticeable performance penalty when using multiple
> >>>>> virtual functions?
> >>>>>
> >>>>> For simplicity I am enabling all available virtual functions in my
> >>>>> NICs.
> >>>>
> >>>>
> >>>> I presume by the above you are referring to setting your
> >>>> pci_passthrough_whitelist on your compute nodes to whitelist all VFs on
> >>>> a particular PF's PCI address domain/bus?
> >>>>
> >>>>> Sometimes an application uses only a few of them. I am using Intel and
> >>>>> Mellanox.
> >>>>>
> >>>>> I do not see any performance drop but I am getting feedback that this
> >>>>> might not be the best approach.
> >>>>
> >>>>
> >>>> Who is giving you this feedback?
> >>>>
> >>>> The only issue with enabling (potentially 254 or more) VFs for each PF
> >>>> is that each VF will end up as a record in the pci_devices table in the
> >>>> Nova cell database. Multiply 254 or more times the number of PFs times
> >>>> the number of compute nodes in your deployment and you can get a large
> >>>> number of records that need to be stored. That said, the pci_devices
> >>>> table is well indexed and even if you had 1M or more records in the
> >>>> table, the access of a few hundred of those records when the resource
> >>>> tracker does a PciDeviceList.get_by_compute_node() [1] will still be
> >>>> quite fast.
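
The multiplication described above can be sketched as follows. This is only an
estimate of VF rows in pci_devices, assuming every PF exposes the same number
of VFs and every compute node has the same number of PFs. The function name
and the 2 PFs / 500 nodes figures in the usage note are hypothetical.

```python
def estimated_pci_device_rows(vfs_per_pf, pfs_per_node, compute_nodes):
    """Estimate VF records in pci_devices: VFs per PF x PFs x compute nodes.

    Counts VF rows only; rows for the PFs themselves or for other
    whitelisted devices are not included.
    """
    return vfs_per_pf * pfs_per_node * compute_nodes
```

For example, 254 VFs per PF with a hypothetical 2 PFs per node across 500
compute nodes gives estimated_pci_device_rows(254, 2, 500) == 254000 rows,
still well below the 1M figure mentioned above.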
> >>>>
> >>>> Best,
> >>>> -jay
> >>>>
> >>>> [1]
> >>>> https://github.com/openstack/nova/blob/stable/pike/nova/compute/resource_tracker.py#L572
> >>>> and then
> >>>>
> >>>> https://github.com/openstack/nova/blob/stable/pike/nova/pci/manager.py#L71
> >>>>
> >>>>> Any recommendations?
> >>>>>
> >>>>> Thanks,
> >>>>> Maciej
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> OpenStack-operators mailing list
> >>>>> OpenStack-operators at lists.openstack.org
> >>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >>
> >>
> >
> >
> >
>
>
>
> --
> Cheers,
> ~Blairo
>