On 03/09/2025 16:40, Andrew Bonney wrote:
We have been using routed provider networks with SRIOV for a while, but I believe this works a little 'by accident', so I'm not suggesting this is a recommended path. I'll describe as best I can but it's a while since I've worked through this so I may not be 100% on why it works.
Like your description, we have a physnet per rack, with a segment and associated VLAN for each. A user can create a 'Direct' type port associated with the segmented network, and with deferred IP allocation so it doesn't tie the port to a rack.
If I'm remembering correctly the accidental 'hack' here is that ALL of the hypervisors in all racks have a PCI device spec listing ALL of the physical networks, and they all use the same interface name, as follows. Only the Neutron SRIOV config differs dependent on the rack the host is in.
[pci]
# PCI devices available to VMs
device_spec = { "physical_network": "physnet_media_a1", "devname": "enp129s0f0np0" }
device_spec = { "physical_network": "physnet_media_a2", "devname": "enp129s0f0np0" }
device_spec = { "physical_network": "physnet_media_a3", "devname": "enp129s0f0np0" }
So ^ is not supported from a Nova perspective. You are not meant to be able to list the same device multiple times and merge the physnets to effectively form a list like that. It would be entirely supported if each line referenced a different device. This should actually be a startup config error in Nova.
On the Neutron side, physnets must provide L2 isolation or multi-tenancy breaks, i.e. VLAN 100 on physnet_media_a1 __MUST__ never allow communication with VLAN 100 on physnet_media_a2 unless the two physnets are interconnected by an L3 router. So this is also an invalid config on the Neutron side: you do not meet the requirements for defining separate physnets, and it will break how multi-tenancy is designed to work on that side.
This can work in a private cloud, and it can work if you do not allow VLAN/flat tenant networks in Neutron, but you as the admin take on the burden of making sure that you do not violate multi-tenancy, instead of Neutron doing it.
This is a variation of the hack that telcos used for NUMA-local networks before that was actually possible in Nova, i.e. they would have physnet_1_numa_0 and physnet_1_numa_1 and use that to force PCI devices to come from the relevant NUMA node. In reality these would not be separate physical networks, so they would create a multi-provider network using the same VLAN on both to interconnect them.
This is well into unsupported land, but if you deeply understand the security implications and they are acceptable for your private cloud, you could do this. If it breaks for any reason, however, it's not an upstream bug, as you are deliberately misconfiguring Nova and Neutron to make this work.
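As an illustration of the kind of check described above (the same device listed under several physnets), here is a minimal sketch of a pre-flight lint an operator could run over their own device_spec lines. It is not Nova code and Nova does not currently perform this check; the function name find_conflicting_device_specs is hypothetical, and the sketch assumes each spec identifies its device by "devname" only.

```python
import json
from collections import defaultdict


def find_conflicting_device_specs(spec_lines):
    """Return {devname: [physnets]} for devices tagged with more than one physnet.

    Hypothetical helper for illustration only. Assumes every spec is a JSON
    object that names its device with "devname".
    """
    physnets_by_dev = defaultdict(set)
    for line in spec_lines:
        spec = json.loads(line)
        physnets_by_dev[spec["devname"]].add(spec.get("physical_network"))
    # Keep only devices that appear under two or more distinct physnets.
    return {
        dev: sorted(p for p in physnets if p is not None)
        for dev, physnets in physnets_by_dev.items()
        if len(physnets) > 1
    }


# The three device_spec values from the [pci] section quoted above.
SPECS = [
    '{ "physical_network": "physnet_media_a1", "devname": "enp129s0f0np0" }',
    '{ "physical_network": "physnet_media_a2", "devname": "enp129s0f0np0" }',
    '{ "physical_network": "physnet_media_a3", "devname": "enp129s0f0np0" }',
]

conflicts = find_conflicting_device_specs(SPECS)
for dev, physnets in conflicts.items():
    print(f"{dev} is listed under multiple physnets: {', '.join(physnets)}")
```

Run against the config quoted in this thread, the sketch flags enp129s0f0np0 once, with all three physnets.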
From Nova's perspective it is always selecting the first physical network in the list, but this means that when the scheduler picks a hypervisor in a rack other than '1', it will still be happy to proceed.
------------------------------------------------------------------------
From: Sean Mooney <smooney@redhat.com>
Sent: Wednesday, September 03, 2025 16:08
To: openstack-discuss@lists.openstack.org; nathanh@graphcore.ai
Cc: bens@graphcore.ai
Subject: Re: [ops][nova][neutron] Routed provider networking and physnet port scheduling
Just adding bens back in case they are not on the list.
I selected the wrong reply type before.
On 03/09/2025 16:07, Sean Mooney wrote:
On 03/09/2025 15:10, Nathan Harper wrote:
Hi all,
We have been looking at building some routed provider networks, following this documentation:
https://docs.openstack.org/neutron/latest/admin/config-routed-networks.html
In this scenario we have 4 racks, and have defined physnets for each rack and assigned SRIOV interfaces for each. We then have created a multisegment network, with a segment associated with each physnet. We get the expected resource provider in placement containing only these hypervisors.
When scheduling instances onto this network, the allocation candidates are any hypervisors in racks 1-4 (OpenStack filters the hypervisors using the aggregates that Neutron creates for each segment).
However, during instance build the PCI device request sent to nova-compute always contains the physnet of the same segment.
Debugging the builds, we ended up here:
https://opendev.org/openstack/nova/src/branch/master/nova/network/neutron.py#L2226,
with:
# TODO(vladikr): Additional work will be required to handle the
# case of multiple vlan segments associated with different
# physical networks.
Which originates from this commit:
https://opendev.org/openstack/nova/commit/b9d9d96a407db5a2adde3aed81e61cc9589c291a
This suggests that despite the documentation describing using multiple VLAN backed segments in this fashion, this has never worked?
That is correct. Nova has never supported the multiple physnet extension that was added to Neutron in general:
https://github.com/openstack/neutron-lib/blob/master/neutron_lib/api/definitions/multiprovidernet.py
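To make the symptom reported in this thread concrete, here is a minimal sketch of a lookup that takes the physnet from the first segment that has one, without considering which host the scheduler picked. This is an illustration of the observed behaviour only, not a copy of Nova's code; the function name first_segment_physnet, the network layout, and the physnet_rackN names are all hypothetical.

```python
def first_segment_physnet(network):
    """Return the physnet of the first segment that defines one.

    Hypothetical helper: note that it takes no host argument at all, which
    is the root of the problem for multi-physnet segmented networks.
    """
    for segment in network.get("segments", []):
        physnet = segment.get("provider:physical_network")
        if physnet:
            return physnet
    # Single-segment networks carry the attribute on the network itself.
    return network.get("provider:physical_network")


# Hypothetical multi-segment network: one VLAN segment per rack physnet.
NETWORK = {
    "name": "routed-net",
    "segments": [
        {"provider:network_type": "vlan",
         "provider:physical_network": f"physnet_rack{n}",
         "provider:segmentation_id": 100 + n}
        for n in range(1, 5)
    ],
}

# Whichever rack the scheduler chooses, the same physnet is requested.
requested = {rack: first_segment_physnet(NETWORK) for rack in range(1, 5)}
for rack, physnet in requested.items():
    print(f"host in rack {rack}: PCI request asks for {physnet}")
```

With a lookup shaped like this, a host in rack 3 is still asked for a device on the first segment's physnet, which only succeeds if that host also advertises a device tagged with that physnet.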
You can have multiple network segments on the same physnet, but when routed provider networks were first designed there was an intention to have a second way to associate hosts with segments that did not depend on physnets. That was never implemented: there was meant to be a way to associate hosts with segments directly via API or config, without using physnets for the mapping.
Or are we missing something? Has anyone successfully used routed provider networks?
The way that routed provider networks are typically used is that you do not have a single network that spans physnets: you have a 1:1 mapping between physnet and segment, and create separate networks for each physnet.
SRIOV has extra complications because Nova normally does not have any awareness of physnets at all, but for SRIOV you have to declare a single physnet for the devices in the Nova PCI device spec.
A physnet is intended to be effectively an L2 broadcast domain, which is more or less what a segment is as well.
Technically, the L2 isolation Neutron requires between physnets is stricter than the isolation it requires between segments.
--
Regards,
Nathan Harper
Principal Engineer – Cloud Development
Platform Engineering
nathanh@graphcore.ai
www.graphcore.ai