On 04/09/2025 08:08, Andrew Bonney wrote:
Yes, I'd hoped I'd caveated it enough to suggest it wouldn't be a supported mode of operation 🙂
For what it's worth, we do have L2 isolation between the segments in this scenario; they're each in a separate VLAN and there is external L3 routing.
Hello,

VLANs are not enough to meet the actual Neutron requirements for physnets, which require L1 isolation so that you can have isolation between flat and VLAN networks as well. When I said L2 isolation before, I meant that VLAN 100 on physnet 1 must not interact with VLAN 100 on physnet 2. In the example below:

```
[pci]
device_spec = { "physical_network": "physnet_media_a1", "devname": "enp129s0f0np0" }
device_spec = { "physical_network": "physnet_media_a2", "devname": "enp129s0f0np0" }
device_spec = { "physical_network": "physnet_media_a3", "devname": "enp129s0f0np0" }
```

a VLAN network on physnet_media_a1 would not be isolated from a VLAN network on physnet_media_a2 or physnet_media_a3.

In Neutron, there are two related concepts: the network type, and whether a network of that type is a provider or tenant network. What makes a Neutron network a provider network is purely who selects the encapsulation ID, i.e., the VLAN ID. If it's specified by an admin, then it's a provider network. If it's selected automatically from a range of allowed values and allocated by Neutron, then it's a tenant network. Whether a network type can be used as a tenant network depends on https://docs.openstack.org/neutron/latest/configuration/ml2-conf.html#ml2.te...

So, if you list VLAN in tenant_network_types and you properly list the VLAN ranges in https://docs.openstack.org/neutron/latest/configuration/ml2-conf.html#ml2_type_vlan.network_vlan_ranges then Neutron will allow normal end users to request a VLAN network, and it will allocate one. This is only safe to do if you follow the isolation rules I noted above. It would be unsafe to list VLAN or flat in tenant_network_types in your configuration. It is perfectly fine to list them in https://docs.openstack.org/neutron/latest/configuration/ml2-conf.html#ml2.ty... as that is the list of all possible network types your installation of Neutron will support; the types outside of tenant_network_types are the extra network types available to you as an admin to provision as provider networks. In other words, if a network type is listed in type_drivers but not in tenant_network_types, that type is reserved for provider use.

So that's kind of the main point I was trying to call out: physnets need L1 isolation; segments just need L2 isolation, i.e., the network encapsulation provided by Neutron networks is enough for segments. It is not enough for physnets, which is why I said the requirements on physnets are stricter.

If you only allow Geneve or VXLAN networks for your tenant networks and reserve VLAN/flat for your provider networks, you can sort of cheat, but that means you can't allow your end users to create networks that will be used by SR-IOV ports in your cloud. You have to manually create the VLAN network for them and make sure that VLAN is not used on any other physnet connected to the same top-of-rack switch.
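To make that concrete, here is a minimal ml2_conf.ini sketch of that split (physnet names taken from the example above; treat it as an illustration, not your exact config). Geneve is the only tenant-allocatable type, and the physnets are listed in network_vlan_ranges without a VLAN range, so they accept admin-created provider VLAN networks but offer no VLANs for tenant allocation:

```
[ml2]
type_drivers = flat,vlan,geneve
tenant_network_types = geneve

[ml2_type_vlan]
# Physnets listed without a range are usable for provider VLAN networks
# only; no VLAN IDs are opened up for automatic tenant allocation.
network_vlan_ranges = physnet_media_a1,physnet_media_a2,physnet_media_a3
```

The admin then provisions the SR-IOV-facing network explicitly, choosing the VLAN ID themselves (network name and VLAN ID hypothetical):

```
openstack network create --provider-network-type vlan \
    --provider-physical-network physnet_media_a1 \
    --provider-segment 100 media-a1-vlan100
```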
------------------------------------------------------------------------ *From:* Sean Mooney <smooney@redhat.com> *Sent:* Wednesday, September 03, 2025 16:55 *To:* Andrew Bonney <Andrew.Bonney@bbc.co.uk>; openstack-discuss@lists.openstack.org <openstack-discuss@lists.openstack.org>; nathanh@graphcore.ai <nathanh@graphcore.ai> *Cc:* bens@graphcore.ai <bens@graphcore.ai> *Subject:* Re: [ops][nova][neutron] Routed provider networking and physnet port scheduling
On 03/09/2025 16:40, Andrew Bonney wrote:
We have been using routed provider networks with SRIOV for a while, but I believe this works a little 'by accident', so I'm not suggesting this is a recommended path. I'll describe as best I can but it's a while since I've worked through this so I may not be 100% on why it works.
Like your description, we have a physnet per rack, with a segment and associated VLAN for each. A user can create a 'Direct' type port associated with the segmented network, and with deferred IP allocation so it doesn't tie the port to a rack.
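In CLI terms that's something like the following (network and port names made up); leaving out any fixed IP on the routed network is what defers the allocation until the instance is scheduled:

```
openstack port create --network media-multiseg --vnic-type direct media-sriov-port
```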
If I'm remembering correctly, the accidental 'hack' here is that ALL of the hypervisors in all racks have a PCI device spec listing ALL of the physical networks, and they all use the same interface name, as follows. Only the Neutron SRIOV config differs depending on the rack the host is in.
```
[pci]
# PCI devices available to VMs
device_spec = { "physical_network": "physnet_media_a1", "devname": "enp129s0f0np0" }
device_spec = { "physical_network": "physnet_media_a2", "devname": "enp129s0f0np0" }
device_spec = { "physical_network": "physnet_media_a3", "devname": "enp129s0f0np0" }
```
So ^ is not supported from a Nova perspective.

You are not meant to be able to list the same device multiple times and merge the physnets to effectively form a list like that.

It would be entirely supported if each line referenced a different device.

This should actually be a startup config error in Nova.
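For contrast, a supported per-rack layout would have each hypervisor's nova.conf list only the physnet its NIC is actually wired to; e.g. on a rack-1 host (a sketch, reusing the names above):

```
[pci]
# this host is only wired to rack 1's physical network
device_spec = { "physical_network": "physnet_media_a1", "devname": "enp129s0f0np0" }
```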
On the Neutron side, physnets must provide L2 isolation or it will break multi-tenancy.

i.e. VLAN 100 on physnet_media_a1 __MUST__ not ever allow communication to VLAN 100 on physnet_media_a2 without the two physnets being interconnected by an L3 router.

So this is also invalid config on the Neutron side, as you do not meet the requirements for defining separate physnets.

It will break how multi-tenancy is designed to work on the Neutron side.
This can work in a private cloud, and it can work if you do not allow VLAN/flat tenant networks in Neutron, but you as the admin then take on the burden of making sure that you do not violate multi-tenancy, instead of Neutron doing it.

This is a variation of the hack that telcos did for NUMA-local networks before that was actually possible in Nova.

i.e. they would have physnet_1_numa_0 and physnet_1_numa_1 and use that to force PCI devices to come from the relevant NUMA node.

But in reality they would not be separate physical networks, so they would create a multi-provider network using the same VLAN on both to interconnect them.
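That relied on the multi-provider extension accepting a segments list in the network create request; a sketch of such a body (network name and VLAN ID illustrative, not from any real deployment):

```
{
    "network": {
        "name": "numa-local-net",
        "segments": [
            {"provider:network_type": "vlan",
             "provider:physical_network": "physnet_1_numa_0",
             "provider:segment_id": 100},
            {"provider:network_type": "vlan",
             "provider:physical_network": "physnet_1_numa_1",
             "provider:segment_id": 100}
        ]
    }
}
```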
This is well into unsupported land, but if you deeply understand the security implications and that is acceptable for your private cloud, you could do this.

If it breaks for any reason, however, it's not an upstream bug, as you are deliberately misconfiguring Nova and Neutron to make this work.
From Nova's perspective it is always selecting the first physical network in the list, but because every hypervisor lists every physnet, when the scheduler picks a hypervisor in a rack other than '1' it will still be happy to proceed.
------------------------------------------------------------------------ *From:* Sean Mooney <smooney@redhat.com> *Sent:* Wednesday, September 03, 2025 16:08 *To:* openstack-discuss@lists.openstack.org <openstack-discuss@lists.openstack.org>; nathanh@graphcore.ai <nathanh@graphcore.ai> *Cc:* bens@graphcore.ai <bens@graphcore.ai> *Subject:* Re: [ops][nova][neutron] Routed provider networking and physnet port scheduling
Just adding bens back in case they are not on the list.

I selected the wrong reply type before.
On 03/09/2025 16:07, Sean Mooney wrote:
On 03/09/2025 15:10, Nathan Harper wrote:
Hi all,
We have been looking at building some routed provider networks, following this documentation:
https://docs.openstack.org/neutron/latest/admin/config-routed-networks.html
In this scenario we have 4 racks, and have defined physnets for each rack and assigned SRIOV interfaces for each. We then have created a multisegment network, with a segment associated with each physnet. We get the expected resource provider in placement containing only these hypervisors.
When scheduling instances onto this network, the allocation candidates are any hypervisors in racks 1-4 (OpenStack filters the hypervisors using the aggregates for each segment that Neutron creates).
However, during instance build the PCI device request sent to nova-compute always contains the physnet of the same segment, regardless of which hypervisor was chosen.
Debugging the builds, we ended up here:
https://opendev.org/openstack/nova/src/branch/master/nova/network/neutron.py#L2226,
with:
```
# TODO(vladikr): Additional work will be required to handle the
# case of multiple vlan segments associated with different
# physical networks.
```
Which originates from this commit:
https://opendev.org/openstack/nova/commit/b9d9d96a407db5a2adde3aed81e61cc9589c291a
This suggests that, despite the documentation describing the use of multiple VLAN-backed segments in this fashion, this has never worked?
That is correct. Nova has never supported the multiple-physnet extension that was added to Neutron in general:
https://github.com/openstack/neutron-lib/blob/master/neutron_lib/api/definitions/multiprovidernet.py
You can have multiple network segments on the same physnet, but when routed provider networks were first designed there was an intention to have a second way to associate hosts with segments that did not depend on physnets; however, that was never implemented. There was meant to be a way to associate hosts with segments directly via API or config that did not use physnets to do that mapping.
Or are we missing something? Has anyone successfully used routed provider networks?
The way that routed provider networks are typically used is that you do not have a single network that spans physnets; you have a 1:1 mapping between physnet and segment and create separate networks for each physnet.
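Concretely, that means the admin creates one provider network and subnet per rack, e.g. (names, VLAN IDs, and CIDRs made up):

```
openstack network create --provider-network-type vlan \
    --provider-physical-network physnet_rack1 --provider-segment 101 net-rack1
openstack subnet create --network net-rack1 --subnet-range 192.0.2.0/24 subnet-rack1
```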
SR-IOV has extra complications because Nova normally does not have any awareness of physnets at all, but for SR-IOV you have to declare a single physnet for each device in the Nova PCI device_spec.
A physnet is intended to be effectively an L2 broadcast domain, which is more or less what a segment is as well.

Technically the requirements for a Neutron physnet are stricter: the L2 isolation required between physnets is stronger than the isolation required between segments.
--
Regards,
Nathan Harper
Principal Engineer – Cloud Development
Platform Engineering
nathanh@graphcore.ai
www.graphcore.ai
** We have updated our privacy policy, which contains important information about how we collect and process your personal data. To read the policy, please visit http://www.graphcore.ai/privacy **
This email and its attachments are intended solely for the addressed recipients and may contain confidential or legally privileged information. If you are not the intended recipient you must not copy, distribute or disseminate this email in any way; to do so may be unlawful.
Any personal data/special category personal data herein are processed in accordance with UK data protection legislation. All associated feasible security measures are in place. Further details are available from the Privacy Notice on the website and/or from the Company.
Graphcore Limited (registered in England and Wales with registration number 10185006) is registered at, 1 Maple Road, Bramhall, Stockport, Cheshire, UK, SK7 2DH. This message was scanned for viruses upon transmission. However Graphcore accepts no liability for any such transmission.