On 04/09/2025 08:08, Andrew Bonney wrote:
Yes, I'd hoped I'd caveated it enough to suggest it wouldn't be a supported mode of operation 🙂
For what it's worth, we do have L2 isolation between the segments in this scenario; they're each in a separate VLAN and there is external L3 routing.
Hello,

VLANs are not enough to meet the actual Neutron requirements for physnets, which require L1 isolation so that you can have isolation between flat and VLAN networks as well. When I said L2 isolation before, I meant that VLAN 100 on physnet 1 must not interact with VLAN 100 on physnet 2. In the example below:

```
[pci]
device_spec = { "physical_network": "physnet_media_a1", "devname": "enp129s0f0np0" }
device_spec = { "physical_network": "physnet_media_a2", "devname": "enp129s0f0np0" }
device_spec = { "physical_network": "physnet_media_a3", "devname": "enp129s0f0np0" }
```

a VLAN network on physnet_media_a1 would not be isolated from a VLAN network on physnet_media_a2 or physnet_media_a3.

In Neutron, there are two related concepts: the network type, and whether a network of that type is a provider or tenant network. What makes a Neutron network a provider network is purely who selects the encapsulation ID, i.e., the VLAN ID. If it's specified by an admin, then it's a provider network. If it's selected automatically from a range of allowed values and allocated by Neutron, then it's a tenant network. Whether a network type can be used as a tenant network depends on https://docs.openstack.org/neutron/latest/configuration/ml2-conf.html#ml2.te...

So, if you list VLAN in tenant_network_types and you properly list the VLAN ranges in https://docs.openstack.org/neutron/latest/configuration/ml2-conf.html#ml2_type_vlan.network_vlan_ranges then Neutron will allow normal end users to request a VLAN network, and it will allocate one. This is only safe to do if you follow the isolation rules I noted above. It would be unsafe to list VLAN or flat in tenant_network_types in your configuration. It is perfectly fine to list them in https://docs.openstack.org/neutron/latest/configuration/ml2-conf.html#ml2.ty... as that is the list of all possible network types your installation of Neutron will support; the types outside of tenant_network_types are the extra network types available to you as an admin to provision as provider networks. In other words, if a network type is listed in type_drivers but not in tenant_network_types, that type is reserved for provider use.

So that's kind of the main point I was trying to call out: physnets need L1 isolation; segments just need L2 isolation, i.e., the network encapsulation provided by Neutron networks is enough for segments. It is not enough for physnets, which is why I said the requirements on physnets are stricter.

If you only allow Geneve or VXLAN networks for your tenant networks and reserve VLAN/flat for your provider networks, you can sort of cheat, but that means you can't allow your end users to create networks that will be used by SR-IOV ports in your cloud. You have to manually create the VLAN network for them and make sure that VLAN is not used on any other physnet connected to the same top-of-rack switch.
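To make that concrete, here is a minimal ml2_conf.ini sketch of that split (physnet names taken from the example above; treat it as an illustration, not your exact config). Geneve is the only tenant-allocatable type, and the physnets are listed in network_vlan_ranges without a VLAN range, so they accept admin-created provider VLAN networks but offer no VLANs for tenant allocation:

```
[ml2]
type_drivers = flat,vlan,geneve
tenant_network_types = geneve

[ml2_type_vlan]
# Physnets listed without a range are usable for provider VLAN networks
# only; no VLAN IDs are opened up for automatic tenant allocation.
network_vlan_ranges = physnet_media_a1,physnet_media_a2,physnet_media_a3
```

The admin then provisions the SR-IOV-facing network explicitly, choosing the VLAN ID themselves (network name and VLAN ID hypothetical):

```
openstack network create --provider-network-type vlan \
    --provider-physical-network physnet_media_a1 \
    --provider-segment 100 media-a1-vlan100
```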
------------------------------------------------------------------------ *From:* Sean Mooney <smooney@redhat.com> *Sent:* Wednesday, September 03, 2025 16:55 *To:* Andrew Bonney <Andrew.Bonney@bbc.co.uk>; openstack-discuss@lists.openstack.org <openstack-discuss@lists.openstack.org>; nathanh@graphcore.ai <nathanh@graphcore.ai> *Cc:* bens@graphcore.ai <bens@graphcore.ai> *Subject:* Re: [ops][nova][neutron] Routed provider networking and physnet port scheduling
On 03/09/2025 16:40, Andrew Bonney wrote:
We have been using routed provider networks with SRIOV for a while, but I believe this works a little 'by accident', so I'm not suggesting this is a recommended path. I'll describe as best I can but it's a while since I've worked through this so I may not be 100% on why it works.
Like your description, we have a physnet per rack, with a segment and associated VLAN for each. A user can create a 'Direct' type port associated with the segmented network, and with deferred IP allocation so it doesn't tie the port to a rack.
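In CLI terms that's something like the following (network and port names made up); leaving out any fixed IP on the routed network is what defers the allocation until the instance is scheduled:

```
openstack port create --network media-multiseg --vnic-type direct media-sriov-port
```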
If I'm remembering correctly, the accidental 'hack' here is that ALL of the hypervisors in all racks have a PCI device spec listing ALL of the physical networks, and they all use the same interface name, as follows. Only the Neutron SRIOV config differs depending on the rack the host is in.
```
[pci]
# PCI devices available to VMs
device_spec = { "physical_network": "physnet_media_a1", "devname": "enp129s0f0np0" }
device_spec = { "physical_network": "physnet_media_a2", "devname": "enp129s0f0np0" }
device_spec = { "physical_network": "physnet_media_a3", "devname": "enp129s0f0np0" }
```
So ^ is not supported from a Nova perspective.

You are not meant to be able to list the same device multiple times and merge the physnets to effectively form a list like that.

It would be entirely supported if each line referenced a different device.

This should actually be a startup config error in Nova.
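For contrast, a supported per-rack layout would have each hypervisor's nova.conf list only the physnet its NIC is actually wired to; e.g. on a rack-1 host (a sketch, reusing the names above):

```
[pci]
# this host is only wired to rack 1's physical network
device_spec = { "physical_network": "physnet_media_a1", "devname": "enp129s0f0np0" }
```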
On the Neutron side, physnets must provide L2 isolation or it will break multi-tenancy.

i.e. VLAN 100 on physnet_media_a1 __MUST__ not ever allow communication to VLAN 100 on physnet_media_a2 without the two physnets being interconnected by an L3 router.

So this is also invalid config on the Neutron side, as you do not meet the requirements for defining separate physnets.

It will break how multi-tenancy is designed to work on the Neutron side.
This can work in a private cloud, and it can work if you do not allow VLAN/flat tenant networks in Neutron, but you as the admin then take on the burden of making sure that you do not violate multi-tenancy, instead of Neutron doing it.

This is a variation of the hack that telcos did for NUMA-local networks before that was actually possible in Nova.

i.e. they would have physnet_1_numa_0 and physnet_1_numa_1 and use that to force PCI devices to come from the relevant NUMA node.

But in reality they would not be separate physical networks, so they would create a multi-provider network using the same VLAN on both to interconnect them.
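That relied on the multi-provider extension accepting a segments list in the network create request; a sketch of such a body (network name and VLAN ID illustrative, not from any real deployment):

```
{
    "network": {
        "name": "numa-local-net",
        "segments": [
            {"provider:network_type": "vlan",
             "provider:physical_network": "physnet_1_numa_0",
             "provider:segment_id": 100},
            {"provider:network_type": "vlan",
             "provider:physical_network": "physnet_1_numa_1",
             "provider:segment_id": 100}
        ]
    }
}
```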
This is well into unsupported land, but if you deeply understand the security implications and that is acceptable for your private cloud, you could do this.

If it breaks for any reason, however, it's not an upstream bug, as you are deliberately misconfiguring Nova and Neutron to make this work.
From Nova's perspective it is always selecting the first physical network in the list, but because every hypervisor lists every physnet, when the scheduler picks a hypervisor in a rack other than '1' it will still be happy to proceed.
------------------------------------------------------------------------ *From:* Sean Mooney <smooney@redhat.com> *Sent:* Wednesday, September 03, 2025 16:08 *To:* openstack-discuss@lists.openstack.org <openstack-discuss@lists.openstack.org>; nathanh@graphcore.ai <nathanh@graphcore.ai> *Cc:* bens@graphcore.ai <bens@graphcore.ai> *Subject:* Re: [ops][nova][neutron] Routed provider networking and physnet port scheduling
Just adding bens back in case they are not on the list.

I selected the wrong reply type before.
On 03/09/2025 16:07, Sean Mooney wrote:
On 03/09/2025 15:10, Nathan Harper wrote:
Hi all,
We have been looking at building some routed provider networks, following this documentation:
https://docs.openstack.org/neutron/latest/admin/config-routed-networks.html
In this scenario we have 4 racks, and have defined physnets for each rack and assigned SRIOV interfaces for each. We then have created a multisegment network, with a segment associated with each physnet. We get the expected resource provider in placement containing only these hypervisors.
When scheduling instances onto this network, the allocation candidates are any hypervisors in racks 1-4 (OpenStack filters the hypervisors using the aggregates for each segment that Neutron creates).
However, during instance build the PCI device request sent to nova-compute always contains the physnet of the same segment, regardless of which hypervisor was chosen.
Debugging the builds, we ended up here:
https://opendev.org/openstack/nova/src/branch/master/nova/network/neutron.py#L2226,
with:
```
# TODO(vladikr): Additional work will be required to handle the
# case of multiple vlan segments associated with different
# physical networks.
```
Which originates from this commit:
https://opendev.org/openstack/nova/commit/b9d9d96a407db5a2adde3aed81e61cc9589c291a
This suggests that, despite the documentation describing the use of multiple VLAN-backed segments in this fashion, this has never worked?
That is correct. Nova has never supported the multiple-physnet extension that was added to Neutron in general:
https://github.com/openstack/neutron-lib/blob/master/neutron_lib/api/definitions/multiprovidernet.py
You can have multiple network segments on the same physnet, but when routed provider networks were first designed there was an intention to have a second way to associate hosts with segments that did not depend on physnets; however, that was never implemented. There was meant to be a way to associate hosts with segments directly via API or config that did not use physnets to do that mapping.
Or are we missing something? Has anyone successfully used routed provider networks?
The way that routed provider networks are typically used is that you do not have a single network that spans physnets; you have a 1:1 mapping between physnet and segment and create separate networks for each physnet.
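Concretely, that means the admin creates one provider network and subnet per rack, e.g. (names, VLAN IDs, and CIDRs made up):

```
openstack network create --provider-network-type vlan \
    --provider-physical-network physnet_rack1 --provider-segment 101 net-rack1
openstack subnet create --network net-rack1 --subnet-range 192.0.2.0/24 subnet-rack1
```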
SR-IOV has extra complications because Nova normally does not have any awareness of physnets at all, but for SR-IOV you have to declare a single physnet for each device in the Nova PCI device_spec.
A physnet is intended to be effectively an L2 broadcast domain, which is more or less what a segment is as well.

Technically the requirements for a Neutron physnet are stricter: the L2 isolation required between physnets is stronger than the isolation required between segments.
--
Regards,
Nathan Harper
Principal Engineer – Cloud Development
Platform Engineering
nathanh@graphcore.ai
www.graphcore.ai
** We have updated our privacy policy, which contains important information about how we collect and process your personal data. To read the policy, please visit http://www.graphcore.ai/privacy **
This email and its attachments are intended solely for the addressed recipients and may contain confidential or legally privileged information. If you are not the intended recipient you must not copy, distribute or disseminate this email in any way; to do so may be unlawful.
Any personal data/special category personal data herein are processed in accordance with UK data protection legislation. All associated feasible security measures are in place. Further details are available from the Privacy Notice on the website and/or from the Company.
Graphcore Limited (registered in England and Wales with registration number 10185006) is registered at, 1 Maple Road, Bramhall, Stockport, Cheshire, UK, SK7 2DH. This message was scanned for viruses upon transmission. However Graphcore accepts no liability for any such transmission.