Thanks for your input.
That's an interesting approach, Andrew, one that we can at least take a look at in the short term.
Our desire is that users can schedule onto a network without needing any cloud-specific info.
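i.e. ideally a user would just do something like this (names illustrative) and land on a host with the right physnet:

openstack port create --network <shared-routed-net> --vnic-type direct port1
openstack server create --flavor <flavor> --image <image> --nic port-id=port1 vm1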
> it would be entirely supported if each line referenced a different device.
Our configuration currently looks like this (we need to migrate to device_spec):
(host in R07-P15)
passthrough_whitelist = [{ "devname": "enP7p1s0f0", "physical_network": "R07-P15" },{ "devname": "enP7p1s0f1", "physical_network": R07-P15" },{ "devname": "enP1p1s0f0", "physical_network": "R07-P15" },{ "devname": "enP1p1s0f1", "physical_network": "R07-P15"
},{ "devname": "enp1s0f0", "physical_network": "physnet1" },{ "devname": "enp1s0f1", "physical_network": "physnet1" }]
(host in R07-P14)
passthrough_whitelist = [{ "devname": "enP7p1s0f0", "physical_network": "R07-P14" },{ "devname": "enP7p1s0f1", "physical_network": R07-P14" },{ "devname": "enP1p1s0f0", "physical_network": "R07-P14" },{ "devname":
"enP1p1s0f1", "physical_network": "R07-P14" },{ "devname": "enp1s0f0", "physical_network": "physnet1" },{ "devname": "enp1s0f1", "physical_network": "physnet1" }]
which, going by the comment above, would be 'supported', but in our scenario it does not work.
> > you can have multiple network segments on the same physnet, but when
> > routed provider networks were first designed there was
> > an intention to have a second way to associate hosts with segments
> > that did not depend on physnets; however, that was never implemented.
> > there was meant to be a way to associate hosts with segments directly
> > via api or config that did not use physnets to do that mapping.
We ARE using different physnets to separate segments, e.g.:
Segment 1 - physnet R07-P14 - vlan 2000
Segment 2 - physnet R07-P15 - vlan 2001
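(these segments were created along the lines of the routed networks doc, roughly as follows, where the network name is a placeholder:

openstack network segment create --network <multisegment-net> \
    --network-type vlan --physical-network R07-P14 --segment 2000 segment1
openstack network segment create --network <multisegment-net> \
    --network-type vlan --physical-network R07-P15 --segment 2001 segment2
)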
> >
> >> Or are we missing something? Has anyone successfully used routed
> >> provider networks?
> >>
> > the way that routed provider networks are typically used is that you
> > do not have a single network that spans physnets:
> > you have a 1:1 mapping between physnet and segment and create separate
> > networks for each physnet
This is what we were trying to avoid: we don't want our users to have to
know which network they should connect an instance to, especially as,
without query_placement_for_routed_network_aggregates = true,
the instance will have to bounce around hypervisors until it lands on one with the right physnet (unless other hints are given via flavors etc.)
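For reference, that option lives in the [scheduler] section of nova.conf on the scheduler hosts:

[scheduler]
query_placement_for_routed_network_aggregates = true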
> >
> > sriov has extra complications because nova normally does not have any
> > awareness of physnets at all, but for sriov you have to declare a
> > single physnet
Do you mean a single physnet per device?
From: Sean Mooney <smooney@redhat.com>
Date: Wednesday, 3 September 2025 at 16:55
To: Andrew Bonney <Andrew.Bonney@bbc.co.uk>, openstack-discuss@lists.openstack.org, Nathan Harper <nathanh@graphcore.ai>
Cc: Ben Shingler <bens@graphcore.ai>
Subject: Re: [ops][nova][neutron] Routed provider networking and physnet port scheduling
On 03/09/2025 16:40, Andrew Bonney wrote:
> We have been using routed provider networks with SRIOV for a while,
> but I believe this works a little 'by accident', so I'm not suggesting
> this is a recommended path. I'll describe as best I can, but it's a
> while since I've worked through this, so I may not be 100% sure why it works.
>
> Like your description, we have a physnet per rack, with a segment and
> associated VLAN for each. A user can create a 'Direct' type port
> associated with the segmented network, and with deferred IP allocation
> so it doesn't tie the port to a rack.
>
> If I'm remembering correctly the accidental 'hack' here is that ALL of
> the hypervisors in all racks have a PCI device spec listing ALL of the
> physical networks, and they all use the same interface name, as
> follows. Only the Neutron SRIOV config differs depending on the rack
> the host is in.
>
> [pci]
> # PCI devices available to VMs
> device_spec = { "physical_network": "physnet_media_a1", "devname": "enp129s0f0np0" }
> device_spec = { "physical_network": "physnet_media_a2", "devname": "enp129s0f0np0" }
> device_spec = { "physical_network": "physnet_media_a3", "devname": "enp129s0f0np0" }
so ^ is not supported from a nova perspective.
you are not meant to be able to list the same device multiple times and
merge the physnets to form, effectively, a list like that.
it would be entirely supported if each line referenced a different device.
this should actually be a startup config error in nova.
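for contrast, a supported layout maps each physnet to a distinct device, something like the following (device names are made up):

[pci]
device_spec = { "physical_network": "physnet_media_a1", "devname": "enp129s0f0np0" }
device_spec = { "physical_network": "physnet_media_a2", "devname": "enp129s0f1np1" }
device_spec = { "physical_network": "physnet_media_a3", "devname": "enp130s0f0np0" }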
on the neutron side, physnets must provide l2 isolation or it will break
multi-tenancy,
i.e. vlan 100 on physnet_media_a1 __MUST__ not ever allow communication
to vlan 100 on physnet_media_a2 without the two physnets being
interconnected by an l3 router.
so this is also invalid config on the neutron side, as you do not meet
the requirements for defining separate physnets;
it will break how multi-tenancy is designed to work on the neutron side.
This can work in a private cloud, and it can work if you do not allow
vlan/flat tenant networks in neutron, but you as the admin
take on the burden of making sure that you do not violate multi-tenancy,
instead of neutron doing it.
this is a variation of the hack that telcos did for numa-local networks
before that was actually possible in nova,
i.e. they would have physnet_1_numa_0 and physnet_1_numa_1 and use that
to force pci devices to come from the relevant numa node,
but in reality they would not be separate physical networks, so they would
create a multi-provider network using the same vlan on both to
interconnect them.
this is well into unsupported land, but if you deeply understand the
security implications and that is acceptable for your private cloud, you
could do this.
if it breaks for any reason, however, it's not an upstream bug, as you are
deliberately misconfiguring nova and neutron to make this work.
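for illustration, that multi-provider hack is built by adding a second segment with the same vlan id to the network, roughly like this (names are made up):

openstack network create --provider-network-type vlan \
    --provider-physical-network physnet_1_numa_0 --provider-segment 100 numa-net
openstack network segment create --network numa-net \
    --network-type vlan --physical-network physnet_1_numa_1 --segment 100 numa-seg

again: unsupported, shown only to make the shape of the hack concrete.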
>
> From Nova's perspective it is always selecting the first physical
> network in the list, but this means that when the scheduler picks a
> hypervisor in a rack other than '1', it will still be happy to proceed.
>
>
> ------------------------------------------------------------------------
> *From:* Sean Mooney <smooney@redhat.com>
> *Sent:* Wednesday, September 03, 2025 16:08
> *To:* openstack-discuss@lists.openstack.org; nathanh@graphcore.ai
> *Cc:* bens@graphcore.ai
> *Subject:* Re: [ops][nova][neutron] Routed provider networking and
> physnet port scheduling
>
> just adding bens back in case they are not on the list.
>
> i selected the wrong reply type before
>
> On 03/09/2025 16:07, Sean Mooney wrote:
> >
> > On 03/09/2025 15:10, Nathan Harper wrote:
> >>
> >> Hi all,
> >>
> >> We have been looking at building some routed provider networks,
> >> following this documentation:
> >>
> >> https://docs.openstack.org/neutron/latest/admin/config-routed-networks.html
> >>
> >> In this scenario we have 4 racks, and have defined physnets for each
> >> rack and assigned SRIOV interfaces for each. We then have created
> >> a multisegment network, with a segment associated with each
> >> physnet. We get the expected resource provider in placement
> >> containing only these hypervisors.
> >>
> >> When scheduling instances onto this network, the allocation
> >> candidates are any hypervisors in racks 1-4 (openstack filters the
> >> hypervisors using the aggregates for each segment that neutron
> creates).
> >> However, during instance build the pci device request sent to
> >> nova-compute always contains the physnet of the same segment.
> >>
> >> Debugging the builds, we ended up here:
> >>
> >> https://opendev.org/openstack/nova/src/branch/master/nova/network/neutron.py#L2226,
> >> with:
> >>
> >> # TODO(vladikr): Additional work will be required to handle the
> >> # case of multiple vlan segments associated with different
> >> # physical networks.
> >>
> >> Which originates from this commit:
> >>
> >> https://opendev.org/openstack/nova/commit/b9d9d96a407db5a2adde3aed81e61cc9589c291a
> >>
> >> This suggests that, despite the documentation describing the use of
> >> multiple VLAN-backed segments in this fashion, this has never worked?
> >>
> > That is correct. nova has never supported the multi-physnet extension
> > that was added to neutron in general:
> >
> >
> > https://github.com/openstack/neutron-lib/blob/master/neutron_lib/api/definitions/multiprovidernet.py
> >
> >
> > you can have multiple network segments on the same physnet, but when
> > routed provider networks were first designed there was
> > an intention to have a second way to associate hosts with segments
> > that did not depend on physnets; however, that was never implemented.
> > there was meant to be a way to associate hosts with segments directly
> > via api or config that did not use physnets to do that mapping.
> >
> >> Or are we missing something? Has anyone successfully used routed
> >> provider networks?
> >>
> > the way that routed provider networks are typically used is that you
> > do not have a single network that spans physnets:
> > you have a 1:1 mapping between physnet and segment and create separate
> > networks for each physnet
> >
> > sriov has extra complications because nova normally does not have any
> > awareness of physnets at all, but for sriov you have to declare a
> > single physnet for them in the nova pci device_spec.
> >
> > a physnet is intended to be effectively an l2 broadcast domain, which is
> > more or less what a segment is as well.
> >
> > technically the requirements for a neutron physnet are stricter in
> > requiring l2 isolation between physnets than the isolation between
> > segments.
> >
> >
> >>
> >> --
> >>
> >> Regards,
> >>
> >> Nathan Harper
> >>
> >> Principal Engineer - Cloud Development
> >>
> >> Platform Engineering
> >>
> >> nathanh@graphcore.ai
> >>
> >> www.graphcore.ai
> >>