[neutron] bonding sriov nic inside VMs

Sean Mooney smooney at redhat.com
Fri Mar 10 17:37:27 UTC 2023


On Fri, 2023-03-10 at 11:54 -0500, Satish Patel wrote:
> Hi Sean,
> 
> I have a few questions and they are in-line. This is the reference doc I am
> trying to follow in my private cloud -
> https://www.redpill-linpro.com/techblog/2021/01/30/bonding-sriov-nics-with-openstack.html
^ this is only safe in a multi-tenant environment if
https://docs.openstack.org/neutron/latest/configuration/ml2-conf.html#ml2.tenant_network_types does not contain vlan or flat.
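e.g. something along these lines in ml2_conf.ini (just a sketch; the physnet
names are placeholders) keeps tenants on tunnelled networks while vlan stays
admin/provider-created only:

[ml2]
tenant_network_types = vxlan

[ml2_type_vlan]
# no range given, so these physnets can only be used for admin-created
# provider networks, not tenant vlan allocation
network_vlan_ranges = physnet_1,physnet_2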

It is technically breaking neutron's rules for how physnets are meant to be used.

In private clouds where tenant isolation is not required, operators have abused this for years for things
like selecting NUMA nodes and many other use cases that would be unsafe in a public cloud.

> 
> On Fri, Mar 10, 2023 at 9:02 AM Sean Mooney <smooney at redhat.com> wrote:
> 
> > On Fri, 2023-03-10 at 08:30 -0500, Satish Patel wrote:
> > > Thanks Sean,
> > > 
> > > I don't have a NIC which supports hardware offloading or any such
> > > feature. I am using an Intel 82599 NIC just for SR-IOV and am looking
> > > for bonding support, which is only possible inside the VM. As you know,
> > > we already run a large SR-IOV environment with openstack, but my
> > > biggest issue is upgrading switches without downtime. I want to be more
> > > resilient so I don't have to worry about that.
> > > 
> > > Do you still think it's dangerous or not a good idea to bond sriov NICs
> > > inside the VM? What could go wrong here? Just trying to understand
> > > before I go crazy :)
> > LACP bond modes generally don't work fully, but you should be able to
> > get basic failover bonding working, and perhaps TCP load balancing,
> > provided it does not require switch cooperation to work from inside the
> > guest.
> > 
> 
> What do you mean by not working fully? Are you talking about active-active
> vs active-standby?
Some LACP modes require configuration on the switch; others do not.
You can only really do that from the PF, since at the switch level you cannot
bring down the port for only some VLANs in a failover case.

https://docs.rackspace.com/blog/lacp-bonding-and-linux-configuration/

I believe modes 0, 1, 2, 5 and 6 can work without special switch config.

Modes 3 and 4, I think, require switch cooperation.

IEEE 802.3ad (mode 4) in particular, I think, needs cooperation with the switch.
"""The link is set up dynamically between two LACP-supporting peers."""
 https://en.wikipedia.org/wiki/Link_aggregation

That peering session can only really run on the PFs.

balance-tlb (5) and balance-alb (6) should work fine for the VFs in the guest, however.
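For reference, a basic failover bond of the two VFs inside the guest can be
set up with iproute2 alone; a minimal python sketch (the VF interface names
ens4/ens5 are just examples and will differ in your guest):

import subprocess

def ip(*args):
    # Thin wrapper around the iproute2 "ip" command; raise if it fails.
    subprocess.run(["ip", *args], check=True)

# active-backup (mode 1) needs no switch cooperation; balance-alb (mode 6)
# would be the other switch-independent choice discussed above.
ip("link", "add", "bond0", "type", "bond", "mode", "active-backup", "miimon", "100")

for vf in ("ens4", "ens5"):  # VF interface names inside the guest (examples)
    ip("link", "set", vf, "down")
    ip("link", "set", vf, "master", "bond0")

ip("link", "set", "bond0", "up")
# The address (static or DHCP) then goes on bond0, not on the VFs.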

> 
> 
> > 
> > Just keep in mind that, by definition, if you declare a network as being
> > on a separate physnet from another, then you as the operator are
> > asserting that there is no L2 connectivity between those networks.
> > 
> > 
> This is interesting; why can't both physnets be on the same L2 segment? Are
> you worried about STP loops? But that is how LACP works: both physical
> interfaces are on the same segment.
If they are on the same L2 segment then there is no multi-tenancy when using vlan or flat networks.
More on this below.
> 
> 
> 
> > as vlan 100 on physnet_1 is intended to be a separate vlan from vlan 100
> > on physnet_2
> > 
> 
> I did a test in the lab with physnet_1 and physnet_2 both on the same VLAN
> ID in the same L2 domain and everything works.

If you create two neutron networks,

physnet_1_vlan_100 and physnet_2_vlan_100,

map physnet_1 to eth1 and physnet_2 to eth2,
plug both into the same TOR with vlan 100 trunked to both,

then boot one VM on physnet_1_vlan_100 and a second on physnet_2_vlan_100,

a few things will happen.

First, the VMs will boot fine and both will get IPs.
Second, there will be no isolation between the two networks,
so if you use the same subnet on both they will be able to ping each other directly.

It is unsafe to have tenant-creatable vlan networks in this setup if you have overlapping vlan ranges
between physnet_1 and physnet_2, as there will be no tenant isolation enforced at the network level.

From a neutron point of view, physnet_1_vlan_100 and physnet_2_vlan_100 are two entirely different networks, and
it is the operator's responsibility to ensure the network fabric guarantees the same vlan on two physnets cannot communicate.
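For what it is worth, that lab setup looks roughly like this via the
openstacksdk (a sketch; admin credentials and the "mycloud" clouds.yaml entry
are assumed):

import openstack

# Admin credentials are needed to set the provider:* attributes; "mycloud"
# is a placeholder clouds.yaml entry.
conn = openstack.connect(cloud="mycloud")

for physnet in ("physnet_1", "physnet_2"):
    conn.network.create_network(
        name=f"{physnet}_vlan_100",
        provider_network_type="vlan",
        provider_physical_network=physnet,
        # Same VLAN ID on both physnets: neutron still treats these as two
        # unrelated networks even if the fabric bridges them together.
        provider_segment_id=100,
    )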


> 
> 
> > 
> > If you break that and use physnets to select PFs, you are also breaking
> > neutron's multi-tenancy model, meaning it is not safe to allow end users
> > to create vlan networks; instead you can only use provider-created vlan
> > networks.
> > 
> 
> This is a private cloud and we don't have any multi-tenancy model. We have
> all VLAN-based provider networks and my datacenter core router is the
> gateway for all my vlan providers.
Ack. In which case you can live with the fact that there are no multi-tenancy
guarantees, because the rules around physnets have been broken.

This is pretty common in telco clouds, by the way, so you would not be the first to do this.
> 
> 
> > 
> > So what you want to do is probably achievable, but you mention physnets
> > per PF, and that sounds like it breaks the rule that physnets are
> > separate, isolated physical networks.
> > 
> 
> I can understand that each physnet should be in a different tenant, but in
> my case it's a vlan-based provider network and I'm not sure what rules it's
> going to break.
Each physnet does not need to be a different tenant;
the important thing is that neutron expects vlans on different physnets to be allocatable separately.

So the same vlan on two physnets logically represents two different networks.
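e.g. with vlan ranges like this in ml2_conf.ini (made-up ranges), neutron
will happily hand out vlan 100 on physnet_1 and vlan 100 on physnet_2 as two
independent segments:

[ml2_type_vlan]
network_vlan_ranges = physnet_1:100:200,physnet_2:100:200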
> 
> 
> > 
> > > 
> > > 
> > > 
> > > 
> > > On Fri, Mar 10, 2023 at 6:57 AM Sean Mooney <smooney at redhat.com> wrote:
> > > 
> > > > On Thu, 2023-03-09 at 16:43 -0500, Satish Patel wrote:
> > > > > Folks,
> > > > > 
> > > > > As you know, SR-IOV doesn't support bonding, so the only solution is
> > > > > to implement LACP bonding inside the VM.
> > > > > 
> > > > > I did some tests in the lab: I created two physnets, mapped them to
> > > > > two physical NICs, created VFs and attached them to a VM. So far all
> > > > > good, but one problem I am seeing is that each neutron port I create
> > > > > has an IP address associated, and I can use only one IP on the bond,
> > > > > so that is just a waste of IP in the public IP pool.
> > > > > 
> > > > > Is there any way to create an sriov port but without an IP address?
> > > > Technically we now support addressless ports in neutron and nova,
> > > > so that should be possible.
> > > > If you tried to do this with hardware-offloaded OVS rather than the
> > > > standard SR-IOV with the sriov nic agent, you will likely also need to
> > > > use the allowed_address_pairs extension to ensure that OVS does not
> > > > drop the packets based on the IP address. If you are using hierarchical
> > > > port binding, where your TOR is managed by an ml2 driver, you might
> > > > also need the allowed_address_pairs extension with the sriov nic agent
> > > > to make sure the packets are not dropped at the switch level.
> > > > 
> > > > As you likely already know, we do not support VF bonding in openstack,
> > > > or bonded ports in general in the neutron API.
> > > > There was an effort a few years ago to make a bond port extension that
> > > > mirrors how trunk ports work, i.e. having two neutron subports and a
> > > > bond port that aggregates them, but we never got that far with the
> > > > design. That would have enabled bonding to be implemented in different
> > > > ml2 drivers like ovs/sriov/ovn etc. with a consistent/common API.
> > > > 
> > > > Some people have used Mellanox's VF LAG functionality; however, that
> > > > was never actually enabled properly in nova/neutron, so it's not
> > > > officially supported upstream, but that functionality allows you to
> > > > attach only a single VF to the guest from bonded ports on a single
> > > > card.
> > > > 
> > > > There is no support in nova/neutron for that officially; as I said, it
> > > > just happens to work unintentionally, so I would not advise that you
> > > > use it in production unless you're happy to work through any issues
> > > > you find yourself.
> > > > 
> > > > 
> > 
> > 
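To tie this back to the addressless-port approach quoted above, this is a
rough sketch of creating such a port with the openstacksdk. The cloud name,
network name and the shared bond IP are placeholders, and I believe passing
an empty fixed_ips list is how you get a port with no IP allocated:

import openstack

conn = openstack.connect(cloud="mycloud")  # placeholder clouds.yaml entry
net = conn.network.find_network("physnet_2_vlan_100")  # example network name

port = conn.network.create_port(
    network_id=net.id,
    name="vm1-bond-secondary",
    binding_vnic_type="direct",      # SR-IOV VF
    fixed_ips=[],                    # no IP allocated to this port
    # Allow the bond's IP (owned by the primary port) on this VF too, so the
    # sriov nic agent / TOR ml2 driver does not drop the traffic.
    allowed_address_pairs=[{"ip_address": "203.0.113.10"}],
)
print(port.id)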



