[ovn-bgp-agent][neutron] - expose_tenant_networks bug

Luis Tomas Bolivar ltomasbo at redhat.com
Wed Aug 31 07:12:31 UTC 2022


See below


On Tue, Aug 30, 2022 at 10:14 PM Satish Patel <satish.txt at gmail.com> wrote:

> Hi Luis,
>
> I have redeployed my lab and I have the following components:
>
> rack-1-host-1 - controller
> rack-1-host-2 - compute1
> rack-2-host-1 - compute2
>
>
> # I am running ovn-bgp-agent on only the two compute nodes, compute1 and
> compute2
> [DEFAULT]
> debug=False
> expose_tenant_networks=True
> driver=ovn_bgp_driver
> reconcile_interval=120
> ovsdb_connection=unix:/var/run/openvswitch/db.sock
>
> ### Without any VM at present, I can see only the router gateway IP on
> rack-1-host-2
>

Yep, this is what is expected at this point.


>
> vagrant at rack-1-host-2:~$ ip a show ovn
> 37: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master
> ovn-bgp-vrf state UNKNOWN group default qlen 1000
>     link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff
>     inet 172.16.1.144/32 scope global ovn
>        valid_lft forever preferred_lft forever
>     inet6 fe80::8f7:6eff:fee0:1969/64 scope link
>        valid_lft forever preferred_lft forever
>
>
> vagrant at rack-2-host-1:~$ ip a show ovn
> 15: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master
> ovn-bgp-vrf state UNKNOWN group default qlen 1000
>     link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff
>     inet6 fe80::5461:6bff:fe29:ac29/64 scope link
>        valid_lft forever preferred_lft forever
>
>
> ### Let's create vm1, which ended up on rack-1-host-2, but it didn't
> expose vm1's IP (tenant IP); same with rack-2-host-1
>
> vagrant at rack-1-host-2:~$ ip a show ovn
> 37: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master
> ovn-bgp-vrf state UNKNOWN group default qlen 1000
>     link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff
>     inet 172.16.1.144/32 scope global ovn
>        valid_lft forever preferred_lft forever
>     inet6 fe80::8f7:6eff:fee0:1969/64 scope link
>        valid_lft forever preferred_lft forever
>

It should be exposed here. What about the output of "ip rule" and "ip route
show table br-ex"?


>
> vagrant at rack-2-host-1:~$ ip a show ovn
> 15: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master
> ovn-bgp-vrf state UNKNOWN group default qlen 1000
>     link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff
>     inet6 fe80::5461:6bff:fe29:ac29/64 scope link
>        valid_lft forever preferred_lft forever
>
>
> ### Let's attach a floating IP to vm1 and see. Now I can see 10.0.0.17
> (vm1's IP) got exposed on rack-1-host-2; at the same time nothing on
> rack-2-host-1 (of course, because no VM is running on it)
>
> vagrant at rack-1-host-2:~$ ip a show ovn
> 37: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master
> ovn-bgp-vrf state UNKNOWN group default qlen 1000
>     link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff
>     inet 172.16.1.144/32 scope global ovn
>        valid_lft forever preferred_lft forever
>     inet 10.0.0.17/32 scope global ovn
>        valid_lft forever preferred_lft forever
>     inet 172.16.1.148/32 scope global ovn
>        valid_lft forever preferred_lft forever
>     inet6 fe80::8f7:6eff:fee0:1969/64 scope link
>        valid_lft forever preferred_lft forever
>

There is also a resync action happening every 120 seconds
(reconcile_interval)... Perhaps for some reason the initial addition of
10.0.0.17 failed, and then the sync discovered it and added it (which
matched, more or less, the time you added the FIP).

But events are managed one by one and those two are different, so adding the
FIP does not add the internal IP. It was probably a sync action.
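The idea behind that sync action can be sketched as a tiny reconcile pass (illustrative only; `sync_pass` and the variable names are hypothetical, not the agent's actual code):

```python
# Illustrative sketch of a periodic sync pass: anything that should be
# exposed but is not yet on the "ovn" device gets added on the next cycle,
# regardless of which per-event handler originally failed.
def sync_pass(desired_ips, exposed_ips):
    """Return the IPs a reconcile cycle would add to the ovn device."""
    return sorted(set(desired_ips) - set(exposed_ips))

# The FIP event succeeded, but the earlier tenant-IP event failed;
# the next sync (every reconcile_interval=120s) picks up the missing IP.
desired = ["172.16.1.144", "172.16.1.148", "10.0.0.17"]
exposed = ["172.16.1.144", "172.16.1.148"]
print(sync_pass(desired, exposed))  # ['10.0.0.17']
```

This is why the tenant IP can appear to show up "when the FIP was attached": the reconcile timer fired around the same time.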


>
> vagrant at rack-2-host-1:~$ ip a show ovn
> 15: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master
> ovn-bgp-vrf state UNKNOWN group default qlen 1000
>     link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff
>     inet6 fe80::5461:6bff:fe29:ac29/64 scope link
>        valid_lft forever preferred_lft forever
>
>
> #### Let's spin up vm2, which should end up on the other compute node,
> rack-2-host-1 (no change yet... vm2's IP wasn't exposed anywhere yet)
>
> vagrant at rack-1-host-2:~$ ip a show ovn
> 37: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master
> ovn-bgp-vrf state UNKNOWN group default qlen 1000
>     link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff
>     inet 172.16.1.144/32 scope global ovn
>        valid_lft forever preferred_lft forever
>     inet 10.0.0.17/32 scope global ovn
>        valid_lft forever preferred_lft forever
>     inet 172.16.1.148/32 scope global ovn
>        valid_lft forever preferred_lft forever
>     inet6 fe80::8f7:6eff:fee0:1969/64 scope link
>        valid_lft forever preferred_lft forever
>
>
> vagrant at rack-2-host-1:~$ ip a show ovn
> 15: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master
> ovn-bgp-vrf state UNKNOWN group default qlen 1000
>     link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff
>     inet6 fe80::5461:6bff:fe29:ac29/64 scope link
>        valid_lft forever preferred_lft forever
>
>
> #### Let's again attach a floating IP, this time to vm2 (so far nothing
> has changed; technically it should expose the IP on rack-1-host-2)
>
> vagrant at rack-1-host-2:~$ ip a show ovn
> 37: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master
> ovn-bgp-vrf state UNKNOWN group default qlen 1000
>     link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff
>     inet 172.16.1.144/32 scope global ovn
>        valid_lft forever preferred_lft forever
>     inet 10.0.0.17/32 scope global ovn
>        valid_lft forever preferred_lft forever
>     inet 172.16.1.148/32 scope global ovn
>        valid_lft forever preferred_lft forever
>     inet6 fe80::8f7:6eff:fee0:1969/64 scope link
>        valid_lft forever preferred_lft forever
>
> The IP of the second VM should be exposed here ^, on rack-1-host-2, while
> the FIP should be exposed on the other compute (rack-2-host-1)
>


> vagrant at rack-2-host-1:~$ ip a show ovn
> 15: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master
> ovn-bgp-vrf state UNKNOWN group default qlen 1000
>     link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff
>     inet 172.16.1.143/32 scope global ovn
>        valid_lft forever preferred_lft forever
>     inet6 fe80::5461:6bff:fe29:ac29/64 scope link
>        valid_lft forever preferred_lft forever
>
>
> Here is the logs - https://paste.opendev.org/show/bRThivJE4wvEN92DXJUo/
>

Which node do these logs belong to? rack-1-host-2?

And are you running with the latest code? It looks like the problem is in
the sync function when trying to ensure the routing table entry for br-ex.
It prints this:

2022-08-30 20:12:54.541 8318 DEBUG ovn_bgp_agent.utils.linux_net [-]
Found routing table for br-ex with: ['200', 'br-ex']

So ovn_routing_tables should definitely be initialized with {'br-ex': 200},
and I don't really get where the KeyError comes from...

Unless it is not accessing the dict, but accessing the ndb.routes...
Perhaps with the pyroute2 version you have, the family parameter is needed
there. Let me send a patch that you can try.
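For reference, the mapping behind that "Found routing table for br-ex" log line can be reproduced by parsing /etc/iproute2/rt_tables-style lines (a hedged sketch only; `parse_rt_tables` is a hypothetical helper, not necessarily the agent's actual code):

```python
# Sketch: build a {'br-ex': 200} mapping like ovn_routing_tables from
# /etc/iproute2/rt_tables-style lines ("<number> <name>" per entry).
def parse_rt_tables(lines):
    tables = {}
    for line in lines:
        fields = line.split()
        # Skip blanks and comments; keep "<number> <name>" entries.
        if len(fields) == 2 and not fields[0].startswith("#"):
            number, name = fields
            tables[name] = int(number)
    return tables

sample = ["# reserved values", "255 local", "254 main", "200 br-ex"]
print(parse_rt_tables(sample))  # {'local': 255, 'main': 254, 'br-ex': 200}
```

Once such a dict is populated, a lookup of 'br-ex' cannot raise KeyError, which supports the suspicion that the KeyError comes from the ndb.routes access rather than from the dict.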


> On Thu, Aug 25, 2022 at 6:25 AM Luis Tomas Bolivar <ltomasbo at redhat.com>
> wrote:
>
>>
>>
>> On Thu, Aug 25, 2022 at 11:31 AM Satish Patel <satish.txt at gmail.com>
>> wrote:
>>
>>> Hi Luis,
>>>
>>> Very interesting. You are saying it will only expose the tenant IP on
>>> the gateway port node? Even if we have a DVR setup in the cluster,
>>> correct?
>>>
>>
>> Almost. The path is the same as in a DVR setup without BGP (with the
>> difference that you can reach the internal IP). In a DVR setup, when the
>> VM is on a tenant network, without a FIP, the traffic goes out through
>> the cr-lrp (ovn router gateway port), i.e., the node hosting the port
>> that connects the router (to which the VM's subnet is attached) to the
>> provider network.
>>
>> Note this is a limitation due to how OVN is used in OpenStack Neutron,
>> where traffic needs to be injected into the OVN overlay on the node
>> holding the cr-lrp. We are investigating possible ways to overcome this
>> limitation and expose the IP right away on the node hosting the VM.
>>
>>
>>> Is the gateway node going to expose IPs for all the other compute nodes?
>>>
>>
>>> What if I have multiple gateway nodes?
>>>
>>
>> No, each router connected to the provider network will have its own ovn
>> router gateway port, and that can be allocated on any node which has
>> "enable-chassis-as-gw". What is true is that all VMs on tenant networks
>> connected to the same router will be exposed in the same location.
>>
>>
>>> Did you configure that flag on all nodes or just the gateway node?
>>>
>>
>> I usually deploy with 3 controllers which are also my "networker" nodes,
>> so those are the ones having the enable-chassis-as-gw flag.
>>
>>
>>>
>>> Sent from my iPhone
>>>
>>> On Aug 25, 2022, at 4:14 AM, Luis Tomas Bolivar <ltomasbo at redhat.com>
>>> wrote:
>>>
>>> 
>>> I tested it locally and it is exposing the IP properly in the node where
>>> the ovn router gateway port is allocated. Could you double check if that is
>>> the case in your setup too?
>>>
>>> On Wed, Aug 24, 2022 at 8:58 AM Luis Tomas Bolivar <ltomasbo at redhat.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Aug 23, 2022 at 6:04 PM Satish Patel <satish.txt at gmail.com>
>>>> wrote:
>>>>
>>>>> Folks,
>>>>>
>>>>> I am setting up an ovn-bgp-agent lab in "BGP mode" and I found
>>>>> everything working great except exposing tenant networks:
>>>>> https://ltomasbo.wordpress.com/2021/02/04/ovn-bgp-agent-testing-setup/
>>>>>
>>>>>
>>>>> Lab Summary:
>>>>>
>>>>> 1 controller node
>>>>> 3 compute node
>>>>>
>>>>> ovn-bgp-agent is running on all compute nodes because I am using
>>>>> "enable_distributed_floating_ip=True"
>>>>>
>>>>
>>>>> ovn-bgp-agent config:
>>>>>
>>>>> [DEFAULT]
>>>>> debug=False
>>>>> expose_tenant_networks=True
>>>>> driver=ovn_bgp_driver
>>>>> reconcile_interval=120
>>>>> ovsdb_connection=unix:/var/run/openvswitch/db.sock
>>>>>
>>>>> I am not seeing my VM's tenant IP getting exposed, but when I attach
>>>>> a FIP it gets exposed on the loopback address. Here is the full trace
>>>>> of debug logs: https://paste.opendev.org/show/buHiJ90nFgC1JkQxZwVk/
>>>>>
>>>>
>>>> It is not exposed on any node, right? Note that when
>>>> expose_tenant_networks is enabled, the traffic to the tenant VM is
>>>> exposed on the node holding the cr-lrp (ovn router gateway port) for
>>>> the router connecting the tenant network to the provider one.
>>>>
>>>> The FIP will be exposed on the node where the VM is.
>>>>
>>>> On the other hand, the error you see there should not happen, so I'll
>>>> investigate why that is and also double check whether the
>>>> expose_tenant_networks flag is broken somehow.
>>>>
>>>
>>>> Thanks!
>>>>
>>>>
>>>> --
>>>> LUIS TOMÁS BOLÍVAR
>>>> Principal Software Engineer
>>>> Red Hat
>>>> Madrid, Spain
>>>> ltomasbo at redhat.com
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>

-- 
LUIS TOMÁS BOLÍVAR
Principal Software Engineer
Red Hat
Madrid, Spain
ltomasbo at redhat.com

