[ovn-bgp-agent][neutron] - expose_tenant_networks bug

Luis Tomas Bolivar ltomasbo at redhat.com
Thu Sep 1 06:54:57 UTC 2022


On Wed, Aug 31, 2022 at 9:09 PM Satish Patel <satish.txt at gmail.com> wrote:

> Hi Luis,
>
> Here are the requested things which you asked for.
>
> ### Versions
>
> pyroute2        =       0.7.2
> openvswitch-switch  =  2.17.0-0ubuntu1~cloud0
> ovn    = 22.03.0-0ubuntu1~cloud0
> devstack master branch
>
> ### Rack-1-host-2
>
> vagrant at rack-1-host-2:~$ ip rule
> 0: from all lookup local
> 1000: from all lookup [l3mdev-table]
> 32000: from all to 10.0.0.1/26 lookup br-ex
> 32000: from all to 172.16.1.144 lookup br-ex
> 32000: from all to 172.16.1.148 lookup br-ex
> 32766: from all lookup main
> 32767: from all lookup default
>
>
> vagrant at rack-1-host-2:~$ ip route show table br-ex
> default dev br-ex scope link
> 10.0.0.0/26 via 172.16.1.144 dev br-ex
> 172.16.1.144 dev br-ex scope link
> 172.16.1.148 dev br-ex scope link
>
>
This above looks like it has worked; the network is exposed properly there,
so it seems it is just the IP addition that is missing for some reason.
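
For reference, here is a minimal standalone sketch (not the agent's code; the
device name and the example address are assumptions taken from the outputs
earlier in this thread) of how the missing /32 could be added to the "ovn"
device with pyroute2's NDB, just to confirm the rest of the wiring (the ip
rules plus the br-ex table) works once the address is present:

from pyroute2 import NDB

# Needs root. Adds the tenant VM address to the "ovn" dummy device, the same
# device shown by "ip a show ovn" in the outputs below.
with NDB() as ndb:
    with ndb.interfaces['ovn'] as ovn_dev:
        ovn_dev.add_ip('10.0.0.17/32')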



> ### Rack-2-host-1
>
> vagrant at rack-2-host-1:~$ ip rule
> 0: from all lookup local
> 1000: from all lookup [l3mdev-table]
> 32000: from all to 172.16.1.143 lookup br-ex
> 32766: from all lookup main
> 32767: from all lookup default
>
>
> vagrant at rack-2-host-1:~$ ip route show table br-ex
> default dev br-ex scope link
> 172.16.1.143 dev br-ex scope link
>
>
> #### I quickly cloned the latest master branch of ovn-bgp-agent, ran it, and
> found the following error. I am assuming your patch is part of that master
> branch.
>

Yes, it is, it got merged yesterday.


>
> rack-1-host-2: https://paste.opendev.org/show/bWbhmbzbi8YHGZsbhUAb/
>

Umm, same error...  I'll need to check that locally to see if I'm able to
reproduce it.

Perhaps it is worth rewriting this:

extra_routes.append(
    ndb.routes[{'table': ovn_routing_tables[bridge],
                'dst': dst,
                'family': AF_INET}])

as this, to better see where the problem is:

ovn_table = ovn_routing_tables[bridge]
found_route = ndb.routes[{'table': ovn_table,
                          'dst': dst,
                          'family': AF_INET}]
extra_routes.append(found_route)
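
In case it helps, the same lookup as a standalone sketch (the bridge,
destination and table id below are assumptions taken from the logs in this
thread, not agent code); it separates the dict access from the ndb.routes
lookup so the traceback makes the failing step obvious:

from socket import AF_INET
from pyroute2 import NDB

bridge = 'br-ex'
dst = '10.0.0.0/26'                     # tenant subnet seen in table br-ex
ovn_routing_tables = {'br-ex': 200}     # per "Found routing table for br-ex with: ['200', 'br-ex']"

with NDB() as ndb:
    ovn_table = ovn_routing_tables[bridge]   # a KeyError here means the dict is the problem
    try:
        found_route = ndb.routes[{'table': ovn_table,
                                  'dst': dst,
                                  'family': AF_INET}]
    except KeyError:
        found_route = None                   # a KeyError here means pyroute2 found no such route
    print(found_route)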


>
>
> Note: is this a bug or something else? -
> https://opendev.org/x/ovn-bgp-agent/src/branch/master/ovn_bgp_agent/privileged/vtysh.py#L27
>
> I had to replace the above line 27 of vtysh.py with the following to fix
> the vtysh error.
>


>
> @ovn_bgp_agent.privileged.vtysh_cmd.entrypoint
>
> def run_vtysh_config(frr_config_file):
>
>     vtysh_command = "copy {} running-config".format(frr_config_file)
>
>     full_args = ['/usr/bin/vtysh', '--vty_socket',
> constants.FRR_SOCKET_PATH, 'c']
>
>     full_args.extend(vtysh_command.split(' '))
>

Umm, weird, what exception was being raised?

 BTW, feel free to send any patch with fixes to the project!
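
In case it helps to answer that, here is a standalone reproduction sketch
(the paths are assumptions; the argument list simply mirrors the fix quoted
above) that runs the same vtysh command outside the agent and privsep, so the
raw error is visible directly:

import subprocess

frr_socket_path = '/run/frr/'            # stands in for constants.FRR_SOCKET_PATH
frr_config_file = '/etc/frr/frr.conf'    # assumed FRR config path
vtysh_command = "copy {} running-config".format(frr_config_file)

full_args = ['/usr/bin/vtysh', '--vty_socket', frr_socket_path, 'c']
full_args.extend(vtysh_command.split(' '))

# Needs root; prints vtysh's return code and whatever it wrote to
# stdout/stderr, so the original exception can be identified.
result = subprocess.run(full_args, capture_output=True, text=True)
print(result.returncode)
print(result.stdout or result.stderr)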



> On Wed, Aug 31, 2022 at 3:51 AM Luis Tomas Bolivar <ltomasbo at redhat.com>
> wrote:
>
>>
>>
>> On Wed, Aug 31, 2022 at 9:12 AM Luis Tomas Bolivar <ltomasbo at redhat.com>
>> wrote:
>>
>>> See below
>>>
>>>
>>> On Tue, Aug 30, 2022 at 10:14 PM Satish Patel <satish.txt at gmail.com>
>>> wrote:
>>>
>>>> Hi Luis,
>>>>
>>>> I have redeploy my lab and i have following components
>>>>
>>>> rack-1-host-1 - controller
>>>> rack-1-host-2 - compute1
>>>> rack-2-host-1 - compute2
>>>>
>>>>
>>>> # I am running ovn-bgp-agent on only two compute nodes compute1 and
>>>> compute2
>>>> [DEFAULT]
>>>> debug=False
>>>> expose_tenant_networks=True
>>>> driver=ovn_bgp_driver
>>>> reconcile_interval=120
>>>> ovsdb_connection=unix:/var/run/openvswitch/db.sock
>>>>
>>>> ### Without any VM at present, I can see only the router gateway IP on
>>>> rack-1-host-2
>>>>
>>>
>>> Yep, this is what is expected at this point.
>>>
>>>
>>>>
>>>> vagrant at rack-1-host-2:~$ ip a show ovn
>>>> 37: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master
>>>> ovn-bgp-vrf state UNKNOWN group default qlen 1000
>>>>     link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff
>>>>     inet 172.16.1.144/32 scope global ovn
>>>>        valid_lft forever preferred_lft forever
>>>>     inet6 fe80::8f7:6eff:fee0:1969/64 scope link
>>>>        valid_lft forever preferred_lft forever
>>>>
>>>>
>>>> vagrant at rack-2-host-1:~$ ip a show ovn
>>>> 15: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master
>>>> ovn-bgp-vrf state UNKNOWN group default qlen 1000
>>>>     link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff
>>>>     inet6 fe80::5461:6bff:fe29:ac29/64 scope link
>>>>        valid_lft forever preferred_lft forever
>>>>
>>>>
>>>> ### Let's create vm1, which ends up on rack-1-host-2, but it didn't expose
>>>> the vm1 IP (tenant IP); same on rack-2-host-1
>>>>
>>>> vagrant at rack-1-host-2:~$ ip a show ovn
>>>> 37: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master
>>>> ovn-bgp-vrf state UNKNOWN group default qlen 1000
>>>>     link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff
>>>>     inet 172.16.1.144/32 scope global ovn
>>>>        valid_lft forever preferred_lft forever
>>>>     inet6 fe80::8f7:6eff:fee0:1969/64 scope link
>>>>        valid_lft forever preferred_lft forever
>>>>
>>>
>>> It should be exposed here. What about the output of "ip rule" and "ip
>>> route show table br-ex"?
>>>
>>>
>>>>
>>>> vagrant at rack-2-host-1:~$ ip a show ovn
>>>> 15: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master
>>>> ovn-bgp-vrf state UNKNOWN group default qlen 1000
>>>>     link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff
>>>>     inet6 fe80::5461:6bff:fe29:ac29/64 scope link
>>>>        valid_lft forever preferred_lft forever
>>>>
>>>>
>>>> ### Let's attach a floating IP to vm1 and see. Now I can see the vm1 IP
>>>> 10.0.0.17 got exposed on rack-1-host-2; at the same time nothing on
>>>> rack-2-host-1 (of course, because no VM is running on it)
>>>>
>>>> vagrant at rack-1-host-2:~$ ip a show ovn
>>>> 37: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master
>>>> ovn-bgp-vrf state UNKNOWN group default qlen 1000
>>>>     link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff
>>>>     inet 172.16.1.144/32 scope global ovn
>>>>        valid_lft forever preferred_lft forever
>>>>     inet 10.0.0.17/32 scope global ovn
>>>>        valid_lft forever preferred_lft forever
>>>>     inet 172.16.1.148/32 scope global ovn
>>>>        valid_lft forever preferred_lft forever
>>>>     inet6 fe80::8f7:6eff:fee0:1969/64 scope link
>>>>        valid_lft forever preferred_lft forever
>>>>
>>>
>>> There is also a resync action happening every 120 seconds... Perhaps for
>>> some reason the initial addition of 10.0.0.17 failed and then the sync
>>> discovered it and added it (and that matched, more or less, with the time
>>> you added the FIP).
>>>
>>> But events are managed one by one and those two are different, so adding
>>> the FIP does not add the internal IP. It was probably a sync action.
>>>
>>>
>>>>
>>>> vagrant at rack-2-host-1:~$ ip a show ovn
>>>> 15: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master
>>>> ovn-bgp-vrf state UNKNOWN group default qlen 1000
>>>>     link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff
>>>>     inet6 fe80::5461:6bff:fe29:ac29/64 scope link
>>>>        valid_lft forever preferred_lft forever
>>>>
>>>>
>>>> #### Let's spin up vm2, which should end up on the other compute node,
>>>> rack-2-host-1 (no change yet; the vm2 IP wasn't exposed anywhere)
>>>>
>>>> vagrant at rack-1-host-2:~$ ip a show ovn
>>>> 37: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master
>>>> ovn-bgp-vrf state UNKNOWN group default qlen 1000
>>>>     link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff
>>>>     inet 172.16.1.144/32 scope global ovn
>>>>        valid_lft forever preferred_lft forever
>>>>     inet 10.0.0.17/32 scope global ovn
>>>>        valid_lft forever preferred_lft forever
>>>>     inet 172.16.1.148/32 scope global ovn
>>>>        valid_lft forever preferred_lft forever
>>>>     inet6 fe80::8f7:6eff:fee0:1969/64 scope link
>>>>        valid_lft forever preferred_lft forever
>>>>
>>>>
>>>> vagrant at rack-2-host-1:~$ ip a show ovn
>>>> 15: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master
>>>> ovn-bgp-vrf state UNKNOWN group default qlen 1000
>>>>     link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff
>>>>     inet6 fe80::5461:6bff:fe29:ac29/64 scope link
>>>>        valid_lft forever preferred_lft forever
>>>>
>>>>
>>>> #### Let's again attach a floating IP to vm2 (so far nothing has changed;
>>>> technically it should expose the IP on rack-1-host-2)
>>>>
>>>> vagrant at rack-1-host-2:~$ ip a show ovn
>>>> 37: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master
>>>> ovn-bgp-vrf state UNKNOWN group default qlen 1000
>>>>     link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff
>>>>     inet 172.16.1.144/32 scope global ovn
>>>>        valid_lft forever preferred_lft forever
>>>>     inet 10.0.0.17/32 scope global ovn
>>>>        valid_lft forever preferred_lft forever
>>>>     inet 172.16.1.148/32 scope global ovn
>>>>        valid_lft forever preferred_lft forever
>>>>     inet6 fe80::8f7:6eff:fee0:1969/64 scope link
>>>>        valid_lft forever preferred_lft forever
>>>>
>>>> The IP of the second VM should be exposed here ^, on rack-1-host-2, while
>>>> the FIP should be exposed on the other compute node (rack-2-host-1)
>>>>
>>>
>>>
>>>> vagrant at rack-2-host-1:~$ ip a show ovn
>>>> 15: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master
>>>> ovn-bgp-vrf state UNKNOWN group default qlen 1000
>>>>     link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff
>>>>     inet 172.16.1.143/32 scope global ovn
>>>>        valid_lft forever preferred_lft forever
>>>>     inet6 fe80::5461:6bff:fe29:ac29/64 scope link
>>>>        valid_lft forever preferred_lft forever
>>>>
>>>>
>>>> Here is the logs - https://paste.opendev.org/show/bRThivJE4wvEN92DXJUo/
>>>>
>>>>
>>>
>>> Which node do these logs belong to? rack-1-host-2?
>>>
>>> And are you running with the latest code? It looks like the problem is in
>>> the sync function, when trying to ensure the routing table entry for br-ex.
>>> It prints this:
>>>
>>> 2022-08-30 20:12:54.541 8318 DEBUG ovn_bgp_agent.utils.linux_net [-] Found routing table for br-ex with: ['200', 'br-ex']
>>>
>>> So definitely ovn_routing_tables should be initialized with {'br-ex':
>>> 200}, so I don't really get where the KeyError comes from...
>>>
>>> Unless it is not the dict access that fails, but the ndb.routes lookup...
>>> perhaps with the pyroute2 version you have, the family parameter is needed
>>> there. Let me send a patch that you can try.
>>>
>>
>> This is the patch https://review.opendev.org/c/x/ovn-bgp-agent/+/855062.
>> Give it a try and let me know if the error you are seeing in the logs goes
>> away with it
>>
>>
>>>
>>>> On Thu, Aug 25, 2022 at 6:25 AM Luis Tomas Bolivar <ltomasbo at redhat.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Thu, Aug 25, 2022 at 11:31 AM Satish Patel <satish.txt at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Luis,
>>>>>>
>>>>>> Very interesting. You are saying it will only expose the tenant IP on
>>>>>> the gateway port node? Even if we have a DVR setup in the cluster, correct?
>>>>>>
>>>>>
>>>>> Almost. The path is the same as in a DVR setup without BGP (with the
>>>>> difference that you can reach the internal IP). In a DVR setup, when the
>>>>> VM is in a tenant network without a FIP, the traffic goes out through the
>>>>> cr-lrp (ovn router gateway port), i.e., the node hosting the port that
>>>>> connects the router (to which the VM's subnet is attached) to the
>>>>> provider network.
>>>>>
>>>>> Note this is a limitation due to how OVN is used in OpenStack Neutron,
>>>>> where traffic needs to be injected into the OVN overlay on the node
>>>>> holding the cr-lrp. We are investigating possible ways to overcome this
>>>>> limitation and expose the IP right away on the node hosting the VM.
>>>>>
>>>>>
>>>>>> Is the gateway node going to expose IPs for all the other compute nodes?
>>>>>>
>>>>>
>>>>>> What if I have multiple gateway nodes?
>>>>>>
>>>>>
>>>>> No, each router connected to the provider network will have its own ovn
>>>>> router gateway port, and that can be allocated on any node which has
>>>>> "enable-chassis-as-gw". What is true is that all VMs in tenant networks
>>>>> connected to the same router will be exposed in the same location.
>>>>>
>>>>>
>>>>>> Did you configure that flag on all nodes or just the gateway node?
>>>>>>
>>>>>
>>>>> I usually deploy with 3 controllers, which are also my "networker"
>>>>> nodes, so those are the ones with the enable-chassis-as-gw flag.
>>>>>
>>>>>
>>>>>>
>>>>>> Sent from my iPhone
>>>>>>
>>>>>> On Aug 25, 2022, at 4:14 AM, Luis Tomas Bolivar <ltomasbo at redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>> 
>>>>>> I tested it locally and it is exposing the IP properly in the node
>>>>>> where the ovn router gateway port is allocated. Could you double check if
>>>>>> that is the case in your setup too?
>>>>>>
>>>>>> On Wed, Aug 24, 2022 at 8:58 AM Luis Tomas Bolivar <
>>>>>> ltomasbo at redhat.com> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Aug 23, 2022 at 6:04 PM Satish Patel <satish.txt at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Folks,
>>>>>>>>
>>>>>>>> I am setting up an ovn-bgp-agent lab in "BGP mode" and I found
>>>>>>>> everything working great except exposing tenant networks:
>>>>>>>> https://ltomasbo.wordpress.com/2021/02/04/ovn-bgp-agent-testing-setup/
>>>>>>>>
>>>>>>>>
>>>>>>>> Lab Summary:
>>>>>>>>
>>>>>>>> 1 controller node
>>>>>>>> 3 compute node
>>>>>>>>
>>>>>>>> ovn-bgp-agent is running on all compute nodes because I am using
>>>>>>>> "enable_distributed_floating_ip=True"
>>>>>>>>
>>>>>>>
>>>>>>>> ovn-bgp-agent config:
>>>>>>>>
>>>>>>>> [DEFAULT]
>>>>>>>> debug=False
>>>>>>>> expose_tenant_networks=True
>>>>>>>> driver=ovn_bgp_driver
>>>>>>>> reconcile_interval=120
>>>>>>>> ovsdb_connection=unix:/var/run/openvswitch/db.sock
>>>>>>>>
>>>>>>>> I am not seeing my VM's tenant IP getting exposed, but when I attach
>>>>>>>> a FIP, it gets exposed on the loopback address. Here is the full trace
>>>>>>>> of debug logs: https://paste.opendev.org/show/buHiJ90nFgC1JkQxZwVk/
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> It is not exposed on any node, right? Note that when
>>>>>>> expose_tenant_networks is enabled, the traffic to the tenant VM is
>>>>>>> exposed on the node holding the cr-lrp (ovn router gateway port) of the
>>>>>>> router connecting the tenant network to the provider one.
>>>>>>>
>>>>>>> The FIP will be exposed in the node where the VM is.
>>>>>>>
>>>>>>> On the other hand, the error you see there should not happen, so
>>>>>>> I'll investigate why that is and also double-check whether the
>>>>>>> expose_tenant_networks flag is broken somehow.
>>>>>>>
>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> LUIS TOMÁS BOLÍVAR
>>>>>>> Principal Software Engineer
>>>>>>> Red Hat
>>>>>>> Madrid, Spain
>>>>>>> ltomasbo at redhat.com
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> LUIS TOMÁS BOLÍVAR
>>>>>> Principal Software Engineer
>>>>>> Red Hat
>>>>>> Madrid, Spain
>>>>>> ltomasbo at redhat.com
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> LUIS TOMÁS BOLÍVAR
>>>>> Principal Software Engineer
>>>>> Red Hat
>>>>> Madrid, Spain
>>>>> ltomasbo at redhat.com
>>>>>
>>>>>
>>>>
>>>
>>> --
>>> LUIS TOMÁS BOLÍVAR
>>> Principal Software Engineer
>>> Red Hat
>>> Madrid, Spain
>>> ltomasbo at redhat.com
>>>
>>>
>>
>>
>> --
>> LUIS TOMÁS BOLÍVAR
>> Principal Software Engineer
>> Red Hat
>> Madrid, Spain
>> ltomasbo at redhat.com
>>
>>
>

-- 
LUIS TOMÁS BOLÍVAR
Principal Software Engineer
Red Hat
Madrid, Spain
ltomasbo at redhat.com