[ovn-bgp-agent][neutron] - expose_tenant_networks bug

Luis Tomas Bolivar ltomasbo at redhat.com
Wed Aug 31 07:51:22 UTC 2022


On Wed, Aug 31, 2022 at 9:12 AM Luis Tomas Bolivar <ltomasbo at redhat.com>
wrote:

> See below
>
>
> On Tue, Aug 30, 2022 at 10:14 PM Satish Patel <satish.txt at gmail.com>
> wrote:
>
>> Hi Luis,
>>
>> I have redeployed my lab and I have the following components:
>>
>> rack-1-host-1 - controller
>> rack-1-host-2 - compute1
>> rack-2-host-1 - compute2
>>
>>
>> # I am running ovn-bgp-agent on only two compute nodes compute1 and
>> compute2
>> [DEFAULT]
>> debug=False
>> expose_tenant_networks=True
>> driver=ovn_bgp_driver
>> reconcile_interval=120
>> ovsdb_connection=unix:/var/run/openvswitch/db.sock
>>
>> ### Without any VM at present, I can see only the router gateway IP on
>> rack-1-host-2
>>
>
> Yep, this is what is expected at this point.
>
>
>>
>> vagrant@rack-1-host-2:~$ ip a show ovn
>> 37: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master
>> ovn-bgp-vrf state UNKNOWN group default qlen 1000
>>     link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff
>>     inet 172.16.1.144/32 scope global ovn
>>        valid_lft forever preferred_lft forever
>>     inet6 fe80::8f7:6eff:fee0:1969/64 scope link
>>        valid_lft forever preferred_lft forever
>>
>>
>> vagrant@rack-2-host-1:~$ ip a show ovn
>> 15: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master
>> ovn-bgp-vrf state UNKNOWN group default qlen 1000
>>     link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff
>>     inet6 fe80::5461:6bff:fe29:ac29/64 scope link
>>        valid_lft forever preferred_lft forever
>>
>>
>> ### Let's create vm1, which ends up on rack-1-host-2, but it didn't expose
>> the vm1 IP (tenant IP); same on rack-2-host-1
>>
>> vagrant@rack-1-host-2:~$ ip a show ovn
>> 37: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master
>> ovn-bgp-vrf state UNKNOWN group default qlen 1000
>>     link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff
>>     inet 172.16.1.144/32 scope global ovn
>>        valid_lft forever preferred_lft forever
>>     inet6 fe80::8f7:6eff:fee0:1969/64 scope link
>>        valid_lft forever preferred_lft forever
>>
>
> It should be exposed here. What about the output of "ip rule" and "ip
> route show table br-ex"?
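>
> The same checks can also be done with pyroute2, the library the agent
> itself uses. This is a diagnostic sketch only; the table number 200 is
> taken from your logs and may differ in your setup:
>
>     from pyroute2 import IPRoute
>
>     # Roughly "ip rule" plus "ip route show table 200" (the table the
>     # agent associates with br-ex, per the logs).
>     with IPRoute() as ipr:
>         for rule in ipr.get_rules():
>             print('rule: table', rule['table'], 'dst', rule.get_attr('FRA_DST'))
>         for route in ipr.get_routes(table=200):
>             print('route:', route.get_attr('RTA_DST'), '/', route['dst_len'])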
>
>
>>
>> vagrant@rack-2-host-1:~$ ip a show ovn
>> 15: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master
>> ovn-bgp-vrf state UNKNOWN group default qlen 1000
>>     link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff
>>     inet6 fe80::5461:6bff:fe29:ac29/64 scope link
>>        valid_lft forever preferred_lft forever
>>
>>
>> ### Let's attach a floating IP to vm1 and see. Now I can see 10.0.0.17 (the
>> vm1 tenant IP) got exposed on rack-1-host-2; at the same time nothing on
>> rack-2-host-1 (of course, because no VM is running on it)
>>
>> vagrant@rack-1-host-2:~$ ip a show ovn
>> 37: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master
>> ovn-bgp-vrf state UNKNOWN group default qlen 1000
>>     link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff
>>     inet 172.16.1.144/32 scope global ovn
>>        valid_lft forever preferred_lft forever
>>     inet 10.0.0.17/32 scope global ovn
>>        valid_lft forever preferred_lft forever
>>     inet 172.16.1.148/32 scope global ovn
>>        valid_lft forever preferred_lft forever
>>     inet6 fe80::8f7:6eff:fee0:1969/64 scope link
>>        valid_lft forever preferred_lft forever
>>
>
> There is also a resync action happening every 120 seconds... Perhaps for
> some reason the initial addition of 10.0.0.17 failed, and then the sync
> discovered it and added it (which matched, more or less, the time you added
> the FIP).
>
> But events are managed one by one, and those two are different, so adding the
> FIP does not add the internal IP. It was probably a sync action.
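>
> As a schematic sketch of that reconcile cycle (not the agent's actual
> code; all helper names here are made up for illustration):
>
>     import time
>
>     RECONCILE_INTERVAL = 120  # matches reconcile_interval in the config
>
>     def reconcile(agent):
>         # Recompute the full set of IPs that should be exposed and repair
>         # any drift, e.g. an event that was missed or failed earlier.
>         desired = agent.desired_exposed_ips()    # hypothetical helper
>         current = agent.currently_exposed_ips()  # hypothetical helper
>         for ip in desired - current:
>             agent.expose_ip(ip)
>         for ip in current - desired:
>             agent.withdraw_ip(ip)
>
>     def main_loop(agent):
>         while True:
>             reconcile(agent)
>             time.sleep(RECONCILE_INTERVAL)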
>
>
>>
>> vagrant@rack-2-host-1:~$ ip a show ovn
>> 15: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master
>> ovn-bgp-vrf state UNKNOWN group default qlen 1000
>>     link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff
>>     inet6 fe80::5461:6bff:fe29:ac29/64 scope link
>>        valid_lft forever preferred_lft forever
>>
>>
>> #### Let's spin up vm2, which should end up on the other compute node,
>> rack-2-host-1 (no change yet; the vm2 IP wasn't exposed anywhere)
>>
>> vagrant@rack-1-host-2:~$ ip a show ovn
>> 37: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master
>> ovn-bgp-vrf state UNKNOWN group default qlen 1000
>>     link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff
>>     inet 172.16.1.144/32 scope global ovn
>>        valid_lft forever preferred_lft forever
>>     inet 10.0.0.17/32 scope global ovn
>>        valid_lft forever preferred_lft forever
>>     inet 172.16.1.148/32 scope global ovn
>>        valid_lft forever preferred_lft forever
>>     inet6 fe80::8f7:6eff:fee0:1969/64 scope link
>>        valid_lft forever preferred_lft forever
>>
>>
>> vagrant@rack-2-host-1:~$ ip a show ovn
>> 15: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master
>> ovn-bgp-vrf state UNKNOWN group default qlen 1000
>>     link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff
>>     inet6 fe80::5461:6bff:fe29:ac29/64 scope link
>>        valid_lft forever preferred_lft forever
>>
>>
>> #### Let's again attach a floating IP, this time to vm2 (so far nothing
>> has changed; technically it should expose the IP on rack-1-host-2)
>>
>> vagrant@rack-1-host-2:~$ ip a show ovn
>> 37: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master
>> ovn-bgp-vrf state UNKNOWN group default qlen 1000
>>     link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff
>>     inet 172.16.1.144/32 scope global ovn
>>        valid_lft forever preferred_lft forever
>>     inet 10.0.0.17/32 scope global ovn
>>        valid_lft forever preferred_lft forever
>>     inet 172.16.1.148/32 scope global ovn
>>        valid_lft forever preferred_lft forever
>>     inet6 fe80::8f7:6eff:fee0:1969/64 scope link
>>        valid_lft forever preferred_lft forever
>>
>> The IP of the second VM should be exposed here ^, on rack-1-host-2, while
>> the FIP should be exposed on the other compute node (rack-2-host-1)
>>
>
>
>> vagrant@rack-2-host-1:~$ ip a show ovn
>> 15: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master
>> ovn-bgp-vrf state UNKNOWN group default qlen 1000
>>     link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff
>>     inet 172.16.1.143/32 scope global ovn
>>        valid_lft forever preferred_lft forever
>>     inet6 fe80::5461:6bff:fe29:ac29/64 scope link
>>        valid_lft forever preferred_lft forever
>>
>>
>> Here is the logs - https://paste.opendev.org/show/bRThivJE4wvEN92DXJUo/
>>
>
> Which node do these logs belong to? rack-1-host-2?
>
> And are you running the latest code? It looks like the problem is in the
> sync function when trying to ensure the routing table entry for br-ex. It
> prints this:
>
> 2022-08-30 20:12:54.541 8318 DEBUG ovn_bgp_agent.utils.linux_net [-] Found routing table for br-ex with: ['200', 'br-ex']
>
> So ovn_routing_tables should definitely be initialized with {'br-ex':
> 200}, and I don't really get where the KeyError comes from...
>
> Unless it is not accessing the dict but accessing ndb.routes... Perhaps
> with the pyroute2 version you have, the family parameter is needed there.
> Let me send a patch that you can try.
>

This is the patch: https://review.opendev.org/c/x/ovn-bgp-agent/+/855062.
Give it a try and let me know if the error you are seeing in the logs goes
away with it.
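
The gist of the suspicion, as a simplified sketch (not the actual agent
code or the patch itself): pyroute2's NDB raises KeyError when no record
matches a route spec, and depending on the pyroute2 version an IPv4 route
may only match when the address family is part of the spec.

    from socket import AF_INET
    from pyroute2 import NDB

    with NDB() as ndb:
        spec = {'table': 200, 'dst': '10.0.0.17', 'dst_len': 32}
        try:
            route = ndb.routes[spec]  # may raise KeyError even if the route exists
        except KeyError:
            # The idea is that including the family may be needed for the
            # lookup to succeed on some pyroute2 versions.
            route = ndb.routes[dict(spec, family=AF_INET)]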


>
>> On Thu, Aug 25, 2022 at 6:25 AM Luis Tomas Bolivar <ltomasbo at redhat.com>
>> wrote:
>>
>>>
>>>
>>> On Thu, Aug 25, 2022 at 11:31 AM Satish Patel <satish.txt at gmail.com>
>>> wrote:
>>>
>>>> Hi Luis,
>>>>
>>>> Very interesting. You are saying it will only expose the tenant IP on
>>>> the gateway port node? Even if we have a DVR setup in the cluster, correct?
>>>>
>>>
>>> Almost. The path is the same as in a DVR setup without BGP (with the
>>> difference that you can reach the internal IP). In a DVR setup, when the
>>> VM is in a tenant network without a FIP, the traffic goes out through the
>>> cr-lrp (ovn router gateway port), i.e., via the node hosting that port,
>>> which connects the router (where the VM's subnet is attached) to the
>>> provider network.
>>>
>>> Note this is a limitation due to how OVN is used in OpenStack Neutron,
>>> where traffic needs to be injected into the OVN overlay on the node holding
>>> the cr-lrp. We are investigating possible ways to overcome this limitation
>>> and expose the IP right away on the node hosting the VM.
>>>
>>>
>>>> Is the gateway node going to expose IPs for all other compute nodes?
>>>>
>>>
>>>> What if I have multiple gateway nodes?
>>>>
>>>
>>> No, each router connected to the provider network will have its own ovn
>>> router gateway port, and that can be allocated on any node which has
>>> "enable-chassis-as-gw". What is true is that all VMs in tenant networks
>>> connected to the same router will be exposed in the same location.
>>>
>>>
>>>> Did you configure that flag on all nodes or just the gateway node?
>>>>
>>>
>>> I usually deploy with 3 controllers, which are also my "networker" nodes,
>>> so those are the ones with the enable-chassis-as-gw flag.
>>>
>>>
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Aug 25, 2022, at 4:14 AM, Luis Tomas Bolivar <ltomasbo at redhat.com>
>>>> wrote:
>>>>
>>>>
>>>> I tested it locally and it is exposing the IP properly on the node
>>>> where the ovn router gateway port is allocated. Could you double-check
>>>> that this is the case in your setup too?
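>>>>
>>>> One quick way to check where the gateway port landed (assuming
>>>> ovn-sbctl is available on a node with access to the OVN southbound DB;
>>>> cr-lrp ports show up there as "chassisredirect" port bindings):
>>>>
>>>>     import subprocess
>>>>
>>>>     # The chassis column of each chassisredirect Port_Binding tells you
>>>>     # which node currently hosts that router gateway port.
>>>>     print(subprocess.check_output(
>>>>         ['ovn-sbctl', 'find', 'Port_Binding', 'type=chassisredirect'],
>>>>         text=True))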
>>>>
>>>> On Wed, Aug 24, 2022 at 8:58 AM Luis Tomas Bolivar <ltomasbo at redhat.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Aug 23, 2022 at 6:04 PM Satish Patel <satish.txt at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Folks,
>>>>>>
>>>>>> I am setting up an ovn-bgp-agent lab in "BGP mode" and I found
>>>>>> everything working great except exposing tenant networks
>>>>>> https://ltomasbo.wordpress.com/2021/02/04/ovn-bgp-agent-testing-setup/
>>>>>>
>>>>>>
>>>>>> Lab Summary:
>>>>>>
>>>>>> 1 controller node
>>>>>> 3 compute node
>>>>>>
>>>>>> ovn-bgp-agent is running on all compute nodes because I am using
>>>>>> "enable_distributed_floating_ip=True"
>>>>>>
>>>>>
>>>>>> ovn-bgp-agent config:
>>>>>>
>>>>>> [DEFAULT]
>>>>>> debug=False
>>>>>> expose_tenant_networks=True
>>>>>> driver=ovn_bgp_driver
>>>>>> reconcile_interval=120
>>>>>> ovsdb_connection=unix:/var/run/openvswitch/db.sock
>>>>>>
>>>>>> I am not seeing my VM's tenant IP getting exposed, but when I attach a
>>>>>> FIP it gets exposed on the loopback address. Here is the full trace of
>>>>>> debug logs: https://paste.opendev.org/show/buHiJ90nFgC1JkQxZwVk/
>>>>>>
>>>>>
>>>>> It is not exposed on any node, right? Note that when expose_tenant_networks
>>>>> is enabled, the traffic to the tenant VM is exposed on the node holding the
>>>>> cr-lrp (ovn router gateway port) for the router connecting the tenant
>>>>> network to the provider one.
>>>>>
>>>>> The FIP will be exposed on the node where the VM is.
>>>>>
>>>>> On the other hand, the error you see there should not happen, so I'll
>>>>> investigate why that is and also double-check whether the
>>>>> expose_tenant_networks flag is broken somehow.
>>>>>
>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>> --
>>>>> LUIS TOMÁS BOLÍVAR
>>>>> Principal Software Engineer
>>>>> Red Hat
>>>>> Madrid, Spain
>>>>> ltomasbo at redhat.com
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> LUIS TOMÁS BOLÍVAR
>>>> Principal Software Engineer
>>>> Red Hat
>>>> Madrid, Spain
>>>> ltomasbo at redhat.com
>>>>
>>>>
>>>>
>>>
>>> --
>>> LUIS TOMÁS BOLÍVAR
>>> Principal Software Engineer
>>> Red Hat
>>> Madrid, Spain
>>> ltomasbo at redhat.com
>>>
>>>
>>
>
> --
> LUIS TOMÁS BOLÍVAR
> Principal Software Engineer
> Red Hat
> Madrid, Spain
> ltomasbo at redhat.com
>
>


-- 
LUIS TOMÁS BOLÍVAR
Principal Software Engineer
Red Hat
Madrid, Spain
ltomasbo at redhat.com

