See below


On Tue, Aug 30, 2022 at 10:14 PM Satish Patel <satish.txt@gmail.com> wrote:
Hi Luis,

I have redeployed my lab and have the following components:

rack-1-host-1 - controller
rack-1-host-2 - compute1
rack-2-host-1 - compute2 


# I am running ovn-bgp-agent on only the two compute nodes, compute1 and compute2
[DEFAULT]
debug=False
expose_tenant_networks=True
driver=ovn_bgp_driver
reconcile_interval=120
ovsdb_connection=unix:/var/run/openvswitch/db.sock

### Without any VM at present, I can see only the router gateway IP on rack-1-host-2

Yep, this is what is expected at this point.
 

vagrant@rack-1-host-2:~$ ip a show ovn
37: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master ovn-bgp-vrf state UNKNOWN group default qlen 1000
    link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff
    inet 172.16.1.144/32 scope global ovn
       valid_lft forever preferred_lft forever
    inet6 fe80::8f7:6eff:fee0:1969/64 scope link
       valid_lft forever preferred_lft forever


vagrant@rack-2-host-1:~$ ip a show ovn
15: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master ovn-bgp-vrf state UNKNOWN group default qlen 1000
    link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5461:6bff:fe29:ac29/64 scope link
       valid_lft forever preferred_lft forever


### Let's create vm1, which ends up on rack-1-host-2, but its IP (tenant IP) is not exposed there, and nothing shows up on rack-2-host-1 either

vagrant@rack-1-host-2:~$ ip a show ovn
37: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master ovn-bgp-vrf state UNKNOWN group default qlen 1000
    link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff
    inet 172.16.1.144/32 scope global ovn
       valid_lft forever preferred_lft forever
    inet6 fe80::8f7:6eff:fee0:1969/64 scope link
       valid_lft forever preferred_lft forever

It should be exposed here. What is the output of "ip rule" and "ip route show table br-ex"?
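
To compare the state of the ovn device between steps, alongside the "ip rule" and "ip route" output, the addresses can be pulled out of the JSON form of the same command ("ip -j addr show dev ovn"). This is only a minimal sketch: the function name exposed_ipv4 is made up for illustration, and the sample data is a hand-written subset of the fields that iproute2 prints.

```python
import json


def exposed_ipv4(ip_json_text, ifname="ovn"):
    """Return the IPv4 addresses (with prefix length) found on one device.

    ip_json_text is the text printed by `ip -j addr show`.
    """
    addrs = []
    for link in json.loads(ip_json_text):
        if link.get("ifname") != ifname:
            continue
        for info in link.get("addr_info", []):
            # Skip the inet6 link-local address; only IPv4 is of interest here.
            if info.get("family") == "inet":
                addrs.append(f"{info['local']}/{info['prefixlen']}")
    return addrs


# Hand-written sample mirroring the rack-1-host-2 output above.
sample = json.dumps([
    {
        "ifname": "ovn",
        "addr_info": [
            {"family": "inet", "local": "172.16.1.144", "prefixlen": 32},
            {"family": "inet6", "local": "fe80::8f7:6eff:fee0:1969", "prefixlen": 64},
        ],
    }
])
print(exposed_ipv4(sample))
```

On a live node the input text could come from subprocess.run(["ip", "-j", "addr", "show", "dev", "ovn"], capture_output=True, text=True).stdout.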
 

vagrant@rack-2-host-1:~$ ip a show ovn
15: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master ovn-bgp-vrf state UNKNOWN group default qlen 1000
    link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5461:6bff:fe29:ac29/64 scope link
       valid_lft forever preferred_lft forever


### Let's attach a floating IP to vm1 and see. Now I can see that the vm1 IP 10.0.0.17 got exposed on rack-1-host-2, while nothing shows up on rack-2-host-1 (of course, because no VM is running on it)

vagrant@rack-1-host-2:~$ ip a show ovn
37: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master ovn-bgp-vrf state UNKNOWN group default qlen 1000
    link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff
    inet 172.16.1.144/32 scope global ovn
       valid_lft forever preferred_lft forever
    inet 10.0.0.17/32 scope global ovn
       valid_lft forever preferred_lft forever
    inet 172.16.1.148/32 scope global ovn
       valid_lft forever preferred_lft forever
    inet6 fe80::8f7:6eff:fee0:1969/64 scope link
       valid_lft forever preferred_lft forever

There is also a resync action happening every 120 seconds. Perhaps the initial addition of 10.0.0.17 failed for some reason, and then the sync discovered it and added it (at more or less the time you added the FIP).

But events are handled one by one and those two are different events, so adding the FIP does not add the internal IP. It was probably a sync action.
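
The idea behind a periodic resync can be sketched as a set comparison between what should be exposed and what actually is. This is not the agent's code, only an illustration under the hypothesis above; the function name reconcile is invented, and the address lists are the ones from the rack-1-host-2 output.

```python
def reconcile(expected_ips, exposed_ips):
    """Return (to_add, to_remove) so that the exposed set converges to the expected one."""
    expected = set(expected_ips)
    exposed = set(exposed_ips)
    to_add = sorted(expected - exposed)      # missed by an earlier event
    to_remove = sorted(exposed - expected)   # stale entries left behind
    return to_add, to_remove


# Hypothesized state right after the FIP event: the FIP (172.16.1.148) was
# added by its own event, while the tenant IP 10.0.0.17 was still missing.
expected = ["172.16.1.144", "10.0.0.17", "172.16.1.148"]
exposed = ["172.16.1.144", "172.16.1.148"]

to_add, to_remove = reconcile(expected, exposed)
print("add:", to_add, "remove:", to_remove)
```

With reconcile_interval=120, a pass like this would run every 120 seconds, which would explain the tenant IP showing up around the time the FIP was attached even though the two events are unrelated.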



vagrant@rack-2-host-1:~$ ip a show ovn
15: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master ovn-bgp-vrf state UNKNOWN group default qlen 1000
    link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5461:6bff:fe29:ac29/64 scope link
       valid_lft forever preferred_lft forever


#### Let's spin up vm2, which should end up on the other compute node, rack-2-host-1 (no change yet; the vm2 IP isn't exposed anywhere)

vagrant@rack-1-host-2:~$ ip a show ovn
37: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master ovn-bgp-vrf state UNKNOWN group default qlen 1000
    link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff
    inet 172.16.1.144/32 scope global ovn
       valid_lft forever preferred_lft forever
    inet 10.0.0.17/32 scope global ovn
       valid_lft forever preferred_lft forever
    inet 172.16.1.148/32 scope global ovn
       valid_lft forever preferred_lft forever
    inet6 fe80::8f7:6eff:fee0:1969/64 scope link
       valid_lft forever preferred_lft forever


vagrant@rack-2-host-1:~$ ip a show ovn
15: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master ovn-bgp-vrf state UNKNOWN group default qlen 1000
    link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5461:6bff:fe29:ac29/64 scope link
       valid_lft forever preferred_lft forever


#### Let's attach a floating IP again, this time to vm2 (so far nothing changed; technically it should expose the IP on rack-1-host-2)

vagrant@rack-1-host-2:~$ ip a show ovn
37: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master ovn-bgp-vrf state UNKNOWN group default qlen 1000
    link/ether 0a:f7:6e:e0:19:69 brd ff:ff:ff:ff:ff:ff
    inet 172.16.1.144/32 scope global ovn
       valid_lft forever preferred_lft forever
    inet 10.0.0.17/32 scope global ovn
       valid_lft forever preferred_lft forever
    inet 172.16.1.148/32 scope global ovn
       valid_lft forever preferred_lft forever
    inet6 fe80::8f7:6eff:fee0:1969/64 scope link
       valid_lft forever preferred_lft forever

The IP of the second VM should be exposed here ^, on rack-1-host-2, while the FIP should be exposed on the other compute node (rack-2-host-1).
 
vagrant@rack-2-host-1:~$ ip a show ovn
15: ovn: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master ovn-bgp-vrf state UNKNOWN group default qlen 1000
    link/ether 56:61:6b:29:ac:29 brd ff:ff:ff:ff:ff:ff
    inet 172.16.1.143/32 scope global ovn
       valid_lft forever preferred_lft forever
    inet6 fe80::5461:6bff:fe29:ac29/64 scope link
       valid_lft forever preferred_lft forever


Here are the logs - https://paste.opendev.org/show/bRThivJE4wvEN92DXJUo/

Which node do these logs belong to? rack-1-host-2?

And are you running the latest code? It looks like the problem is in the sync function, when it tries to ensure the routing table entry for br-ex. It prints this:
2022-08-30 20:12:54.541 8318 DEBUG ovn_bgp_agent.utils.linux_net [-] Found routing table for br-ex with: ['200', 'br-ex']
So ovn_routing_tables should definitely be initialized with {'br-ex': 200}, and I don't really see where the KeyError comes from...

Unless it is not accessing the dict but ndb.routes... Perhaps with the pyroute2 version you have, the family parameter is needed there. Let me send a patch that you can try.
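
For reference, the mapping in that log line follows the "ID name" layout used by /etc/iproute2/rt_tables. The snippet below is only a sketch of how such a file turns into a {'br-ex': 200} style dict. It is not the agent's implementation, parse_rt_tables is an invented name, and the sample text is a typical file with the br-ex entry appended.

```python
def parse_rt_tables(text):
    """Map routing table names to numeric IDs from rt_tables-style text."""
    tables = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 2 or not fields[0].isdigit():
            continue  # ignore anything that is not "<id> <name>"
        table_id, name = fields
        tables[name] = int(table_id)
    return tables


sample = """
# reserved values
255 local
254 main
253 default
200 br-ex
"""

ovn_routing_tables = parse_rt_tables(sample)
# .get() returns None instead of raising KeyError when the bridge is missing.
print(ovn_routing_tables.get("br-ex"))
```

If a lookup like this succeeds, a KeyError raised later would have to come from some other lookup, which fits the suspicion about ndb.routes.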


On Thu, Aug 25, 2022 at 6:25 AM Luis Tomas Bolivar <ltomasbo@redhat.com> wrote:


On Thu, Aug 25, 2022 at 11:31 AM Satish Patel <satish.txt@gmail.com> wrote:
Hi Luis,

Very interesting. You are saying it will only expose the tenant IP on the node hosting the gateway port? Even though we have a DVR setup in the cluster, correct?

Almost. The path is the same as in a DVR setup without BGP (with the difference that you can reach the internal IP). In a DVR setup, when the VM is in a tenant network and has no FIP, the traffic goes out through the cr-lrp (OVN router gateway port), i.e., through the node hosting that port. That port connects the router of the VM's subnet to the provider network.

Note this is a limitation due to how OVN is used in OpenStack Neutron, where traffic needs to be injected into the OVN overlay on the node holding the cr-lrp. We are investigating possible ways to overcome this limitation and expose the IP right away on the node hosting the VM.


Is the gateway node going to expose the IPs for all other compute nodes?

What if I have multiple gateway nodes?

No, each router connected to the provider network will have its own OVN router gateway port, and that can be allocated on any node which has "enable-chassis-as-gw". What is true is that all VMs in tenant networks connected to the same router will be exposed in the same location.
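
The placement rule described in this thread (tenant IPs on the node holding the router's cr-lrp, FIPs on the node hosting the VM) can be written down as a small model. This is an illustrative sketch, not agent code: the VM class and exposure_plan function are invented here, the vm2 tenant IP 10.0.0.20 is hypothetical because it never appears in the outputs, and the router gateway IP itself (172.16.1.144) is left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VM:
    tenant_ip: str
    host: str
    fip: Optional[str] = None


def exposure_plan(vms, gateway_node):
    """Map each node to the IPs expected on its ovn device.

    Assumes every VM sits behind the same router, whose cr-lrp
    is scheduled on gateway_node.
    """
    plan = {}
    for vm in vms:
        # Tenant IPs follow the router gateway port, not the VM.
        plan.setdefault(gateway_node, []).append(vm.tenant_ip)
        if vm.fip:
            # Floating IPs follow the VM.
            plan.setdefault(vm.host, []).append(vm.fip)
    return plan


vms = [
    VM("10.0.0.17", "rack-1-host-2", fip="172.16.1.148"),
    VM("10.0.0.20", "rack-2-host-1", fip="172.16.1.143"),  # tenant IP is hypothetical
]
for node, ips in exposure_plan(vms, gateway_node="rack-1-host-2").items():
    print(node, ips)
```

With a second router whose cr-lrp lands on another gateway-enabled node, the same function would simply be called again with that router's VMs and its own gateway_node.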


Did you configure that flag on all nodes or just the gateway node?

I usually deploy with 3 controllers which are also my "networker" nodes, so those are the ones having the enable-chassis-as-gw flag.
 


On Aug 25, 2022, at 4:14 AM, Luis Tomas Bolivar <ltomasbo@redhat.com> wrote:


I tested it locally and it is exposing the IP properly in the node where the ovn router gateway port is allocated. Could you double check if that is the case in your setup too?

On Wed, Aug 24, 2022 at 8:58 AM Luis Tomas Bolivar <ltomasbo@redhat.com> wrote:


On Tue, Aug 23, 2022 at 6:04 PM Satish Patel <satish.txt@gmail.com> wrote:
Folks,

I am setting up an ovn-bgp-agent lab in "BGP mode" and I found everything working great except for exposing the tenant network https://ltomasbo.wordpress.com/2021/02/04/ovn-bgp-agent-testing-setup/

Lab Summary:

1 controller node 
3 compute nodes

ovn-bgp-agent is running on all compute nodes because I am using "enable_distributed_floating_ip=True"

ovn-bgp-agent config:

[DEFAULT]
debug=False
expose_tenant_networks=True
driver=ovn_bgp_driver
reconcile_interval=120
ovsdb_connection=unix:/var/run/openvswitch/db.sock

I am not seeing my VM's tenant IP getting exposed, but when I attach a FIP, that does get exposed on the loopback address. Here is the full trace of debug logs: https://paste.opendev.org/show/buHiJ90nFgC1JkQxZwVk/

It is not exposed on any node, right? Note that when expose_tenant_networks is enabled, the traffic to the tenant VM is exposed on the node holding the cr-lrp (OVN router gateway port) of the router connecting the tenant network to the provider one.

The FIP will be exposed on the node where the VM is.

On the other hand, the error you see there should not happen, so I'll investigate why that is and also double check whether the expose_tenant_networks flag is broken somehow.

Thanks!


--
LUIS TOMÁS BOLÍVAR
Principal Software Engineer
Red Hat
Madrid, Spain
ltomasbo@redhat.com  
 

