[victoria][neutron] OVN Gateway Chassis Issue
Hi,

I have four compute nodes in my lab setup. Initially, all four compute nodes were acting as gateway chassis with priorities 1, 2, 3 and 4. Then I specifically marked two nodes as gateway chassis with the command below on the compute nodes:

ovs-vsctl set open . external-ids:ovn-cms-options="enable-chassis-as-gw"

The command "ovn-nbctl list gateway_chassis" started showing two chassis. I checked via tcpdump, and the public traffic started flowing from both nodes. It looks like it is doing round-robin to send packets.

Then I tried to remove one chassis from the gateway role with the command below:

ovs-vsctl remove open . external-ids ovn-cms-options=enable-chassis-as-gw

After that, "ovn-nbctl list gateway_chassis" started showing one gateway chassis, but I can see from tcpdump that public traffic still flows from both gateway chassis.

Below is the current status of the chassis:

root@network:/etc/neutron# ovn-sbctl list chassis
_uuid               : 532bb9d0-6667-462c-9631-0cb5360bd4dc
encaps              : [358c4a59-0bca-459c-958c-524eb8c385ce]
external_ids        : {datapath-type=system, iface-types="erspan,geneve,gre,internal,ip6erspan,ip6gre,lisp,patch,stt,system,tap,vxlan", is-interconn="false", "neutron:liveness_check_at"="2021-03-16T08:52:41.361302+00:00", "neutron:metadata_liveness_check_at"="2021-03-16T08:52:41.364928+00:00", "neutron:ovn-metadata-id"="2ac66785-d0c7-43ee-8c78-5fd6ed6ccc73", "neutron:ovn-metadata-sb-cfg"="6157", ovn-bridge-mappings="ext-net1:br-ext", ovn-chassis-mac-mappings="", ovn-cms-options=""}
hostname            : virtual-hv2
name                : "fdfae005-7473-486a-b331-8a54c53c1279"
nb_cfg              : 6157
transport_zones     : []
vtep_logical_switches: []

_uuid               : a99ab389-96a5-4a58-a301-34618868450a
encaps              : [6e7490ce-3c58-4a1c-999d-ff1638c66feb]
external_ids        : {datapath-type=system, iface-types="erspan,geneve,gre,internal,ip6erspan,ip6gre,lisp,patch,stt,system,tap,vxlan", is-interconn="false", neutron-metadata-proxy-networks="dc917847-f70f-4de0-9865-3e9594c65ef1", "neutron:liveness_check_at"="2021-03-16T08:52:41.368768+00:00", "neutron:metadata_liveness_check_at"="2021-03-16T08:52:41.372045+00:00", "neutron:ovn-metadata-id"="3441fc3c-ca43-4360-8210-8c9ebe4fc13d", "neutron:ovn-metadata-sb-cfg"="6157", ovn-bridge-mappings="ext-net1:br-ext", ovn-chassis-mac-mappings="", ovn-cms-options=""}
hostname            : kvm10-a1-khi01
name                : "87504098-4474-40fc-9576-ac449c1c4448"
nb_cfg              : 6157
transport_zones     : []
vtep_logical_switches: []

_uuid               : b9bdfe12-fe27-4580-baee-159f871c442b
encaps              : [52a8f523-9740-4333-a4a4-69bf5e27117c]
external_ids        : {datapath-type=system, iface-types="erspan,geneve,gre,internal,ip6erspan,ip6gre,lisp,patch,stt,system,tap,vxlan", is-interconn="false", neutron-metadata-proxy-networks="dc917847-f70f-4de0-9865-3e9594c65ef1", "neutron:liveness_check_at"="2021-03-16T08:52:41.326719+00:00", "neutron:metadata_liveness_check_at"="2021-03-16T08:52:41.342214+00:00", "neutron:ovn-metadata-id"="2a751610-97a8-4688-a719-df3616f4f770", "neutron:ovn-metadata-sb-cfg"="6157", ovn-bridge-mappings="ext-net1:br-ext", ovn-chassis-mac-mappings="", ovn-cms-options=""}
hostname            : kvm12-a1-khi01
name                : "82630e57-668e-4f67-a3fb-a173f4da432a"
nb_cfg              : 6157
transport_zones     : []
vtep_logical_switches: []

_uuid               : 669d1ae3-7a5d-4ec3-869d-ca6240f9ae2c
encaps              : [ac8022b3-1ea5-45c7-a7e8-74db7b627df4]
external_ids        : {datapath-type=system, iface-types="erspan,geneve,gre,internal,ip6erspan,ip6gre,lisp,patch,stt,system,tap,vxlan", is-interconn="false", "neutron:liveness_check_at"="2021-03-16T08:52:41.347144+00:00", "neutron:metadata_liveness_check_at"="2021-03-16T08:52:41.352021+00:00", "neutron:ovn-metadata-id"="2d5ce6fd-6a9f-4356-9406-6ca91601af43", "neutron:ovn-metadata-sb-cfg"="6157", ovn-bridge-mappings="ext-net1:br-ext", ovn-chassis-mac-mappings="", ovn-cms-options=enable-chassis-as-gw}
hostname            : virtual-hv1
name                : "731e842a-3a69-4044-87e9-32b7517d4f07"
nb_cfg              : 6157
transport_zones     : []
vtep_logical_switches: []

Need help: how can I permanently remove a gateway chassis so that it stops serving public traffic? Also, is it something to do with priority?

- Ammad
Hi Ammad,

On Tue, Mar 16, 2021 at 9:15 AM Ammad Syed <syedammad83@gmail.com> wrote:
> Need help: how can I permanently remove a gateway chassis so that it stops serving public traffic? Also, is it something to do with priority?
Is it possible that we still have router ports scheduled onto that chassis? You can list your routers, the router ports, and the gateway chassis each port is scheduled on with the following commands:

# List your routers
$ ovn-nbctl lr-list

# List the router ports on that router
$ ovn-nbctl lrp-list <router id>

# List which gateway chassis (if any) that router port is scheduled on.
# Here you will see the priority; the highest is where the port should be located.
$ ovn-nbctl lrp-get-gateway-chassis <router port id>

If that's the case, I think the OVN driver does not automatically reschedule these ports when a gateway chassis is removed or added. We need to discuss whether this is something we want to happen automatically, because it can cause data-plane disruption. Alternatively, we could have a "rescheduling" script that operators run when they want to add or remove a gateway chassis, so they can plan before moving the ports from one chassis to another (again, potentially causing disruption).

Hope that helps,
Lucas
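(For illustration, the kind of rescheduling helper described above could be sketched roughly as follows. This is only a sketch, not an existing tool: OLD_CHASSIS and NEW_CHASSIS are placeholders for real chassis names, and moving gateway ports this way can itself interrupt traffic. lrp-set-gateway-chassis and lrp-del-gateway-chassis are standard ovn-nbctl subcommands:)

#!/bin/sh
# Sketch only: drain gateway-scheduled router ports from one chassis
# onto another. Placeholders must be replaced with real chassis names.
OLD_CHASSIS="<chassis-to-drain>"
NEW_CHASSIS="<chassis-to-use>"

for router in $(ovn-nbctl --bare --columns=name list Logical_Router); do
    # lrp-list prints "<uuid> (<name>)"; take the UUID column
    for port in $(ovn-nbctl lrp-list "$router" | awk '{print $1}'); do
        if ovn-nbctl lrp-get-gateway-chassis "$port" | grep -q "$OLD_CHASSIS"; then
            ovn-nbctl lrp-del-gateway-chassis "$port" "$OLD_CHASSIS"
            ovn-nbctl lrp-set-gateway-chassis "$port" "$NEW_CHASSIS" 1
        fi
    done
done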
Hi Lucas,

I have checked; currently there is only one router:

root@network:/etc/neutron# ovn-nbctl lr-list
9f6111a9-3231-4f60-8199-78780760fe34 (neutron-ff36ce12-78fc-4ac9-9ae9-5a18ec1002bd)

with two ports:

root@network:/etc/neutron# ovn-nbctl lrp-list 9f6111a9-3231-4f60-8199-78780760fe34
a33dc21f-dcd7-4714-8003-9e21bc283d03 (lrp-52409f01-b140-4729-90b4-409c7c9b3f4b)
76dd76a7-1c64-4686-aeac-44ae63677404 (lrp-b12c1aa0-0857-494c-92a0-ee54cc7e01cc)

One port shows no output:

root@network:/etc/neutron# ovn-nbctl lrp-get-gateway-chassis a33dc21f-dcd7-4714-8003-9e21bc283d03
root@network:/etc/neutron#

The other port shows a gateway chassis:

root@network:/etc/neutron# ovn-nbctl lrp-get-gateway-chassis 76dd76a7-1c64-4686-aeac-44ae63677404
lrp-b12c1aa0-0857-494c-92a0-ee54cc7e01cc_731e842a-3a69-4044-87e9-32b7517d4f07 1

This is the current active gateway chassis:

root@network:/etc/neutron# ovn-nbctl list gateway_chassis
_uuid               : 4e23ff9b-9588-46aa-9ed1-69fea503a729
chassis_name        : "731e842a-3a69-4044-87e9-32b7517d4f07"
external_ids        : {}
name                : lrp-b12c1aa0-0857-494c-92a0-ee54cc7e01cc_731e842a-3a69-4044-87e9-32b7517d4f07
options             : {}
priority            : 1

The chassis fdfae005-7473-486a-b331-8a54c53c1279 is the one that I removed from the gateway role, and I don't see any port scheduled on it. I have tried rebooting that chassis, but when it comes back up, its uplink port starts showing traffic in tcpdump again.

- Ammad
--
Regards,
Syed Ammad Ali
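(A follow-up check relevant to the last message, beyond the commands used in the thread: the chassisredirect port bindings in the southbound DB show which chassis each gateway port is actually bound to, which is what determines where the traffic egresses. A minimal sketch using standard ovn-sbctl syntax:)

# List the chassisredirect (cr-lrp) port bindings and the chassis that
# currently hosts each one
ovn-sbctl --format=table --columns=logical_port,chassis find Port_Binding type=chassisredirect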