Re: [neutron][ovn] Logical flow scaling (flow explosion in lr_in_arp_resolve)
Krzysztof Klimonda
kklimonda at syntaxhighlighted.com
Fri Sep 18 08:31:50 UTC 2020
So just for testing I've applied this patch to our neutron-server:
--8<--8<--8<--
diff --git a/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py b/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py
index 23a841d7a1..41200786f1 100644
--- a/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py
+++ b/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py
@@ -1141,11 +1141,15 @@ class OVNClient(object):
         enabled = router.get('admin_state_up')
         lrouter_name = utils.ovn_name(router['id'])
         added_gw_port = None
+        options = {
+            "always_learn_from_arp_request": "false",
+            "dynamic_neigh_routers": "true"
+        }
         with self._nb_idl.transaction(check_error=True) as txn:
             txn.add(self._nb_idl.create_lrouter(lrouter_name,
                                                 external_ids=external_ids,
                                                 enabled=enabled,
-                                                options={}))
+                                                options=options))
             # TODO(lucasagomes): add_external_gateway is being only used
             # by the ovn_db_sync.py script, remove it after the database
             # synchronization work
--8<--8<--8<--
and also ran the following for each existing logical router in OVN:
# ovn-nbctl set Logical_Router $router options=dynamic_neigh_routers=true,always_learn_from_arp_request=false
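For anyone who wants to script the per-router step, here is a minimal Python sketch, not what I actually ran. The function names and the injectable `run` parameter are made up for illustration. It assumes ovn-nbctl is on the PATH and can reach the NB database. Unlike the command above, it sets each key as options:key=value, so any other options already present on the router are left alone.

```python
import subprocess

# Option values taken from the patch and the command in this message.
ROUTER_OPTIONS = {
    "dynamic_neigh_routers": "true",
    "always_learn_from_arp_request": "false",
}


def list_router_names(run=subprocess.run):
    """Return the names of all logical routers in the NB database."""
    result = run(
        ["ovn-nbctl", "--bare", "--columns=name", "list", "Logical_Router"],
        check=True, capture_output=True, text=True,
    )
    # --bare prints one value per line, with blank lines between records.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def build_set_command(router, options=ROUTER_OPTIONS):
    """Build the ovn-nbctl argv that sets each option key on one router."""
    args = ["ovn-nbctl", "set", "Logical_Router", router]
    args += [f"options:{key}={value}" for key, value in options.items()]
    return args


def apply_to_all_routers(run=subprocess.run):
    """Set the options on every logical router, one ovn-nbctl call each."""
    for router in list_router_names(run):
        run(build_set_command(router), check=True)
```

Calling apply_to_all_routers() on a node with access to the NB database would then update every router. Passing a stub as `run` lets you check the generated commands first.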
This had a huge impact on both the number of logical flows and the number of OVS flows on the chassis nodes:
--8<--8<--8<--
# ovn-sbctl --timeout=120 lflow-list > lflows-new.txt
# cat lflows-new.txt |grep -v Datapath |cut -d'(' -f 2 | cut -d ')' -f1 |sort | uniq -c |sort -n | tail -10
   2170 ls_out_port_sec_l2
   2172 lr_in_learn_neighbor
   2666 lr_in_admission
   2690 ls_in_port_sec_l2
   3190 lr_in_ip_routing
   4276 lr_in_lookup_neighbor
   4873 lr_in_arp_resolve
   5864 ls_in_arp_rsp
   5873 ls_in_l2_lkup
  14343 lr_in_ip_input
--8<--8<--8<--
(and this is with even more routers than before: 500 vs 400). I still have to read up on what impact those options have on ARP activity, though.
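The shell pipeline above can also be written as a short Python sketch, which makes it easier to compare the two dumps. The function names are illustrative, and the line layout it expects ("table=N (name ), priority=...") is my assumption about the lflow-list output. Unlike cut, it skips lines that contain no parenthesis instead of counting them as-is.

```python
from collections import Counter


def count_lflows_by_table(lines):
    """Count logical flows per table name in `ovn-sbctl lflow-list` output."""
    counts = Counter()
    for line in lines:
        # Mirror `grep -v Datapath`, and skip lines without a "(".
        if "Datapath" in line or "(" not in line:
            continue
        # Mirror `cut -d'(' -f2 | cut -d')' -f1`, then drop the padding.
        table = line.split("(", 1)[1].split(")", 1)[0].strip()
        counts[table] += 1
    return counts


def flows_per_router(flow_count, router_count):
    """Average number of flows per logical router."""
    return flow_count / router_count


# lr_in_arp_resolve counts and router counts quoted in this thread.
before = flows_per_router(671656, 400)
after = flows_per_router(4873, 500)
print(f"before: {before:.1f} per router, after: {after:.1f} per router")
```

With the numbers from this thread, that is roughly 1679 lr_in_arp_resolve flows per router before the change and fewer than 10 after it.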
--
Krzysztof Klimonda
kklimonda at syntaxhighlighted.com
On Thu, Sep 17, 2020, at 21:14, Krzysztof Klimonda wrote:
> Hi Tony,
>
> Indeed I forgot to mention that all routers are using the same external
> network (and subnet) for the external gateway.
>
> Creating separate external networks per router wouldn't really work for
> us, and I'm not even quite sure what the setup would look like in that
> case.
>
> --
> Krzysztof Klimonda
> kklimonda at syntaxhighlighted.com
>
> On Thu, Sep 17, 2020, at 20:31, Tony Liu wrote:
> > I am trying to reach 5000. The problem I hit is that northd gets
> > stuck translating from NB to SB when I connect routers to the
> > external network.
> >
> > I assume all your 400 routers connect to the same subnet in that
> > external network. I am trying another approach where one subnet
> > is created for each router in the external network. Might that help
> > reduce the ARP flows?
> >
> > Thanks!
> > Tony
> > > -----Original Message-----
> > > From: Krzysztof Klimonda <kklimonda at syntaxhighlighted.com>
> > > Sent: Thursday, September 17, 2020 8:57 AM
> > > To: openstack-discuss at lists.openstack.org
> > > Subject: [neutron][ovn] Logical flow scaling (flow explosion in
> > > lr_in_arp_resolve)
> > >
> > > Hi,
> > >
> > > > We're running some tests of a Ussuri deployment with the OVN ML2 driver
> > > > and seeing a worrying number of logical flows generated for our test
> > > > deployment.
> > > >
> > > > As a test, we create 400 routers and 400 private networks, and connect
> > > > each network to its own router. We also connect each router to an
> > > > external network. After doing that, a dump of logical flows shows almost
> > > > 800k logical flows, most of them in the lr_in_arp_resolve table:
> > >
> > > --8<--8<--8<--
> > > > # cat lflows.txt |grep -v Datapath |cut -d'(' -f 2 | cut -d ')' -f1 |sort | uniq -c |sort -n | tail -10
> > > 3264 lr_in_learn_neighbor
> > > 3386 ls_out_port_sec_l2
> > > 4112 lr_in_admission
> > > 4202 ls_in_port_sec_l2
> > > 4898 lr_in_lookup_neighbor
> > > 4900 lr_in_ip_routing
> > > 9144 ls_in_l2_lkup
> > > 9160 ls_in_arp_rsp
> > > 22136 lr_in_ip_input
> > > 671656 lr_in_arp_resolve
> > > #
> > > --8<--8<--8<--
> > >
> > > ovn: 20.06.2 + patch for SNAT IP ARP reply issue
> > > openvswitch: 2.13.0
> > > neutron: 16.1.0
> > >
> > > > I've seen some discussion of a similar issue on the OVS mailing list:
> > > > https://www.mail-archive.com/ovs-discuss@openvswitch.org/msg07014.html
> > > > Is this relevant to Neutron, and not just Kubernetes?
> > >
> > > --
> > > Krzysztof Klimonda
> > > kklimonda at syntaxhighlighted.com
> >
> >
>
>