[neutron][ops] OVN scale issues and reviving midonet plugin

Sam Morrison sorrison at gmail.com
Tue Apr 12 02:00:02 UTC 2022


Hi,

We recently tried to migrate from the ML2 midonet driver to the OVN driver on our Victoria OpenStack install with ~1000 hypervisors.

Victoria was the last release where the midonet plugin was supported, so that was a good motivation to move.
Unfortunately, when we switched the neutron-server config over to use OVN in our production install, everything went very badly.
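
For context, the cut-over was essentially the standard ML2 mechanism-driver swap in neutron-server's ml2_conf.ini; the snippet below is only an illustrative sketch (endpoints and type drivers are placeholders, not our exact config):

    [ml2]
    mechanism_drivers = ovn
    type_drivers = geneve,flat
    tenant_network_types = geneve

    [ovn]
    # Placeholder addresses for the OVN northbound/southbound databases
    ovn_nb_connection = tcp:192.0.2.10:6641
    ovn_sb_connection = tcp:192.0.2.10:6642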

We got lots of errors like:

ovsdbapp.exceptions.TimeoutException: Commands [<neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.commands.CheckRevisionNumberCommand object at 0x7fa3b84bd8b0>, <neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.commands.SetLSwitchPortCommand object at 0x7fa37b04cf40>, <ovsdbapp.schema.ovn_northbound.commands.PgDelPortCommand object at 0x7fa375ae0fd0>] exceeded timeout 180 seconds, cause: TXN queue is full

ovsdbapp.exceptions.TimeoutException: Commands [<neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.commands.CheckRevisionNumberCommand object at 0x7f0ee5197dc0>, <neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.commands.SetLSwitchPortCommand object at 0x7f0ee50a92e0>, <ovsdbapp.schema.ovn_northbound.commands.PgAddPortCommand object at 0x7f0ee5181760>] exceeded timeout 180 seconds, cause: Result queue is empty

ovsdbapp.exceptions.TimeoutException: Commands [<ovsdbapp.backend.ovs_idl.command.DbDestroyCommand object at 0x7fcd99f9ea90>] exceeded timeout 180 seconds, cause: Result queue is empty


among other things, which forced us to roll back. Our next approach is to get everything up to Yoga and try again (with some better live testing, somehow, before we make the switch).
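
For anyone hitting the same tracebacks: the 180-second figure is neutron's default OVSDB transaction timeout for the OVN driver, which can be raised in the [ovn] section. Raising it only buys headroom rather than fixing the underlying throughput problem, and the value below is just an example:

    [ovn]
    # Default is 180 seconds; 600 is an arbitrary example value
    ovsdb_connection_timeout = 600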

In the meantime we have revived the networking-midonet plugin in our own branch, but I just wanted to check whether anyone else is in this situation and is running, or has looked into running, midonet on Wallaby and beyond?


Cheers,
Sam



