[Openstack-operators] Liberty and OVS Agent restarts

Clayton O'Neill clayton at oneill.net
Wed Feb 10 13:55:45 UTC 2016


Summary: Liberty OVS agent restarts are better, but still need work.
See: https://bugs.launchpad.net/neutron/+bug/1514056

As many of you know, Liberty has a fix for OVS agent restarts so that
the agent no longer dumps all flows when starting, which used to cause
a loss of traffic.  Unfortunately, Liberty neutron still has issues
with OVS agent restarts.  The fix that went into Liberty prevents it
from dropping flows on the br-tun and br-int bridges and that helps
greatly, but the br-ex bridge still has its flows cleared on startup.

You may be thinking: Wait, br-ex only has like 3 flows on it, how can
that be a problem?  The issue appears to be that the br-ex flows are
cleared early and not set up again until late in the process.  This
means that routers on the node where the OVS agent is restarted lose
network connectivity for the majority of the restart time.

I did some testing with this yesterday, comparing a few scenarios with
100 floating IPs, 100 instances and various router layouts.  You can
find the complete data here:
https://docs.google.com/spreadsheets/d/1ZGra_MszBlL0fNsFqd4nOvh1PsgWu58-GxEeh1m1BPw/edit?usp=sharing

The summary looks like this:
100 routers, 100 networks, 100 floating ips, 100 instances, single node test:
Kilo average outage time: 47 seconds
Liberty average outage time: 37 seconds

1 router, 1 network, 100 floating ips, 100 instances, single node test:
Kilo average outage time: 46 seconds
Liberty average outage time: 13 seconds

1 router, 1 network, 100 floating ips, 100 instances, router on a
separate node, all instances on a single node, OVS restart on compute
node:
Kilo average outage time: 25 seconds
Liberty average outage time: 0 to 1 seconds

I did my testing by pinging all of the floating IPs once per second
with fping.  In the last test, it frequently lost no packets, and as a
result I was not really able to measure that scenario other than to
qualify it as good.
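
For anyone who wants to repeat this kind of measurement, here is a
minimal sketch of how an average outage time could be derived from
per-second ping results.  It is not the script used for the numbers
above.  It assumes the per-target output format of `fping -C` (one line
per host, one field per probe, "-" for a lost probe), and it assumes
that "outage" means the longest run of consecutive lost probes.  The
function names and the sample addresses are hypothetical.

```python
def parse_fping_line(line):
    """Split one 'host : r1 r2 - r4' line into (host, [reply_ok, ...])."""
    host, _, results = line.partition(" : ")
    # Assumption: a "-" field means no reply was received for that probe.
    return host.strip(), [field != "-" for field in results.split()]


def longest_outage(replies, interval=1.0):
    """Longest run of consecutive lost probes, converted to seconds."""
    longest = current = 0
    for ok in replies:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest * interval


def average_outage(lines, interval=1.0):
    """Average of the per-host longest outages across all hosts."""
    outages = [
        longest_outage(parse_fping_line(line)[1], interval)
        for line in lines
        if line.strip()
    ]
    return sum(outages) / len(outages) if outages else 0.0


if __name__ == "__main__":
    # Made-up sample input, only to show the expected shape of the data.
    sample = [
        "10.0.0.1 : 0.41 - - - 0.39",
        "10.0.0.2 : 0.40 0.42 - 0.38 0.40",
    ]
    print(average_outage(sample))
```

With 1 second probes the resolution is 1 second at best, which is why
a result like "0 to 1 seconds" cannot be narrowed down further this
way.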

This is a huge operational issue for us, and I suspect for many of the
rest of you using OVS.  I’d encourage everyone who is using OVS to
register interest in having this fixed in the LP bug
(https://bugs.launchpad.net/neutron/+bug/1514056).  Right now this bug
is marked as low priority.


