[Openstack-operators] Liberty and OVS Agent restarts

Bajin, Joseph jbajin at verisign.com
Wed Feb 10 21:42:23 UTC 2016


Clayton, 

This is really good information. 

I’m wondering how we can support you and get the developer attention needed to get this resolved sooner rather than later. I totally agree with you that this should be backported to at least Liberty. 

Please let me know how I and others can help!

—Joe

On 2/10/16, 8:55 AM, "Clayton O'Neill" <clayton at oneill.net> wrote:

>Summary: Liberty OVS agent restarts are better, but still need work.
>See: https://bugs.launchpad.net/neutron/+bug/1514056
>
>As many of you know, Liberty has a fix for OVS agent restarts so that
>the agent no longer clears all flows when it starts, which previously
>caused a loss of traffic.  Unfortunately, Liberty neutron still has
>issues with OVS agent restarts.  The fix that went into Liberty stops
>the agent from dropping flows on the br-tun and br-int bridges, which
>helps greatly, but the br-ex bridge still has its flows cleared on
>startup.
>
>You may be thinking: Wait, br-ex only has like 3 flows on it, how can
>that be a problem?  The issue appears to be that the br-ex flows are
>cleared early and not set up again until late in the process.  This
>means that routers on the node where the OVS agent is restarting lose
>network connectivity for the majority of the restart time.
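To observe the br-ex gap described above on a test node, one option is to poll the bridge's flow table while restarting the agent and look for a stretch where it is empty. A minimal Python sketch, assuming `ovs-ofctl` is installed and the script has permission to query the bridge. The function names are hypothetical, and counting lines that contain `cookie=` is an assumption about the `dump-flows` text format that may need adjusting for your OVS version.

```python
import subprocess
import time
from typing import List, Tuple


def count_flows(dump_output: str) -> int:
    """Count flow entries in `ovs-ofctl dump-flows` text output.

    Assumption: each flow line contains 'cookie=' and the reply header
    line (e.g. 'NXST_FLOW reply ...') does not.
    """
    return sum(1 for line in dump_output.splitlines() if "cookie=" in line)


def sample_bridge(bridge: str = "br-ex", duration_s: float = 60.0,
                  interval_s: float = 0.5) -> List[Tuple[float, int]]:
    """Poll a bridge's flow table, returning (elapsed seconds, flow count) pairs."""
    samples: List[Tuple[float, int]] = []
    start = time.monotonic()
    while True:
        elapsed = time.monotonic() - start
        if elapsed >= duration_s:
            break
        # Runs the real ovs-ofctl CLI; raises CalledProcessError if it fails.
        out = subprocess.run(["ovs-ofctl", "dump-flows", bridge],
                             capture_output=True, text=True, check=True).stdout
        samples.append((round(elapsed, 1), count_flows(out)))
        time.sleep(interval_s)
    return samples
```

Calling `sample_bridge("br-ex")` just before restarting the agent and looking for a run of zero counts would show roughly how long br-ex stayed without flows.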
>
>I did some testing with this yesterday, comparing Kilo and Liberty in
>a few scenarios with 100 floating IPs (FIPs), 100 instances and
>various router configurations.  You can find the complete data here:
>https://docs.google.com/spreadsheets/d/1ZGra_MszBlL0fNsFqd4nOvh1PsgWu58-GxEeh1m1BPw/edit?usp=sharing
>
>The summary looks like this:
>100 routers, 100 networks, 100 floating IPs, 100 instances, single-node test:
>Kilo average outage time: 47 seconds
>Liberty average outage time: 37 seconds
>
>1 router, 1 network, 100 floating IPs, 100 instances, single-node test:
>Kilo average outage time: 46 seconds
>Liberty average outage time: 13 seconds
>
>1 router, 1 network, 100 floating IPs, 100 instances, router on a
>separate node, all instances on a single node, OVS agent restart on
>the compute node:
>Kilo average outage time: 25 seconds
>Liberty average outage time: 0 to 1 seconds
>
>I did my testing using 1-second pings with fping to all of the
>floating IPs.  In the last test it frequently lost no packets at all,
>so I was not really able to measure that scenario beyond qualifying it
>as good.
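The email does not say exactly how "average outage time" was derived from the fping results. One plausible reading, sketched here in Python, treats each floating IP's outage as its longest run of consecutive missed 1-second pings and averages that across all IPs. The input format (one list of booleans per IP) and the function names are hypothetical, and collecting the results from fping is not shown.

```python
from statistics import mean
from typing import Dict, List


def longest_outage(replies: List[bool], interval_s: float = 1.0) -> float:
    """Longest run of consecutive missed replies, converted to seconds."""
    longest = current = 0
    for ok in replies:
        if ok:
            current = 0          # a reply ends any ongoing outage
        else:
            current += 1         # one more missed ping in this outage
            longest = max(longest, current)
    return longest * interval_s


def average_outage(results: Dict[str, List[bool]], interval_s: float = 1.0) -> float:
    """Mean of the per-IP longest outage across all probed floating IPs."""
    return mean(longest_outage(r, interval_s) for r in results.values())


if __name__ == "__main__":
    # Hypothetical data: True means a reply arrived in that 1-second slot.
    sample = {
        "203.0.113.10": [True, False, False, False, True, True],
        "203.0.113.11": [True, True, False, True, True, True],
        "203.0.113.12": [True, True, True, True, True, True],
    }
    print(average_outage(sample))
```

Counting total lost pings per IP instead would give the same figure whenever each IP sees a single contiguous outage, which is the expected pattern during one agent restart.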
>
>This is a huge operational issue for us and I suspect for many of the
>rest of you using OVS.  I’d encourage everyone who is using OVS to
>register interest in having this fixed in the LP bug
>(https://bugs.launchpad.net/neutron/+bug/1514056).  Right now this bug
>is marked as low priority.
>
>_______________________________________________
>OpenStack-operators mailing list
>OpenStack-operators at lists.openstack.org
>http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators