[Openstack-operators] Liberty and OVS Agent restarts

Clayton O'Neill clayton at oneill.net
Fri Feb 12 15:01:00 UTC 2016


I’ve tried it with both a blank value and the specific value.  It
doesn’t appear to make a difference.
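
(For reference, the option in question lives in the L3 agent config; the
two cases I tried look roughly like this, with the path and bridge name
being the usual defaults rather than anything specific to our setup:

  # /etc/neutron/l3_agent.ini (illustrative)
  [DEFAULT]
  # blank value:
  external_network_bridge =
  # versus the specific bridge:
  #external_network_bridge = br-ex
)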

In other news, Assaf Muller has upgraded the priority of the bug from
low to high.

On Fri, Feb 12, 2016 at 9:27 AM, Matt Kassawara <mkassawara at gmail.com> wrote:
> Out of curiosity, what do you have for the "external_network_bridge" option
> in the L3 agent config?
>
> On Wed, Feb 10, 2016 at 2:42 PM, Bajin, Joseph <jbajin at verisign.com> wrote:
>>
>> Clayton,
>>
>> This is really good information.
>>
>> I’m wondering how we can help support you and get the necessary dev
>> support to get this resolved sooner rather than later.  I totally agree
>> with you that this should be backported to at least Liberty.
>>
>> Please let me know how I and others can help!
>>
>> —Joe
>>
>> On 2/10/16, 8:55 AM, "Clayton O'Neill" <clayton at oneill.net> wrote:
>>
>> >Summary: Liberty OVS agent restarts are better, but still need work.
>> >See: https://bugs.launchpad.net/neutron/+bug/1514056
>> >
>> >As many of you know, Liberty has a fix for OVS agent restarts so that
>> >the agent no longer dumps all flows when starting, which previously
>> >resulted in a loss of traffic.  Unfortunately, Liberty neutron still
>> >has issues with OVS agent restarts.  The fix that went into Liberty
>> >prevents it from dropping flows on the br-tun and br-int bridges, and
>> >that helps greatly, but the br-ex bridge still has its flows cleared
>> >on startup.
>> >
>> >You may be thinking: Wait, br-ex only has about 3 flows on it, how can
>> >that be a problem?  The issue appears to be that the br-ex flows are
>> >cleared early and not set up again until late in the process.  This
>> >means that routers on the node where the OVS agent is restarting lose
>> >network connectivity for the majority of the restart time.
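>> >
>> >(If you want to watch this happen on your own nodes, something as
>> >simple as the following, run while restarting the agent, shows the
>> >br-ex flows disappear early and only come back near the end:
>> >
>> >  watch -n1 'ovs-ofctl dump-flows br-ex'
>> >
>> >The exact timing will obviously depend on your environment.)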
>> >
>> >I did some testing with this yesterday, comparing a few scenarios with
>> >100 floating IPs, 100 instances and various router configurations.
>> >You can find the complete data here:
>> >https://docs.google.com/spreadsheets/d/1ZGra_MszBlL0fNsFqd4nOvh1PsgWu58-GxEeh1m1BPw/edit?usp=sharing
>> >
>> >The summary looks like this:
>> >100 routers, 100 networks, 100 floating IPs, 100 instances,
>> >single-node test:
>> >Kilo average outage time: 47 seconds
>> >Liberty average outage time: 37 seconds
>> >
>> >1 router, 1 network, 100 floating IPs, 100 instances, single-node test:
>> >Kilo average outage time: 46 seconds
>> >Liberty average outage time: 13 seconds
>> >
>> >1 router, 1 network, 100 floating IPs, 100 instances, router on a
>> >separate node, all instances on a single node, OVS restart on compute
>> >node:
>> >Kilo average outage time: 25 seconds
>> >Liberty average outage time: 0 to 1 seconds
>> >
>> >I did my testing by sending 1-second pings with fping to all of the
>> >floating IPs.  With the last test, it frequently lost no packets, and
>> >as a result I was not really able to measure that scenario other than
>> >to qualify it as good.
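>> >
>> >(For the curious, the test loop was essentially just fping against a
>> >file listing the floating IPs, along these lines; the filename is a
>> >placeholder and the exact flags I used may have differed slightly:
>> >
>> >  fping -l -p 1000 -f floating-ips.txt
>> >
>> >i.e. loop forever, one ping per second per target, and then see how
>> >long each address stays unreachable around the agent restart.)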
>> >
>> >This is a huge operational issue for us and I suspect for many of the
>> >rest of you using OVS.  I’d encourage everyone that is using OVS to
>> >register interest in having this fixed in the LP bug
>> >(https://bugs.launchpad.net/neutron/+bug/1514056).  Right now this bug
>> >is marked as low priority.
>> >


