[neutron] Flow drop on agent restart with openvswitch firewall driver
Hi All,

I'm looking for ideas: we need to upgrade our Neutron deployment, but for now doing so looks like it would impact workloads too much, and I'm no master of the Neutron code.

We're running Neutron 14.0.2 with the ml2 plugin and firewall_driver set to openvswitch. drop_flows_on_start is left at its default, False.

From reading some old bug reports, my understanding was that a restart of the neutron-openvswitch-agent should not impact existing flows and should be seamless. That is not what I'm experiencing: I see temporary drops around the time ovs-ofctl del-flows/add-flows is called on br-int (for either east-west or north-south traffic). I tried switching to the iptables_hybrid driver instead, and I don't see the issue in that case.

For example, when a wget download is running on an instance while the agent is restarting, I see the following:

2020-09-08 14:26:09 (12.2 MB/s) - Read error at byte 146971864/7416743936 (Success). Retrying

I'm a bit lost, so I'm wondering whether this is expected/known behavior and whether a workaround is possible.

Let me know if a bug report would be a better place to dig deeper, if you want additional information, or if I missed a closed bug.

Thanks!
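One way to check whether flows really are being replaced rather than dropped during the restart is to capture `ovs-ofctl dump-flows br-int` before, during, and after it, then count flows per cookie. The following is only a minimal sketch: the function name and the sample text are hypothetical, and it assumes nothing beyond the `cookie=0x...` field that each dump-flows line carries.

```python
import re
from collections import Counter

# Each flow line printed by "ovs-ofctl dump-flows" contains "cookie=0x<hex>".
COOKIE_RE = re.compile(r"cookie=(0x[0-9a-fA-F]+)")


def count_flows_by_cookie(dump_text: str) -> Counter:
    """Count flow entries per cookie in captured dump-flows output.

    Hypothetical helper for illustration; lines without a cookie field
    (such as the reply header) are skipped.
    """
    counts: Counter = Counter()
    for line in dump_text.splitlines():
        match = COOKIE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample in the general shape of dump-flows output.
SAMPLE_DUMP = """NXST_FLOW reply (xid=0x4):
 cookie=0xaaa, duration=9.6s, table=0, n_packets=10, n_bytes=840, priority=3,in_port=1 actions=NORMAL
 cookie=0xaaa, duration=9.6s, table=0, n_packets=0, n_bytes=0, priority=0 actions=drop
 cookie=0xbbb, duration=0.2s, table=0, n_packets=0, n_bytes=0, priority=3,in_port=1 actions=NORMAL
"""

if __name__ == "__main__":
    print(count_flows_by_cookie(SAMPLE_DUMP))
```

If the restart is seamless, the count for the new cookie should reach the old cookie's count before the old cookie's count falls to zero.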
Hi,

On Tue, Sep 08, 2020 at 02:46:29PM +0000, Alexis Deberg wrote:
> Hi All,
>
> I'm looking for ideas as we need to upgrade our Neutron deployment and it looks like it would impact workloads a bit much for now to do so and I'm no master of the Neutron code...
>
> We're running Neutron 14.0.2 with ml2 plugin and firewall_driver set as openvswitch. drop_flows_on_start is default False.
>
> Reading some old bug reports my understanding was that a restart of the neutron-openvswitch-agent should not impact existing flows and be seamless, but this is not what I'm experiencing as I see some temporary drop(s) around when ovs-ofctl del-flows/add-flows is called on br-int (either east-west traffic or north-south). I tried switching to iptables_hybrid driver instead and I don't see the issue in that case.
>
> e.g. when a wget download is happening on an instance while the agent is restarting, I see the following: 2020-09-08 14:26:09 (12.2 MB/s) - Read error at byte 146971864/7416743936 (Success). Retrying
>
> I'm a bit lost so I'm wondering if that's expected/known behavior, if a workaround is possible....

I don't think it is expected behaviour. All flows should first be installed with a new cookie id, and only then should the old ones be removed. That shouldn't impact existing traffic.

> Let me know if a bug report might be a better place to dig deeper or not or if you want additional information... or if I missed a closed bug.

Yes, please report a bug on Neutron's Launchpad. And, if possible, please also try to reproduce the issue on the current master branch (maybe simply deployed from devstack).

> Thanks !
-- Slawek Kaplonski Principal software engineer Red Hat
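The expected behaviour described in the reply above (install every flow under a new cookie first, then remove the flows carrying the old cookie) can be illustrated with a toy model. This is a sketch under stated assumptions, not Neutron's or OVS's actual code: `Flow`, `FlowTable`, and `restart_agent` are hypothetical names, and matching is reduced to plain string equality.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Flow:
    cookie: int   # identifies which agent run installed the flow
    match: str    # simplified match expression, e.g. "in_port=1"
    actions: str


class FlowTable:
    """Toy stand-in for a bridge's flow table (not a real OVS API)."""

    def __init__(self) -> None:
        self.flows: List[Flow] = []

    def add_flow(self, flow: Flow) -> None:
        self.flows.append(flow)

    def del_flows_by_cookie(self, cookie: int) -> None:
        self.flows = [f for f in self.flows if f.cookie != cookie]

    def lookup(self, match: str) -> Optional[Flow]:
        """Return the first flow with this match, or None if traffic would miss."""
        for flow in self.flows:
            if flow.match == match:
                return flow
        return None


def restart_agent(
    table: FlowTable,
    old_cookie: int,
    new_cookie: int,
    desired: List[Tuple[str, str]],
) -> List[bool]:
    """Reinstall `desired` under new_cookie, then remove old_cookie flows.

    Returns one boolean per step saying whether every desired match was
    still covered by some flow after that step.
    """
    def all_covered() -> bool:
        return all(table.lookup(match) is not None for match, _ in desired)

    covered = []
    for match, actions in desired:
        table.add_flow(Flow(new_cookie, match, actions))
        covered.append(all_covered())
    # Only after every new flow exists are the stale ones removed.
    table.del_flows_by_cookie(old_cookie)
    covered.append(all_covered())
    return covered


if __name__ == "__main__":
    table = FlowTable()
    desired = [("in_port=1", "output:2"), ("in_port=2", "output:1")]
    for match, actions in desired:
        table.add_flow(Flow(0x1, match, actions))
    print(restart_agent(table, old_cookie=0x1, new_cookie=0x2, desired=desired))
```

In this model every match stays covered at each step, which is why traffic should not be interrupted. Deleting the old flows before adding the new ones would leave a window with no matching flow.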
Sure, opened https://bugs.launchpad.net/neutron/+bug/1895038 with all the details I have at hand. As I said in the bug report, I'll try to reproduce with an up-to-date devstack as soon as possible. Thanks
I'll see if I can reproduce this as well. We are also running OVS, in a RH environment. (It would be nice to know, because we also restart the agent sometimes :pray:)

On Wed, Sep 9, 2020 at 3:39 PM Alexis Deberg <alexis.deberg@ubisoft.com> wrote:
> Sure, opened https://bugs.launchpad.net/neutron/+bug/1895038 with all the details I got at hand. As I said in the bug report, I'll try to reproduce with a up to date devstack asap.
>
> Thanks
See my last comment in the bug: it looks like upgrading to a more recent version brings in some patches that fix the issue. Thanks everyone, and especially slaweq and rodolfo-alonso-hernandez. Cheers
Participants (3): Alexis Deberg, Laurent Dumont, Slawek Kaplonski