[neutron] OVS inactivity probes
Hello all,

Like others, we have seen an increase in the number of messages like those below, followed by disconnects:

2020-03-19T07:19:47.414Z|01614|rconn|ERR|br-int<->tcp:127.0.0.1:6633: no response to inactivity probe after 10 seconds, disconnecting
2020-03-19T07:19:47.414Z|01615|rconn|ERR|br-ex<->tcp:127.0.0.1:6633: no response to inactivity probe after 10 seconds, disconnecting
2020-03-19T07:19:47.414Z|01616|rconn|ERR|br-tun<->tcp:127.0.0.1:6633: no response to inactivity probe after 10 seconds, disconnecting

We have since increased the value of of_inactivity_probe (https://bugs.launchpad.net/neutron/+bug/1817022) and confirmed the Controller connection reflects this new value:

---
# ovs-vsctl list Controller
...
_uuid                 : b4814677-c6f3-4afc-9c9e-999d5a5ac78f
connection_mode       : out-of-band
controller_burst_limit: []
controller_rate_limit : []
enable_async_messages : []
external_ids          : {}
inactivity_probe      : 60000
is_connected          : true
local_gateway         : []
local_ip              : []
local_netmask         : []
max_backoff           : []
other_config          : {}
role                  : other
status                : {last_error="Connection refused", sec_since_connect="1420", sec_since_disconnect="1423", state=ACTIVE}
target                : "tcp:127.0.0.1:6633"
---

However, we also see disconnects on the manager side, which the config option does not address:

2020-03-23T11:01:02.871Z|00443|reconnect|ERR|tcp:127.0.0.1:50098: no response to inactivity probe after 5 seconds, disconnecting

This bug (https://bugs.launchpad.net/neutron/+bug/1627106) and related commit (https://opendev.org/openstack/neutron/commit/1698bee770b84a2663ba940a6ded5d4...) appear to leverage the ovs_vsctl_timeout value (since renamed to ovsdb_timeout), but the inactivity_probe for the Manager connection does not appear to be implemented. Honestly, I'm not sure if that code path is used.

---
# ovs-vsctl list Manager
_uuid            : d61519ba-93fc-4fe5-b05c-b630778a44b0
connection_mode  : []
external_ids     : {}
inactivity_probe : []
is_connected     : true
max_backoff      : []
other_config     : {}
status           : {bound_port="6640", n_connections="2", sec_since_connect="0", sec_since_disconnect="0"}
target           : "ptcp:6640:127.0.0.1"
---

Running this by hand sets the inactivity_probe timeout on the manager connection, but we'd prefer to use a built-in method, if possible:

# ovs-vsctl set manager d61519ba-93fc-4fe5-b05c-b630778a44b0 inactivity_probe=30000

Any suggestions?

Thanks,
James
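To check whether a changed probe value actually reduces the disconnects, the log records quoted above can be tallied per module. The following is a minimal sketch only: it assumes the pipe-separated record layout seen in the quoted lines (timestamp, sequence number, module, level, message), and the names `PROBE_RE`, `parse_probe_timeouts`, and `SAMPLE` are made up for illustration. Reading records from an actual log file is left out.

```python
import re
from collections import Counter

# Message part of a probe-timeout record, as seen in the quoted logs, e.g.
# "br-int<->tcp:127.0.0.1:6633: no response to inactivity probe after 10 seconds, disconnecting"
PROBE_RE = re.compile(
    r"^(?P<conn>.+): no response to inactivity probe after "
    r"(?P<seconds>\d+(?:\.\d+)?) seconds, disconnecting$"
)


def parse_probe_timeouts(lines):
    """Yield (timestamp, module, connection, seconds) for probe-timeout records."""
    for line in lines:
        # Assumed layout: timestamp|sequence|module|level|message
        parts = line.strip().split("|", 4)
        if len(parts) != 5:
            continue  # not a record in the assumed layout
        timestamp, _seq, module, _level, message = parts
        match = PROBE_RE.match(message)
        if match:
            yield timestamp, module, match.group("conn"), float(match.group("seconds"))


# The four records quoted in the message above.
SAMPLE = [
    "2020-03-19T07:19:47.414Z|01614|rconn|ERR|br-int<->tcp:127.0.0.1:6633: "
    "no response to inactivity probe after 10 seconds, disconnecting",
    "2020-03-19T07:19:47.414Z|01615|rconn|ERR|br-ex<->tcp:127.0.0.1:6633: "
    "no response to inactivity probe after 10 seconds, disconnecting",
    "2020-03-19T07:19:47.414Z|01616|rconn|ERR|br-tun<->tcp:127.0.0.1:6633: "
    "no response to inactivity probe after 10 seconds, disconnecting",
    "2020-03-23T11:01:02.871Z|00443|reconnect|ERR|tcp:127.0.0.1:50098: "
    "no response to inactivity probe after 5 seconds, disconnecting",
]

records = list(parse_probe_timeouts(SAMPLE))
# Count timeouts per module: rconn is the OpenFlow controller side,
# reconnect is the manager side in the logs above.
counts = Counter(module for _ts, module, _conn, _secs in records)
```

Splitting the counts by module keeps the controller-side timeouts (addressed by of_inactivity_probe) apart from the manager-side ones, which is the distinction the message draws.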
Hi,

I checked it a bit deeper and it seems to me that the add_manager method in [1] is not used at all. In the past it was used by the "enable_connection_uri" function from the neutron.agent.ovsdb.native.helpers module, but commit [2] switched it to use a helper function from ovsdbapp. So I think that this is simply a bug in Neutron which we need to fix. I opened a bug for it [3].

[1] https://opendev.org/openstack/neutron/src/branch/master/neutron/agent/common...
[2] https://review.opendev.org/#/c/453014/
[3] https://bugs.launchpad.net/neutron/+bug/1868686

On Mon, Mar 23, 2020 at 05:11:42PM +0000, James Denton wrote:
> This bug (https://bugs.launchpad.net/neutron/+bug/1627106) and related commit (https://opendev.org/openstack/neutron/commit/1698bee770b84a2663ba940a6ded5d4...) appear to leverage the ovs_vsctl_timeout value (since renamed to ovsdb_timeout), but the inactivity_probe for the Manager connection does not appear to be implemented. Honestly, I'm not sure if that code path is used.
--
Slawek Kaplonski
Senior software engineer
Red Hat
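Until the bug above is fixed, the manual command from the original message could be wrapped in a small script run by whatever tooling manages the host. This is only a sketch of that workaround, not Neutron's or ovsdbapp's API: `build_set_probe_cmd` and `set_manager_probe` are hypothetical names, and the only CLI invocation used is the `ovs-vsctl set manager <uuid> inactivity_probe=<ms>` form already shown in the thread.

```python
import subprocess


def build_set_probe_cmd(manager_uuid, probe_ms):
    """Build the argv for the manual command quoted in the thread.

    manager_uuid: the _uuid shown by `ovs-vsctl list Manager`
    probe_ms:     inactivity probe interval in milliseconds
    """
    if not isinstance(probe_ms, int) or probe_ms < 0:
        raise ValueError("probe_ms must be a non-negative integer")
    return [
        "ovs-vsctl", "set", "manager", manager_uuid,
        "inactivity_probe=%d" % probe_ms,
    ]


def set_manager_probe(manager_uuid, probe_ms, run=subprocess.run):
    """Run the command; `run` is injectable so the call can be tested offline."""
    cmd = build_set_probe_cmd(manager_uuid, probe_ms)
    run(cmd, check=True)  # raises CalledProcessError if ovs-vsctl fails
    return cmd


# Example (requires ovs-vsctl and sufficient privileges, so it is not run here):
# set_manager_probe("d61519ba-93fc-4fe5-b05c-b630778a44b0", 30000)
```

Keeping the command construction separate from its execution makes it possible to check the exact arguments without touching a live Open vSwitch database.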
Thank you, Slawek. Appreciate the quick assist!

James

On 3/24/20, 4:58 AM, "Slawek Kaplonski" <skaplons@redhat.com> wrote:
> So I think that this is simply a bug in Neutron which we need to fix. I opened a bug for it [3].
> [3] https://bugs.launchpad.net/neutron/+bug/1868686
participants (2)
- James Denton
- Slawek Kaplonski