[Openstack-operators] Best kernel options for openvswitch on network nodes on a large setup
Jean-Philippe Méthot
jp.methot at planethoster.info
Thu Sep 27 21:05:50 UTC 2018
I got some answers from the openvswitch mailing list, essentially indicating the issue is in the connection between neutron-openvswitch-agent and ovs.
Here’s the output of ovs-vsctl list controller:
_uuid : ff2dca74-9628-43c8-b89c-8d2f1242dd3f
connection_mode : out-of-band
controller_burst_limit: []
controller_rate_limit: []
enable_async_messages: []
external_ids : {}
inactivity_probe : []
is_connected : false
local_gateway : []
local_ip : []
local_netmask : []
max_backoff : []
other_config : {}
role : other
status : {last_error="Connection timed out", sec_since_connect="22", sec_since_disconnect="1", state=BACKOFF}
target : "tcp:127.0.0.1:6633"
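For anyone who wants to watch this state over time rather than eyeball it, here is a minimal Python sketch that parses a record like the one above. It assumes the text was captured from ovs-vsctl list controller beforehand; the function names and the "flapping" check are hypothetical helpers for illustration, not part of OVS or neutron.

```python
import re


def parse_controller_record(text):
    """Parse 'column : value' lines from one controller record into a dict."""
    record = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        # Split on the first colon only, so values such as
        # "tcp:127.0.0.1:6633" keep their own colons.
        column, _, value = line.partition(":")
        record[column.strip()] = value.strip()
    return record


def parse_status(status):
    """Turn the status map, e.g. {state=BACKOFF, ...}, into a dict."""
    pairs = re.findall(r'(\w+)=("[^"]*"|[^,}\s]+)', status)
    return {key: value.strip('"') for key, value in pairs}


# Sample lines copied from the record shown above.
SAMPLE = """\
is_connected : false
status : {last_error="Connection timed out", sec_since_connect="22", sec_since_disconnect="1", state=BACKOFF}
target : "tcp:127.0.0.1:6633"
"""

record = parse_controller_record(SAMPLE)
status = parse_status(record["status"])

# Hypothetical check: disconnected and backing off means the agent link dropped.
flapping = record["is_connected"] == "false" and status["state"] == "BACKOFF"
print(status["last_error"], status["sec_since_disconnect"], flapping)
```

Running something like this periodically and logging the result with a timestamp would make it possible to line up the disconnects with the dropped packets and the SSH lag.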
So OVS is still working, but the connection between neutron-openvswitch-agent and OVS keeps getting interrupted. It may also be linked to the HA VRRP switching hosts at random, as the connection between the two network nodes gets severed. We also see SSH lagging momentarily. I’m starting to think that some kind of Linux limit is being reached, preventing connections from being established. However, I don’t think it’s the maximum number of open files, since the number of open files is nowhere close to the limit I’ve set.
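As a rough way to back up the open-files statement with numbers, the following Python sketch compares a process's open file descriptors with its "Max open files" limit, using the Linux /proc interface. The helper names are hypothetical, reading another process's /proc/<pid>/fd normally requires root, and this only covers the open-file limit, not any other kernel limit that might be involved.

```python
import os


def _to_int(value):
    """Convert a limit field to int; 'unlimited' becomes None."""
    return None if value == "unlimited" else int(value)


def parse_max_open_files(limits_text):
    """Return (soft, hard) 'Max open files' from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Fields: Max, open, files, <soft>, <hard>, <units>
            fields = line.split()
            return _to_int(fields[3]), _to_int(fields[4])
    raise ValueError("no 'Max open files' line found")


def fd_usage(pid):
    """Return (open fd count, soft limit, hard limit) for a process."""
    with open(f"/proc/{pid}/limits") as handle:
        soft, hard = parse_max_open_files(handle.read())
    used = len(os.listdir(f"/proc/{pid}/fd"))
    return used, soft, hard


# Demonstrate on the current process; only runs where /proc is available.
if os.path.isdir("/proc/self/fd"):
    used, soft, hard = fd_usage(os.getpid())
    print(f"open fds: {used}, soft limit: {soft}, hard limit: {hard}")
```

To check the daemons in question, one would pass the PID of ovs-vswitchd, ovsdb-server or neutron-openvswitch-agent to fd_usage instead of os.getpid().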
Ideas?
Jean-Philippe Méthot
Openstack system administrator
Administrateur système Openstack
PlanetHoster inc.
> On 26 Sept. 2018, at 15:16, Jean-Philippe Méthot <jp.methot at planethoster.info> wrote:
>
> Yes, I notice that every time that message appears, at least a few packets get dropped and some of our instances pop up in nagios, even though they are reachable 1 or 2 seconds after. It’s really causing us some issues as we can’t ensure proper network quality for our customers. Have you noticed the same?
>
> At this point I think it may be best to contact the openvswitch list directly, since it seems to be an issue with their component. I am about to do that and hope I don’t get sent back to the openstack mailing list. I would really like to know what this probe is and why it disconnects constantly under load.
>
> Jean-Philippe Méthot
> Openstack system administrator
> Administrateur système Openstack
> PlanetHoster inc.
>
>> On 26 Sept. 2018, at 11:48, Simon Leinen <simon.leinen at switch.ch> wrote:
>>
>> Jean-Philippe Méthot writes:
>>> This particular message makes it sound as if openvswitch is getting overloaded.
>>> Sep 23 03:54:08 network1 ovsdb-server: ovs|01253|reconnect|ERR|tcp:127.0.0.1:50814: no response to inactivity probe after 5.01 seconds, disconnecting
>>
>> We get these as well :-(
>>
>>> A lot of those keep appearing, and openvswitch always reconnects almost
>>> instantly though. I’ve done some research about that particular
>>> message, but it didn’t give me anything I can use to fix it.
>>
>> Would be interested in solutions as well. But I'm sceptical whether
>> kernel settings can help here, because the timeout/slowness seems to be
>> located in the user-space/control-plane parts of Open vSwitch,
>> i.e. OVSDB.
>> --
>> Simon.
>>
>>> Jean-Philippe Méthot
>>> Openstack system administrator
>>> Administrateur système Openstack
>>> PlanetHoster inc.
>>
>>> On 25 Sept. 2018, at 19:37, Erik McCormick <emccormick at cirrusseven.com> wrote:
>>
>>> Are you getting any particular log messages that lead you to conclude your issue lies with OVS? I've hit lots of kernel limits under those conditions before OVS itself ever
>>> noticed. Anything in dmesg, journal or neutron logs of interest?
>>
>>> On Tue, Sep 25, 2018, 7:27 PM Jean-Philippe Méthot <jp.methot at planethoster.info> wrote:
>>
>>> Hi,
>>
>>> Are there some recommendations regarding kernel settings configuration for openvswitch? We’ve just been hit by what we believe may be an attack of some kind we
>>> have never seen before and we’re wondering if there’s a way to optimize our network nodes’ kernel for openvswitch operation and thus minimize the impact of such an
>>> attack, or whatever it was.
>>
>>> Best regards,
>>
>>> Jean-Philippe Méthot
>>> Openstack system administrator
>>> Administrateur système Openstack
>>> PlanetHoster inc.
>>
>>> _______________________________________________
>>> OpenStack-operators mailing list
>>> OpenStack-operators at lists.openstack.org
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>
>