[Openstack-operators] Best kernel options for openvswitch on network nodes on a large setup

Slawomir Kaplonski skaplons at redhat.com
Fri Sep 28 07:03:46 UTC 2018


Hi,

What versions of Neutron and ovsdbapp are you using? IIRC there was such an issue somewhere around the Pike release; we saw it in functional tests quite often. But I think this problem was later solved in a newer ovsdbapp version.
Maybe try a newer version of ovsdbapp and check whether it gets better.
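
To see which versions are installed, something like the following Python sketch could be used. It assumes Python 3.8+ (for importlib.metadata) and that both projects are installed as Python distributions named "neutron" and "ovsdbapp"; the helper name installed_version is only illustrative.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names are an assumption; adjust for your packaging.
    for name in ("neutron", "ovsdbapp"):
        print(name, installed_version(name) or "not installed")
```

Run it with the same Python interpreter that neutron-openvswitch-agent uses, otherwise it may report a different environment.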

> On 27.09.2018, at 23:05, Jean-Philippe Méthot <jp.methot at planethoster.info> wrote:
> 
> I got some answers from the openvswitch mailing list, essentially indicating the issue is in the connection between neutron-openvswitch-agent and ovs.
> 
> Here’s an output of ovs-vsctl list controller:
> 
> _uuid               : ff2dca74-9628-43c8-b89c-8d2f1242dd3f
> connection_mode     : out-of-band
> controller_burst_limit: []
> controller_rate_limit: []
> enable_async_messages: []
> external_ids        : {}
> inactivity_probe    : []
> is_connected        : false
> local_gateway       : []
> local_ip            : []
> local_netmask       : []
> max_backoff         : []
> other_config        : {}
> role                : other
> status              : {last_error="Connection timed out", sec_since_connect="22", sec_since_disconnect="1", state=BACKOFF}
> target              : "tcp:127.0.0.1:6633"
> 
> So OVS is still working, but the connection between neutron-openvswitch-agent and OVS gets interrupted somehow. It may also be linked to the HA VRRP switching hosts at random, as the connection between both network nodes gets severed. We also see SSH lagging momentarily. I’m starting to think that a limit of some kind in Linux is being reached, preventing connections from being established. However, I don’t think it’s the max open files limit, since the number of open files is nowhere close to the limit I’ve set.
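
A rough way to watch for this state from a script is to parse the status column from the output above. This is only a sketch: the parser handles just the simple key=value map format seen in that output, and parse_ovs_map and in_backoff are hypothetical helpers, not part of any OVS tooling.

```python
import re

# Matches key=value pairs where the value is either "quoted" or a bare token.
_PAIR = re.compile(r'(\w+)=(?:"([^"]*)"|([^,}\s]+))')


def parse_ovs_map(text):
    """Parse a simple OVS map such as {a="x", b=y} into a dict of strings."""
    result = {}
    for key, quoted, bare in _PAIR.findall(text):
        # Exactly one of the two value groups is non-empty for a given match
        # (both are empty only for an empty quoted string).
        result[key] = quoted if quoted else bare
    return result


def in_backoff(status_line):
    """Return True if the status map reports state=BACKOFF."""
    return parse_ovs_map(status_line).get("state") == "BACKOFF"


# The status line from the output above, used as sample input.
SAMPLE = ('status : {last_error="Connection timed out", '
          'sec_since_connect="22", sec_since_disconnect="1", state=BACKOFF}')

if __name__ == "__main__":
    print(parse_ovs_map(SAMPLE))
```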
> 
> Ideas?
>   
> Jean-Philippe Méthot
> Openstack system administrator
> Administrateur système Openstack
> PlanetHoster inc.
> 
> 
> 
> 
>> On 26 Sept. 2018, at 15:16, Jean-Philippe Méthot <jp.methot at planethoster.info> wrote:
>> 
>> Yes, I notice that every time that message appears, at least a few packets get dropped and some of our instances show up as down in Nagios, even though they are reachable 1 or 2 seconds later. It’s really causing us some issues as we can’t ensure proper network quality for our customers. Have you noticed the same?
>> 
>> At this point I think it may be best to contact the openvswitch developers directly, since it seems to be an issue with their component. I am about to do that and hope I don’t get sent back to the openstack mailing list. I would really like to know what this probe is and why it disconnects constantly under load.
>> 
>> Jean-Philippe Méthot
>> Openstack system administrator
>> Administrateur système Openstack
>> PlanetHoster inc.
>> 
>> 
>> 
>> 
>>> On 26 Sept. 2018, at 11:48, Simon Leinen <simon.leinen at switch.ch> wrote:
>>> 
>>> Jean-Philippe Méthot writes:
>>>> This particular message makes it sound as if openvswitch is getting overloaded.
>>>> Sep 23 03:54:08 network1 ovsdb-server: ovs|01253|reconnect|ERR|tcp:127.0.0.1:50814: no response to inactivity probe after 5.01 seconds, disconnecting
>>> 
>>> We get these as well :-(
>>> 
>>>> A lot of those keep appearing, and openvswitch always reconnects almost
>>>> instantly though. I’ve done some research about that particular
>>>> message, but it didn’t give me anything I can use to fix it.
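
To get a feel for how often these disconnects happen, the log lines can be counted per hour. A minimal sketch, assuming syslog-style lines like the one quoted above; the function name and the per-hour grouping are illustrative choices.

```python
import re
from collections import Counter

# Captures "Mon DD HH" from a syslog timestamp on lines that report an
# inactivity probe timeout.
_PROBE = re.compile(
    r'^(\w{3}\s+\d+\s+\d{2}):\d{2}:\d{2}\s.*'
    r'no response to inactivity probe after [\d.]+ seconds'
)


def count_probe_timeouts(lines):
    """Count inactivity probe timeouts per hour in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = _PROBE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The log line quoted above, used as sample input.
SAMPLE_LINE = ("Sep 23 03:54:08 network1 ovsdb-server: "
               "ovs|01253|reconnect|ERR|tcp:127.0.0.1:50814: no response to "
               "inactivity probe after 5.01 seconds, disconnecting")

if __name__ == "__main__":
    # To scan a real log, pass an open file instead; the log path depends
    # on the distribution and is not given in this thread.
    for hour, n in sorted(count_probe_timeouts([SAMPLE_LINE]).items()):
        print(hour, n)
```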
>>> 
>>> Would be interested in solutions as well.  But I'm sceptical whether
>>> kernel settings can help here, because the timeout/slowness seems to be
>>> located in the user-space/control-plane parts of Open vSwitch,
>>> i.e. OVSDB.
>>> -- 
>>> Simon.
>>> 
>>>> Jean-Philippe Méthot
>>>> Openstack system administrator
>>>> Administrateur système Openstack
>>>> PlanetHoster inc.
>>> 
>>>> On 25 Sept. 2018, at 19:37, Erik McCormick <emccormick at cirrusseven.com> wrote:
>>> 
>>>> Are you getting any particular log messages that lead you to conclude your issue lies with OVS? I've hit lots of kernel limits under those conditions before OVS itself ever
>>>> noticed. Anything in dmesg, journal or neutron logs of interest? 
>>> 
>>>> On Tue, Sep 25, 2018, 7:27 PM Jean-Philippe Méthot <jp.methot at planethoster.info> wrote:
>>> 
>>>> Hi,
>>> 
>>>> Are there some recommendations regarding kernel settings configuration for openvswitch? We’ve just been hit by what we believe may be an attack of some kind we
>>>> have never seen before and we’re wondering if there’s a way to optimize our network nodes’ kernel for openvswitch operation and thus minimize the impact of such an
>>>> attack, or whatever it was.
>>> 
>>>> Best regards,
>>> 
>>>> Jean-Philippe Méthot
>>>> Openstack system administrator
>>>> Administrateur système Openstack
>>>> PlanetHoster inc.
>>> 
>>>> _______________________________________________
>>>> OpenStack-operators mailing list
>>>> OpenStack-operators at lists.openstack.org
>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>> 
>>> 
>> 
> 

-- 
Slawek Kaplonski
Senior software engineer
Red Hat



