Open Stack

Tue Mar 15 01:10:24 UTC 2016

Rahul, your issue looks similar to the one reported here, probably caused by a hostname resolution problem:
https://bugs.launchpad.net/charms/+source/quantum-gateway/+bug/1405588
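
If hostname resolution is the suspect, one quick check is whether the node's own hostname resolves at all. The following is only a minimal sketch: the function name `hostname_resolves` and the injectable `resolver` parameter are made up for illustration, and it does not confirm that this is the cause of the problem in this thread.

```python
import socket


def hostname_resolves(name=None, resolver=socket.gethostbyname):
    """Return (name, address) if the name resolves, else (name, None).

    `name` defaults to this machine's hostname. `resolver` is a
    hypothetical hook so the lookup can be swapped out when testing.
    """
    name = name or socket.gethostname()
    try:
        return name, resolver(name)
    except OSError:
        # socket.gaierror and socket.herror are subclasses of OSError.
        return name, None


if __name__ == "__main__":
    host, addr = hostname_resolves()
    if addr is None:
        print(f"{host}: does NOT resolve (check /etc/hosts and DNS)")
    else:
        print(f"{host}: resolves to {addr}")
```

Running it on an affected node and on a healthy one should show whether the two differ in this respect.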

Regards~hrushi

From: Rahul Sharma [mailto:rahulsharmaait at gmail.com]
Sent: Monday, March 14, 2016 3:32 PM
To: openstack <openstack at lists.openstack.org>; OpenStack Development Mailing List <openstack-dev at lists.openstack.org>; openstack-operators at lists.openstack.org
Subject: [Openstack-operators] [neutron] openvswitch-agent spins up too many /bin/ovsdb-client processes

Hi All,

We are trying to debug an issue in our production environment. neutron-openvswitch-agent starts failing after some time (1-2 days). After debugging, we found a large number of ovsdb-client processes. On some nodes the count exceeds 330, and then the ovsdb process starts failing.
root     30689     1  0 00:37 ?        00:00:00 /bin/ovsdb-client monitor Interface name,ofport --format=json
root     30804     1  0 00:38 ?        00:00:00 /bin/ovsdb-client monitor Interface name,ofport --format=json
root     30909     1  0 00:38 ?        00:00:00 /bin/ovsdb-client monitor Interface name,ofport --format=json

Pastebin link for the processes: http://pastebin.com/QGQC0Jrt
Pastebin link with openvswitch starting all of them: http://pastebin.com/repHMkHu
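
To track how fast these processes accumulate, a small script can count them periodically. The sketch below is an illustration, not something that was run on the affected nodes: the name `summarize_monitors` is hypothetical, and it assumes the listing above is `ps -ef` style output, where the third column (1 in every line shown) is the parent PID, meaning those processes have been reparented to init.

```python
import subprocess

# Substring that identifies the monitor processes shown in the listing.
MONITOR_CMD = "ovsdb-client monitor"


def summarize_monitors(ps_lines):
    """Return (total, orphaned) counts of ovsdb-client monitor processes.

    Each input line is expected to look like "PID PPID ARGS...".
    A process counts as orphaned when its parent PID is 1.
    """
    total = orphaned = 0
    for line in ps_lines:
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        _pid, ppid, args = parts
        if MONITOR_CMD in args:
            total += 1
            if ppid == "1":
                orphaned += 1
    return total, orphaned


if __name__ == "__main__":
    # Separate -o options with empty headers suppress the header row.
    out = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "ppid=", "-o", "args="],
        check=True, capture_output=True, text=True,
    ).stdout
    total, orphaned = summarize_monitors(out.splitlines())
    print(f"ovsdb-client monitor processes: {total} (parent PID 1: {orphaned})")
```

Run from cron every few minutes, the output would show whether the count grows steadily or jumps at particular times, which can then be matched against the agent logs.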

In the logs, we start seeing errors like these:
Mar 14 05:41:29 node2 ovs-vsctl: ovs|00001|fatal_signal|WARN|terminating with signal 14 (Alarm clock)
Mar 14 05:41:39 node2 ovs-vsctl: ovs|00001|fatal_signal|WARN|terminating with signal 14 (Alarm clock)
Mar 14 05:41:49 node2 ovs-vsctl: ovs|00001|fatal_signal|WARN|terminating with signal 14 (Alarm clock)
Mar 14 05:49:30 node2 ovs-vsctl: ovs|00001|vsctl|ERR|unix:/var/run/openvswitch/db.sock: database connection failed (Protocol error)
Mar 14 05:49:32 node2 ovs-vsctl: ovs|00001|vsctl|ERR|unix:/var/run/openvswitch/db.sock: database connection failed (Protocol error)
Mar 14 05:49:34 node2 ovs-vsctl: ovs|00001|vsctl|ERR|unix:/var/run/openvswitch/db.sock: database connection failed (Protocol error)

Open vSwitch version:
[root@node2 ~(openstack_admin)]# ovs-vsctl --version
ovs-vsctl (Open vSwitch) 2.4.0
Compiled Sep  4 2015 09:49:34
DB Schema 7.12.1

We have to restart the openvswitch service every time, which clears all of these processes. We are trying to figure out why neutron-agent starts so many of them. We also found that restarting the host's networking starts one new /bin/ovsdb-client process. We checked and found no network fluctuations or NIC flapping. Are there any pointers on where we should look? The problem occurs on both controller and compute nodes.

Rahul Sharma
MS in Computer Science, 2016
College of Computer and Information Science, Northeastern University
Mobile:  801-706-7860
Email: rahulsharmaait at gmail.com
