[Openstack] AMQP queue errors, slow commands and more.

Thomas Zilio zilio at efficit.com
Wed Apr 29 16:54:47 UTC 2015


Hello,

I'm having some problems with my OpenStack (Icehouse) installation, which I set up by following the installation guide on the OpenStack website.
I'm using the Neutron architecture (Figure 1.2 <http://docs.openstack.org/icehouse/install-guide/install/yum/content/ch_overview.html#architecture_example-architectures>) with the following differences:
 - The Network node and the Controller node are the same node
 - Only 2 network interfaces are used (Management and External share the same one)
 - The Controller/Network node has only 1 Ethernet port, so this port is VLAN-tagged in order to connect it to both networks

The rest of the installation is similar to the one described in the guide.

Everything seems to work fine, except:
 - The nova-compute service on my compute nodes randomly "crashes", with the following log message repeating over and over (the service still appears active, but I have to restart it before the controller can use it again):
> nova-compute[10376]: Exception RuntimeError: 'maximum recursion depth exceeded while calling a Python object' in <bound method GreenSocket.__del__ of <eventlet.greenio.GreenSocket object at 0x7f02b47894d0>> ignored

 - nova commands can be really slow (not always; it appears random to me):
> nova --timing list
> +-------------------------------------------------------------------------------+----------------+
> | url                                                                           | seconds        |
> +-------------------------------------------------------------------------------+----------------+
> | POST http://controller:35357/v2.0/tokens                                      | 0.424832105637 |
> | GET http://controller:8774/v2/566965d25fae409fbe1fb589a1066cfb/servers/detail | 25.0711970329  |
> | Total                                                                         | 25.4960291386  |
> +-------------------------------------------------------------------------------+----------------+


> nova --timing show demo-instance1
> +-------------------------------------------------------------------------------------------------------------+----------------+
> | url                                                                                                         | seconds        |
> +-------------------------------------------------------------------------------------------------------------+----------------+
> | POST http://controller:35357/v2.0/tokens                                                                    | 0.344960927963 |
> | GET http://controller:8774/v2/566965d25fae409fbe1fb589a1066cfb/servers                                      | 0.366063117981 |
> | GET http://controller:8774/v2/566965d25fae409fbe1fb589a1066cfb/servers/5d5f04d8-edfb-4689-86a1-982647fd4e67 | 1.37827086449  |
> | GET http://controller:8774/v2/566965d25fae409fbe1fb589a1066cfb/flavors/1                                    | 0.7764108181   |
> | GET http://controller:8774/v2/566965d25fae409fbe1fb589a1066cfb/images/0bfcb0a3-d631-48ba-b158-9379e67fbc9e  | 28.1774730682  |
> | Total                                                                                                       | 31.0431787968  |
> +-------------------------------------------------------------------------------------------------------------+----------------+

At other times, the very same commands take only about half a second.
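
To compare fast and slow runs, the timing tables can be reduced to the single slowest request. The following is only a minimal sketch: it assumes the two-column table layout shown above, and the names parse_timing_table and slowest_call are hypothetical helpers, not part of the nova client.

```python
# Sketch: find the slowest request in the table printed by `nova --timing`.
# Assumes the two-column "url | seconds" layout shown in this message;
# parse_timing_table and slowest_call are hypothetical helper names.

SAMPLE = """\
> +------------------------------------------+----------------+
> | url                                      | seconds        |
> +------------------------------------------+----------------+
> | POST http://controller:35357/v2.0/tokens | 0.424832105637 |
> | GET http://controller:8774/v2/566965d25fae409fbe1fb589a1066cfb/servers/detail | 25.0711970329 |
> | Total                                    | 25.4960291386  |
> +------------------------------------------+----------------+
"""


def parse_timing_table(text):
    """Return a list of (url, seconds) pairs, including the Total row."""
    rows = []
    for line in text.splitlines():
        # Drop the mail quoting prefix ("> ") and surrounding whitespace.
        line = line.lstrip("> ").strip()
        if not line.startswith("|"):
            continue  # border lines and anything that is not a table row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 2:
            continue
        url, seconds = cells
        try:
            rows.append((url, float(seconds)))
        except ValueError:
            continue  # header row: "seconds" is not a number
    return rows


def slowest_call(rows):
    """Return the (url, seconds) pair with the largest time, ignoring Total."""
    calls = [row for row in rows if row[0] != "Total"]
    return max(calls, key=lambda row: row[1])


if __name__ == "__main__":
    url, seconds = slowest_call(parse_timing_table(SAMPLE))
    print("slowest: %s (%.2f s)" % (url, seconds))
```

Running this over several captures would show whether the delay always hits the same endpoint or moves between services.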


Going through the logs, I found this error in almost every service on every node, recurring every now and then:
> 2015-04-29 16:17:26.707 28642 INFO oslo.messaging._drivers.impl_qpid [-] Connected to AMQP server on controller:5672
> 2015-04-29 17:34:23.957 28642 ERROR oslo.messaging._drivers.impl_qpid [-] Failed to consume message from queue: heartbeat timeout
> 2015-04-29 17:34:23.957 28642 TRACE oslo.messaging._drivers.impl_qpid Traceback (most recent call last):
> 2015-04-29 17:34:23.957 28642 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 546, in ensure
> 2015-04-29 17:34:23.957 28642 TRACE oslo.messaging._drivers.impl_qpid     return method(*args, **kwargs)
> 2015-04-29 17:34:23.957 28642 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 599, in _consume
> 2015-04-29 17:34:23.957 28642 TRACE oslo.messaging._drivers.impl_qpid     nxt_receiver = self.session.next_receiver(timeout=timeout)
> 2015-04-29 17:34:23.957 28642 TRACE oslo.messaging._drivers.impl_qpid   File "<string>", line 6, in next_receiver
> 2015-04-29 17:34:23.957 28642 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 689, in next_receiver
> 2015-04-29 17:34:23.957 28642 TRACE oslo.messaging._drivers.impl_qpid     if self._ecwait(lambda: self.incoming, timeout):
> 2015-04-29 17:34:23.957 28642 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait
> 2015-04-29 17:34:23.957 28642 TRACE oslo.messaging._drivers.impl_qpid     result = self._ewait(lambda: self.closed or predicate(), timeout)
> 2015-04-29 17:34:23.957 28642 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 595, in _ewait
> 2015-04-29 17:34:23.957 28642 TRACE oslo.messaging._drivers.impl_qpid     result = self.connection._ewait(lambda: self.error or predicate(), timeout)
> 2015-04-29 17:34:23.957 28642 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 234, in _ewait
> 2015-04-29 17:34:23.957 28642 TRACE oslo.messaging._drivers.impl_qpid     self.check_error()
> 2015-04-29 17:34:23.957 28642 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 227, in check_error
> 2015-04-29 17:34:23.957 28642 TRACE oslo.messaging._drivers.impl_qpid     raise e
> 2015-04-29 17:34:23.957 28642 TRACE oslo.messaging._drivers.impl_qpid HeartbeatTimeout: heartbeat timeout
> 2015-04-29 17:34:23.957 28642 TRACE oslo.messaging._drivers.impl_qpid 
> 2015-04-29 17:34:23.978 28642 INFO oslo.messaging._drivers.impl_qpid [-] Connected to AMQP server on controller:5672
> 2015-04-29 17:34:47.068 28642 ERROR oslo.messaging._drivers.impl_qpid [-] Failed to publish message to topic 'conductor': heartbeat timeout
> 2015-04-29 17:34:47.068 28642 TRACE oslo.messaging._drivers.impl_qpid Traceback (most recent call last):
> 2015-04-29 17:34:47.068 28642 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 546, in ensure
> 2015-04-29 17:34:47.068 28642 TRACE oslo.messaging._drivers.impl_qpid     return method(*args, **kwargs)
> 2015-04-29 17:34:47.068 28642 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 619, in _publisher_send
> 2015-04-29 17:34:47.068 28642 TRACE oslo.messaging._drivers.impl_qpid     publisher = cls(self.conf, self.session, topic)
> 2015-04-29 17:34:47.068 28642 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 407, in __init__
> 2015-04-29 17:34:47.068 28642 TRACE oslo.messaging._drivers.impl_qpid     super(TopicPublisher, self).__init__(conf, session, node_name)
> 2015-04-29 17:34:47.068 28642 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 337, in __init__
> 2015-04-29 17:34:47.068 28642 TRACE oslo.messaging._drivers.impl_qpid     self.reconnect(session)
> 2015-04-29 17:34:47.068 28642 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/impl_qpid.py", line 341, in reconnect
> 2015-04-29 17:34:47.068 28642 TRACE oslo.messaging._drivers.impl_qpid     self.sender = session.sender(self.address)
> 2015-04-29 17:34:47.068 28642 TRACE oslo.messaging._drivers.impl_qpid   File "<string>", line 6, in sender
> 2015-04-29 17:34:47.068 28642 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 621, in sender
> 2015-04-29 17:34:47.068 28642 TRACE oslo.messaging._drivers.impl_qpid     sender._ewait(lambda: sender.linked)
> 2015-04-29 17:34:47.068 28642 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 831, in _ewait
> 2015-04-29 17:34:47.068 28642 TRACE oslo.messaging._drivers.impl_qpid     result = self.session._ewait(lambda: self.error or predicate(), timeout)
> 2015-04-29 17:34:47.068 28642 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 595, in _ewait
> 2015-04-29 17:34:47.068 28642 TRACE oslo.messaging._drivers.impl_qpid     result = self.connection._ewait(lambda: self.error or predicate(), timeout)
> 2015-04-29 17:34:47.068 28642 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 234, in _ewait
> 2015-04-29 17:34:47.068 28642 TRACE oslo.messaging._drivers.impl_qpid     self.check_error()
> 2015-04-29 17:34:47.068 28642 TRACE oslo.messaging._drivers.impl_qpid   File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 227, in check_error
> 2015-04-29 17:34:47.068 28642 TRACE oslo.messaging._drivers.impl_qpid     raise e
> 2015-04-29 17:34:47.068 28642 TRACE oslo.messaging._drivers.impl_qpid HeartbeatTimeout: heartbeat timeout
> 2015-04-29 17:34:47.068 28642 TRACE oslo.messaging._drivers.impl_qpid 
> 2015-04-29 17:34:47.141 28642 INFO oslo.messaging._drivers.impl_qpid [-] Connected to AMQP server on controller:5672
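
To see how often this happens, the logs can be scanned for each heartbeat timeout and the time elapsed since the previous successful connection. This is only a minimal sketch: it assumes the line layout seen in the excerpt above (timestamp, PID, level, module, "[-]", message), and heartbeat_gaps is a hypothetical helper name.

```python
# Sketch: measure how long each AMQP connection lasted before a heartbeat timeout.
# Assumes the log line layout seen in the excerpt above; heartbeat_gaps is a
# hypothetical helper name.
import re
from datetime import datetime

LINE_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) (\d+) (\w+) (\S+) \[-\] (.*)"
)

SAMPLE_LOG = """\
2015-04-29 16:17:26.707 28642 INFO oslo.messaging._drivers.impl_qpid [-] Connected to AMQP server on controller:5672
2015-04-29 17:34:23.957 28642 ERROR oslo.messaging._drivers.impl_qpid [-] Failed to consume message from queue: heartbeat timeout
2015-04-29 17:34:23.957 28642 TRACE oslo.messaging._drivers.impl_qpid HeartbeatTimeout: heartbeat timeout
2015-04-29 17:34:23.978 28642 INFO oslo.messaging._drivers.impl_qpid [-] Connected to AMQP server on controller:5672
2015-04-29 17:34:47.068 28642 ERROR oslo.messaging._drivers.impl_qpid [-] Failed to publish message to topic 'conductor': heartbeat timeout
"""


def heartbeat_gaps(lines):
    """Yield (error_time, seconds_since_last_connect) per heartbeat-timeout ERROR.

    seconds_since_last_connect is None if no earlier "Connected" line was seen.
    """
    last_connect = None
    for raw in lines:
        match = LINE_RE.search(raw)
        if match is None:
            continue  # TRACE lines carry no "[-]" marker and are skipped
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        level, message = match.group(3), match.group(5)
        if level == "INFO" and message.startswith("Connected to AMQP server"):
            last_connect = stamp
        elif level == "ERROR" and "heartbeat timeout" in message:
            gap = None
            if last_connect is not None:
                gap = (stamp - last_connect).total_seconds()
            yield stamp, gap


if __name__ == "__main__":
    for when, gap in heartbeat_gaps(SAMPLE_LOG.splitlines()):
        print(when, gap)
```

To run it against a real log, pass an open file object (for example a service log under /var/log/nova/, whose exact name is an assumption here) instead of SAMPLE_LOG.splitlines().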

I don't know whether these problems are (all) related, but if anyone has any idea how I could solve them, I would be really grateful.

Thanks and regards,
Thomas