[Openstack-operators] Can't boot image getting nova.api.openstack MessagingTimeout: Timed out waiting for a reply to message ID msg_id

evan evandeng2009 at qq.com
Wed Nov 25 05:48:05 UTC 2015


Kris,

You’re right this system work well before. I restarted ,this time, rabbitmq-server, nova-api, nova-conductor ,nova-console, nova-novncproxy and nova-scheduler together in a single line on Controller node. Now it’s back normal. so for lessen learned, if restarting singe service not resolve problem then think about restarting them all if related together.

Thank you Kris.

Evan.


> On Nov 25, 2015, at 12:12 AM, Kris G. Lindgren <klindgren at godaddy.com> wrote:
> 
> Evan,
> 
> I would recommend restarting anything that takes to rabbitmq.  This is a case of your rpc workers thinking that they are connected to rabbitmq but they really aren't.  I assume that this setup use to work at one time and now doesn’t?
> 
> ___________________________________________________________________
> Kris Lindgren
> Senior Linux Systems Engineer
> GoDaddy
> 
> From: Evan <evandeng2009 at qq.com <mailto:evandeng2009 at qq.com>>
> Date: Tuesday, November 24, 2015 at 5:02 AM
> To: OpenStack-operators <OpenStack-operators at lists.openstack.org <mailto:OpenStack-operators at lists.openstack.org>>
> Subject: [Openstack-operators] Can't boot image getting nova.api.openstack MessagingTimeout: Timed out waiting for a reply to message ID msg_id
> 
> hi,
> 
> I keep getting such error(as subject) when trying to boot an image for instance, no matter from webpage dashboard or cli. firstly it show the status of scheduling and after a while it's Error. I checked status of all service ,nothing gets failed. Below are the logs for nova-api and nova-conductor. 
> 
> It seems like the problem of rabbitmq, but it's running well without any explicit error shown. Once I tried to restart the rabbitmq then instance created successfully. Before this case happens I tried to set a 2nd external network on Network node but failed so I rolled it back(changed configs back and restart related services). Also checked those common CLI they're all good like nova image-list, nova service-list,neutron agent-list ,keystone service-list.
> 
> Openstack-Juno, with 1 controller, 1 network, 3 compute, 1 block and 1 object, servers have much more resources than needed.
> 
> 
> controller# grep MessagingTimeout /var/log/nova/nova-api.log
> ......
> 2015-11-24 15:58:00.404 18197 TRACE nova.api.openstack MessagingTimeout: Timed out waiting for a reply to message ID 039995083875445bacdd3bd8667cda51
> 2015-11-24 15:59:02.618 18204 TRACE nova.api.openstack MessagingTimeout: Timed out waiting for a reply to message ID 58c3ea4f3c2d43699632c1999df79e91
> 2015-11-24 18:58:28.030 18201 TRACE nova.api.openstack MessagingTimeout: Timed out waiting for a reply to message ID 6b43bf60e2cd494fab116515c5997ab5
> 
> controller# grep ERROR /var/log/nova/nova-api.log
> ......
> 2015-11-24 18:57:27.003 18201 ERROR oslo.messaging._drivers.impl_rabbit [req-2b810cc0-d809-4f2c-8a01-d1f7eca62b48 ] Failed to consume message from queue: [Errno 104] Connection reset by peer
> 2015-11-24 18:58:28.030 18201 ERROR nova.api.openstack [req-2b810cc0-d809-4f2c-8a01-d1f7eca62b48 None] Caught error: Timed out waiting for a reply to message ID 6b43bf60e2cd494fab116515c5997ab5
> 2015-11-24 19:07:26.591 18200 ERROR oslo.messaging._drivers.impl_rabbit [req-f532b095-5f37-432d-8cdc-06db8498b3c1 ] Failed to publish message to topic 'conductor': [Errno 104] Connection reset by peer
> 2015-11-24 19:22:36.310 18183 ERROR oslo.messaging._drivers.impl_rabbit [req-24c4ce2a-6516-401e-a6f8-8a89758cc6fa ] Failed to publish message to topic 'conductor': [Errno 104] Connection reset by peer
> 
> # tail -n 80 /var/log/nova/nova-conductor.log
> ......
> 2015-11-24 19:22:37.356 17770 TRACE oslo.messaging._drivers.impl_rabbit   File "/usr/lib/python2.7/site-packages/eventlet/greenio.py", line 342, in send
> 2015-11-24 19:22:37.356 17770 TRACE oslo.messaging._drivers.impl_rabbit     total_sent += fd.send(data[total_sent:], flags)
> 2015-11-24 19:22:37.356 17770 TRACE oslo.messaging._drivers.impl_rabbit error: [Errno 104] Connection reset by peer
> 2015-11-24 19:22:37.356 17770 TRACE oslo.messaging._drivers.impl_rabbit 
> 2015-11-24 19:22:37.357 17770 INFO oslo.messaging._drivers.impl_rabbit [req-24c4ce2a-6516-401e-a6f8-8a89758cc6fa ] Delaying reconnect for 1.0 seconds...
> 2015-11-24 19:22:38.358 17770 INFO oslo.messaging._drivers.impl_rabbit [req-24c4ce2a-6516-401e-a6f8-8a89758cc6fa ] Connecting to AMQP server on controller:5672
> 2015-11-24 19:22:38.379 17770 INFO oslo.messaging._drivers.impl_rabbit [req-24c4ce2a-6516-401e-a6f8-8a89758cc6fa ] Connected to AMQP server on controller:5672
> 2015-11-24 19:23:38.385 17770 ERROR nova.scheduler.driver [req-24c4ce2a-6516-401e-a6f8-8a89758cc6fa None] Exception during scheduler.run_instance
> 2015-11-24 19:23:38.385 17770 TRACE nova.scheduler.driver Traceback (most recent call last):
> 2015-11-24 19:23:38.385 17770 TRACE nova.scheduler.driver   File "/usr/lib/python2.7/site-packages/nova/conductor/manager.py", line 616, in build_instances
> 2015-11-24 19:23:38.385 17770 TRACE nova.scheduler.driver     request_spec, filter_properties)
> 2015-11-24 19:23:38.385 17770 TRACE nova.scheduler.driver   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 49, in select_destinations
> 2015-11-24 19:23:38.385 17770 TRACE nova.scheduler.driver     context, request_spec, filter_properties)
> 2015-11-24 19:23:38.385 17770 TRACE nova.scheduler.driver   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 35, in __run_method
> 2015-11-24 19:23:38.385 17770 TRACE nova.scheduler.driver     return getattr(self.instance, __name)(*args, **kwargs)
> 2015-11-24 19:23:38.385 17770 TRACE nova.scheduler.driver   File "/usr/lib/python2.7/site-packages/nova/scheduler/client/query.py", line 34, in select_destinations
> 2015-11-24 19:23:38.385 17770 TRACE nova.scheduler.driver     context, request_spec, filter_properties)
> 2015-11-24 19:23:38.385 17770 TRACE nova.scheduler.driver   File "/usr/lib/python2.7/site-packages/nova/scheduler/rpcapi.py", line 108, in select_destinations
> 2015-11-24 19:23:38.385 17770 TRACE nova.scheduler.driver     request_spec=request_spec, filter_properties=filter_properties)
> 2015-11-24 19:23:38.385 17770 TRACE nova.scheduler.driver   File "/usr/lib/python2.7/site-packages/oslo/messaging/rpc/client.py", line 152, in call
> 2015-11-24 19:23:38.385 17770 TRACE nova.scheduler.driver     retry=self.retry)
> 2015-11-24 19:23:38.385 17770 TRACE nova.scheduler.driver   File "/usr/lib/python2.7/site-packages/oslo/messaging/transport.py", line 90, in _send
> 2015-11-24 19:23:38.385 17770 TRACE nova.scheduler.driver     timeout=timeout, retry=retry)
> 2015-11-24 19:23:38.385 17770 TRACE nova.scheduler.driver   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 408, in send
> 2015-11-24 19:23:38.385 17770 TRACE nova.scheduler.driver     retry=retry)
> 2015-11-24 19:23:38.385 17770 TRACE nova.scheduler.driver   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 397, in _send
> 2015-11-24 19:23:38.385 17770 TRACE nova.scheduler.driver     result = self._waiter.wait(msg_id, timeout)
> 2015-11-24 19:23:38.385 17770 TRACE nova.scheduler.driver   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 285, in wait
> 2015-11-24 19:23:38.385 17770 TRACE nova.scheduler.driver     reply, ending = self._poll_connection(msg_id, timeout)
> 2015-11-24 19:23:38.385 17770 TRACE nova.scheduler.driver   File "/usr/lib/python2.7/site-packages/oslo/messaging/_drivers/amqpdriver.py", line 235, in _poll_connection
> 2015-11-24 19:23:38.385 17770 TRACE nova.scheduler.driver     % msg_id)
> 2015-11-24 19:23:38.385 17770 TRACE nova.scheduler.driver MessagingTimeout: Timed out waiting for a reply to message ID d015995266a24e05b21ff0cd2ff3c6bc
> 2015-11-24 19:23:38.385 17770 TRACE nova.scheduler.driver 
> 2015-11-24 19:23:38.388 17770 WARNING nova.scheduler.driver [req-24c4ce2a-6516-401e-a6f8-8a89758cc6fa None] [instance: 66147c0d-fa53-4e2b-8447-d4f20ac7bcbb] Setting instance to ERROR state.
> 
> controller# systemctl status rabbitmq-server
> rabbitmq-server.service - RabbitMQ broker
>    Loaded: loaded (/usr/lib/systemd/system/rabbitmq-server.service; enabled)
>    Active: active (running) since Tue 2015-11-24 15:02:41 CST; 4h 43min ago
>   Process: 137583 ExecStop=/usr/lib/rabbitmq/bin/rabbitmqctl stop (code=exited, status=0/SUCCESS)
>  Main PID: 137639 (beam.smp)
>    CGroup: /system.slice/rabbitmq-server.service
>            ├─137639 /usr/lib64/erlang/erts-5.10.4/bin/beam.smp -W w -K true -A30 -P 1048576 -- -root /usr/lib64/erlang -progname erl -- -home /var/lib/rabbitmq -- -pa /usr/lib/rabbitmq/lib/rabbitmq_s...
>            ├─137654 /usr/lib64/erlang/erts-5.10.4/bin/epmd -daemon
>            ├─137767 inet_gethost 4
>            └─137768 inet_gethost 4
> 
> Nov 24 15:02:40 controller systemd[1]: rabbitmq-server.service: Got notification message from PID 137701, but reception only permitted for PID 137639
> Nov 24 15:02:40 controller systemd[1]: rabbitmq-server.service: Got notification message from PID 137702, but reception only permitted for PID 137639
> Nov 24 15:02:41 controller rabbitmq-server[137639]: RabbitMQ 3.3.5. Copyright (C) 2007-2014 GoPivotal, Inc.
> Nov 24 15:02:41 controller rabbitmq-server[137639]: ##  ##      Licensed under the MPL.  See http://www.rabbitmq.com/ <http://www.rabbitmq.com/>
> Nov 24 15:02:41 controller rabbitmq-server[137639]: ##  ##
> Nov 24 15:02:41 controller rabbitmq-server[137639]: ##########  Logs: /var/log/rabbitmq/rabbit at controller.log <mailto:/var/log/rabbitmq/rabbit at controller.log>
> Nov 24 15:02:41 controller rabbitmq-server[137639]: ######  ##        /var/log/rabbitmq/rabbit at controller-sasl.log <mailto:/var/log/rabbitmq/rabbit at controller-sasl.log>
> Nov 24 15:02:41 controller rabbitmq-server[137639]: ##########
> Nov 24 15:02:41 controller rabbitmq-server[137639]: Starting broker... completed with 6 plugins.
> Nov 24 15:02:41 controller systemd[1]: Started RabbitMQ broker.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack-operators/attachments/20151125/a9f4acbe/attachment.html>


More information about the OpenStack-operators mailing list