[openstack-dev] how to debug RPC timeout issues?

Chris Friesen chris.friesen at windriver.com
Tue Sep 16 21:23:30 UTC 2014


Hi,

I'm running Havana, and I just tried a testcase involving doing six simultaneous live-migrations.

It appears that the migrations succeeded, but two of the instances got stuck with a status of "MIGRATING" because of RPC timeouts:

2014-09-16 20:35:07.376 12493 INFO nova.notifier [-] processing ERROR _post_live_migration
2014-09-16 20:35:07.390 12493 INFO nova.openstack.common.rpc.common [-] Connected to AMQP server on 192.168.204.2:5672
2014-09-16 20:35:07.396 12493 ERROR nova.openstack.common.loopingcall [-] in fixed duration looping call
2014-09-16 20:35:07.396 12493 TRACE nova.openstack.common.loopingcall Traceback (most recent call last):
2014-09-16 20:35:07.396 12493 TRACE nova.openstack.common.loopingcall   File "./usr/lib64/python2.7/site-packages/nova/openstack/common/loopingcall.py", line 78, in _inner
2014-09-16 20:35:07.396 12493 TRACE nova.openstack.common.loopingcall   File "./usr/lib64/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4874, in wait_for_live_migration
2014-09-16 20:35:07.396 12493 TRACE nova.openstack.common.loopingcall   File "./usr/lib64/python2.7/site-packages/nova/exception.py", line 90, in wrapped
2014-09-16 20:35:07.396 12493 TRACE nova.openstack.common.loopingcall   File "./usr/lib64/python2.7/site-packages/nova/exception.py", line 73, in wrapped
2014-09-16 20:35:07.396 12493 TRACE nova.openstack.common.loopingcall   File "./usr/lib64/python2.7/site-packages/nova/compute/manager.py", line 4558, in _post_live_migration
2014-09-16 20:35:07.396 12493 TRACE nova.openstack.common.loopingcall   File "./usr/lib64/python2.7/site-packages/nova/compute/rpcapi.py", line 517, in post_live_migration_at_destination
2014-09-16 20:35:07.396 12493 TRACE nova.openstack.common.loopingcall   File "./usr/lib64/python2.7/site-packages/nova/rpcclient.py", line 85, in call
2014-09-16 20:35:07.396 12493 TRACE nova.openstack.common.loopingcall   File "./usr/lib64/python2.7/site-packages/nova/rpcclient.py", line 63, in _invoke
2014-09-16 20:35:07.396 12493 TRACE nova.openstack.common.loopingcall   File "./usr/lib64/python2.7/site-packages/nova/openstack/common/rpc/proxy.py", line 131, in call
2014-09-16 20:35:07.396 12493 TRACE nova.openstack.common.loopingcall Timeout: Timeout while waiting on RPC response - topic: "compute.compute-0", RPC method: "post_live_migration_at_destination" info: "<unknown>"


Looking at the migration destination compute node, I see it in turn getting stuck on an RPC timeout:


2014-09-16 20:35:32.216 12510 INFO nova.notifier [req-a8389c8d-7e5b-4f08-8669-ce14594e3863 24a43342a5ae4e31bd431aef2d395ebc 02bf771ab40f4ecea6fe42135c5a09bc] processing ERROR post_live_migration_at_destination
2014-09-16 20:35:32.247 12510 INFO nova.openstack.common.rpc.common [req-a8389c8d-7e5b-4f08-8669-ce14594e3863 24a43342a5ae4e31bd431aef2d395ebc 02bf771ab40f4ecea6fe42135c5a09bc] Connected to AMQP server on 192.168.204.2:5672
2014-09-16 20:35:32.248 12510 INFO nova.openstack.common.rpc.common [req-2f7bd481-9aed-4d66-97dd-3075407282dd 24a43342a5ae4e31bd431aef2d395ebc 02bf771ab40f4ecea6fe42135c5a09bc] Connected to AMQP server on 192.168.204.2:5672
2014-09-16 20:35:32.270 12510 ERROR nova.openstack.common.rpc.amqp [req-a8389c8d-7e5b-4f08-8669-ce14594e3863 24a43342a5ae4e31bd431aef2d395ebc 02bf771ab40f4ecea6fe42135c5a09bc] Exception during message handling
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp Traceback (most recent call last):
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File "./usr/lib64/python2.7/site-packages/nova/openstack/common/rpc/amqp.py", line 466, in _process_data
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File "./usr/lib64/python2.7/site-packages/nova/openstack/common/rpc/dispatcher.py", line 172, in dispatch
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File "./usr/lib64/python2.7/site-packages/nova/exception.py", line 90, in wrapped
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File "./usr/lib64/python2.7/site-packages/nova/exception.py", line 73, in wrapped
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File "./usr/lib64/python2.7/site-packages/nova/compute/manager.py", line 4647, in post_live_migration_at_destination
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File "./usr/lib64/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5065, in post_live_migration_at_destination
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File "./usr/lib64/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3709, in to_xml
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File "./usr/lib64/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3465, in get_guest_config
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File "./usr/lib64/python2.7/site-packages/nova/compute/manager.py", line 383, in instance_update
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File "./usr/lib64/python2.7/site-packages/nova/compute/manager.py", line 498, in _instance_update
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File "./usr/lib64/python2.7/site-packages/nova/openstack/common/lockutils.py", line 246, in inner
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File "./usr/lib64/python2.7/site-packages/nova/compute/resource_tracker.py", line 601, in update_usage
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File "./usr/lib64/python2.7/site-packages/nova/compute/resource_tracker.py", line 827, in _update
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File "./usr/lib64/python2.7/site-packages/nova/conductor/api.py", line 263, in compute_node_update
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File "./usr/lib64/python2.7/site-packages/nova/conductor/rpcapi.py", line 386, in compute_node_update
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File "./usr/lib64/python2.7/site-packages/nova/rpcclient.py", line 85, in call
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File "./usr/lib64/python2.7/site-packages/nova/rpcclient.py", line 63, in _invoke
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp   File "./usr/lib64/python2.7/site-packages/nova/openstack/common/rpc/proxy.py", line 131, in call
2014-09-16 20:35:32.270 12510 TRACE nova.openstack.common.rpc.amqp Timeout: Timeout while waiting on RPC response - topic: "conductor", RPC method: "compute_node_update" info: "<unknown>"


Looking at nova-conductor, there's nothing interesting in the logs around this time.


Looking at the rabbitmq logs at /var/log/rabbitmq/rabbit at localhost.log, there's nothing particularly interesting.
It didn't do a switchover or restart or anything like that, I just see messages like this:

=INFO REPORT==== 16-Sep-2014::20:34:37 ===
closing AMQP connection <0.19196.1> (192.168.204.208:59984 -> 192.168.204.2:5672)

=INFO REPORT==== 16-Sep-2014::20:34:38 ===
accepting AMQP connection <0.10878.2> (192.168.204.208:33566 -> 192.168.204.2:5672)

=INFO REPORT==== 16-Sep-2014::20:34:38 ===
accepting AMQP connection <0.10888.2> (192.168.204.208:33567 -> 192.168.204.2:5672)

=INFO REPORT==== 16-Sep-2014::20:35:07 ===
closing AMQP connection <0.10798.2> (192.168.204.136:49498 -> 192.168.204.2:5672)

=INFO REPORT==== 16-Sep-2014::20:35:07 ===
closing AMQP connection <0.10788.2> (192.168.204.136:49497 -> 192.168.204.2:5672)

=INFO REPORT==== 16-Sep-2014::20:35:07 ===
accepting AMQP connection <0.11364.2> (192.168.204.136:49538 -> 192.168.204.2:5672)


So...how do I go about figuring out what's causing these RPC timeouts?


Thanks,
Chris



More information about the OpenStack-dev mailing list