[Openstack] oslo-messaging times out after reconnecting to rabbit
Noel Burton-Krahn
noel at pistoncloud.com
Tue Jul 8 01:00:45 UTC 2014
Hi Gordon, thanks for your reply.
That was exactly the problem with my example: without an acknowledge, the
old messages stayed stuck in the queue, so no rpc reply ever arrived. I
created my test program from oslo.messaging/tests/test_rabbit.py, which
didn't have any calls to acknowledge(). The thing is, it produces errors
exactly like the ones I'm seeing in nova when rabbit dies and we reconnect
to a new rabbit instance. I'm tracing through the nova calls in the rabbit
reconnect case to confirm that acknowledge() is always called when it
should be.
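
To make the failure mode in the example concrete, here is a toy model in
plain Python. It does not use oslo.messaging or rabbit at all: FakeBroker
and run_once are made-up names, and the redelivery rule (unacknowledged
messages return to the front of the queue when the consumer goes away) is
a simplified assumption about broker behaviour, not a trace of the real
driver.

```python
import collections


class FakeBroker(object):
    """Toy stand-in for one broker queue with explicit acknowledgements."""

    def __init__(self):
        self.ready = collections.deque()  # messages waiting for delivery
        self.unacked = []                 # delivered, not yet acknowledged

    def publish(self, body, reply_to):
        self.ready.append({'body': body, 'reply_to': reply_to})

    def poll(self):
        msg = self.ready.popleft()
        self.unacked.append(msg)
        return msg

    def acknowledge(self, msg):
        self.unacked.remove(msg)

    def consumer_disconnected(self):
        # Assumption: unacknowledged deliveries go back to the front of
        # the queue, in their original order.
        self.ready.extendleft(reversed(self.unacked))
        self.unacked = []


def run_once(broker, run_id, ack):
    """One run of the test: send a request, poll one message, reply."""
    reply_queue = 'reply_%d' % run_id  # each run has its own reply queue
    broker.publish({'send': run_id}, reply_to=reply_queue)
    msg = broker.poll()
    if ack:
        broker.acknowledge(msg)
    # The sender only sees the reply if it goes to this run's reply queue.
    got_reply = (msg['reply_to'] == reply_queue)
    broker.consumer_disconnected()  # the test process exits
    return got_reply


if __name__ == '__main__':
    for ack in (False, True):
        broker = FakeBroker()
        results = [run_once(broker, run_id, ack) for run_id in (1, 2)]
        print("ack=%s results=%s" % (ack, results))
```

Without the acknowledge, the second run polls the leftover request from
the first run, replies to a reply queue nobody is listening on, and the
new sender never gets its answer. With the acknowledge, both runs get
their reply.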
Cheers,
--
Noel
On Mon, Jul 7, 2014 at 3:43 PM, Gordon Sim <gsim at redhat.com> wrote:
> On 07/06/2014 01:02 AM, Noel Burton-Krahn wrote:
>
>> Icehouse
>> oslo-messaging 1.3.0
>> rabbitmq-server 3.1.3
>>
>> We've noticed that nova rpc calls fail often after rabbit restarts.
>> I've tracked it down to oslo/rabbit/kombu timing out if it's forced to
>> reconnect to rabbit. The code below times out waiting for a reply if
>> the topic has been used in a previous run. The reply always arrives the
>> first time a topic is used, or if the topic is none. But, the second
>> run with the same topic will hang with this error:
>>
>> MessagingTimeout: Timed out waiting for a reply to message ID ...
>>
>>
>> This problem seems too basic to not be caught earlier in oslo, but the
>> program below does really reproduce the same symptoms we see in nova
>> when run against a live rabbit server. What's wrong with this picture?
>>
>
> Just a theory, but could the issue with the simple example be the
> following:
>
> * the same queue is used for the first and second run
> * the first request is not acknowledged so when the first test exits it's
> left on the queue
> * on the second attempt, you retrieve the same first request, whose
> reply-to address is no longer valid so the reply is never delivered
> * you then try to join the sender thread without pulling off another
> message, so you don't get to the second request
>
> Just a theory as I say. Also doesn't explain the actual issue as you
> observed with nova. It's just a property of this example.
>
> --Gordon.
>
>> Cheers
>> --
>> Noel
>>
>>
>> #! /usr/bin/python
>>
>> from oslo.config import cfg
>> import threading
>> from oslo import messaging
>> import logging
>> import time
>> log = logging.getLogger(__name__)
>>
>> class OsloTest():
>>     def test(self):
>>         # The code below times out waiting for a reply if the topic
>>         # has been used in a previous run. The reply always arrives
>>         # the first time a topic is used, or if the topic is none.
>>         # But, the second run with the same topic will hang with this
>>         # error:
>>         #
>>         # MessagingTimeout: Timed out waiting for a reply to message ID ...
>>         #
>>         topic = 'will_hang_on_second_usage'
>>         #topic = None # never hangs
>>
>>         url = "%(proto)s://%(user)s:%(password)s@%(host)s/" % dict(
>>             proto = 'rabbit',
>>             host = '1.2.3.4',
>>             password = 'xxxxxxxx',
>>             user = 'rabbit-mq-user',
>>             )
>>         transport = messaging.get_transport(cfg.CONF, url)
>>         driver = transport._driver
>>
>>         target = messaging.Target(topic=topic)
>>         listener = driver.listen(target)
>>         ctxt={"context": True}
>>         timeout = 10
>>
>>         def send_main():
>>             log.debug("sending msg")
>>             reply = driver.send(target,
>>                                 ctxt,
>>                                 {'send': 1},
>>                                 wait_for_reply=True,
>>                                 timeout=timeout)
>>
>>             # times out if topic was not None and used before
>>             log.debug("received reply=%r" % (reply,))
>>
>>         send_thread = threading.Thread(target=send_main)
>>         send_thread.daemon = True
>>         send_thread.start()
>>
>>         msg = listener.poll()
>>         log.debug("received msg=%r" % (msg,))
>>
>>         msg.reply({'reply': 1})
>>
>>         log.debug("sent reply")
>>
>>         send_thread.join()
>>
>> if __name__ == '__main__':
>>     FORMAT = ('%(asctime)-15s %(process)5d %(thread)5d %(filename)s '
>>               '%(funcName)s %(message)s')
>>     logging.basicConfig(level=logging.DEBUG, format=FORMAT)
>>     OsloTest().test()
>>
>>
>> _______________________________________________
>> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>> Post to : openstack at lists.openstack.org
>> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>
>>