[Openstack] oslo-messaging times out after reconnecting to rabbit

Gordon Sim gsim at redhat.com
Mon Jul 7 22:43:06 UTC 2014


On 07/06/2014 01:02 AM, Noel Burton-Krahn wrote:
> Icehouse
> oslo-messaging 1.3.0
> rabbitmq-server 3.1.3
>
> We've noticed that nova rpc calls often fail after rabbit restarts.
> I've tracked it down to oslo/rabbit/kombu timing out if it's forced to
> reconnect to rabbit.  The code below times out waiting for a reply if
> the topic has been used in a previous run.  The reply always arrives the
> first time a topic is used, or if the topic is None.  But the second
> run with the same topic will hang with this error:
>
>     MessagingTimeout: Timed out waiting for a reply to message ID ...
>
>
> This problem seems too basic to not be caught earlier in oslo, but the
> program below does really reproduce the same symptoms we see in nova
> when run against a live rabbit server.  What's wrong with this picture?

Just a theory, but could the issue with the simple example be the following:

* the same queue is used for the first and second run
* the first request is not acknowledged, so when the first test exits it's 
left on the queue
* on the second attempt, you retrieve that same first request, whose 
reply-to address is no longer valid, so the reply is never delivered
* you then try to join the sender thread without pulling another 
message off the queue, so you never get to the second request

Just a theory, as I say. It also doesn't explain the actual issue you 
observed with nova; it's just a property of this example.
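
To make the theory concrete, here is a minimal sketch that replays the sequence against an in-memory stand-in for the broker. Everything in it (FakeBroker, run_once, the reply_N addresses) is hypothetical and invented for illustration; it is not the oslo.messaging or kombu API, and it assumes the request really is left unacknowledged, which I have not verified.

```python
from collections import deque


class FakeBroker:
    """Hypothetical in-memory stand-in for a broker; not the oslo.messaging API."""

    def __init__(self):
        self.queues = {}            # topic -> deque of requests not yet acked
        self.live_replies = set()   # reply-to addresses that still exist

    def publish(self, topic, body, reply_to):
        self.queues.setdefault(topic, deque()).append(
            {"body": body, "reply_to": reply_to})

    def poll(self, topic):
        # Deliver the oldest request, but keep it queued until it is acked.
        return self.queues[topic][0]

    def ack(self, topic):
        self.queues[topic].popleft()


def run_once(broker, topic, run_id, ack=False):
    """One run of the test: send a request, poll once, reply, exit."""
    reply_to = "reply_%d" % run_id
    broker.live_replies.add(reply_to)
    broker.publish(topic, {"send": run_id}, reply_to)

    msg = broker.poll(topic)
    # A reply is dropped if the address it is sent to no longer exists.
    delivered = msg["reply_to"] in broker.live_replies
    # The sender of this run only sees replies sent to its own address.
    got_reply = delivered and msg["reply_to"] == reply_to
    if ack:
        broker.ack(topic)

    broker.live_replies.discard(reply_to)  # the process exits
    return msg["body"], got_reply


if __name__ == "__main__":
    broker = FakeBroker()
    print(run_once(broker, "some_topic", 1))
    print(run_once(broker, "some_topic", 2))
```

In this model the second run polls the stale request from the first run, so its own sender never gets a reply, which would show up as the timeout. Passing ack=True removes each request once it is handled, and both runs then get their reply.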

--Gordon.

> Cheers
> --
> Noel
>
>
> #! /usr/bin/python
>
> from oslo.config import cfg
> import threading
> from oslo import messaging
> import logging
> import time
> log = logging.getLogger(__name__)
>
> class OsloTest():
>      def test(self):
>          # The code below times out waiting for a reply if the topic
>          # has been used in a previous run.  The reply always arrives
>          # the first time a topic is used, or if the topic is none.
>          # But, the second run with the same topic will hang with this
>          # error:
>          #
>          # MessagingTimeout: Timed out waiting for a reply to message ID ...
>          #
>          topic  = 'will_hang_on_second_usage'
>          #topic  = None # never hangs
>
>          url = "%(proto)s://%(user)s:%(password)s@%(host)s/" % dict(
>              proto = 'rabbit',
>              host = '1.2.3.4',
>              password = 'xxxxxxxx',
>              user = 'rabbit-mq-user',
>              )
>          transport = messaging.get_transport(cfg.CONF, url)
>          driver = transport._driver
>
>          target = messaging.Target(topic=topic)
>          listener = driver.listen(target)
>          ctxt={"context": True}
>          timeout = 10
>
>          def send_main():
>              log.debug("sending msg")
>              reply = driver.send(target,
>                                  ctxt,
>                                  {'send': 1},
>                                  wait_for_reply=True,
>                                  timeout=timeout)
>
>              # times out if topic was not None and used before
>              log.debug("received reply=%r" % (reply,))
>
>          send_thread = threading.Thread(target=send_main)
>          send_thread.daemon = True
>          send_thread.start()
>
>          msg = listener.poll()
>          log.debug("received msg=%r" % (msg,))
>
>          msg.reply({'reply': 1})
>
>          log.debug("sent reply")
>
>          send_thread.join()
>
> if __name__ == '__main__':
>      FORMAT = ('%(asctime)-15s %(process)5d %(thread)5d %(filename)s '
>                '%(funcName)s %(message)s')
>      logging.basicConfig(level=logging.DEBUG, format=FORMAT)
>      OsloTest().test()