[Openstack] oslo-messaging times out after reconnecting to rabbit

Gordon Sim gsim at redhat.com
Tue Jul 8 11:30:54 UTC 2014


On 07/08/2014 02:00 AM, Noel Burton-Krahn wrote:
> The thing is, that produces errors exactly like what I'm seeing in nova
> if rabbit dies and we reconnect to a new rabbit instance.

A call timing out while waiting for a response is a fairly general 
problem for which there could be different causes.

>  I'm tracing
> through the nova calls in the rabbit reconnect case to confirm that
> acknowledge is always being called when it should be.

Even if it is, the acknowledgement could be lost if the connection to
rabbitmq fails. However, I don't think that is likely to be the cause of
the timeout. Unlike in the example, in a real oslo.messaging-based
service the fact that the request is redelivered shouldn't be a problem.
The reply issued for the redelivered request may be ignored or dropped,
but subsequent requests will still be processed.
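
The redelivery case can be sketched in plain Python. This is only a toy
model, not oslo.messaging or RabbitMQ code: ToyBroker, Caller, run_demo
and the "req-N" ids are all hypothetical names, and it assumes the
caller matches replies to outstanding requests by message id and drops
any reply it is not waiting for.

```python
import collections


class ToyBroker:
    """Hypothetical in-memory stand-in for a broker queue (not RabbitMQ)."""

    def __init__(self):
        self.ready = collections.deque()   # messages waiting for delivery
        self.unacked = {}                  # delivered but not yet acknowledged

    def publish(self, msg_id, body):
        self.ready.append((msg_id, body))

    def deliver(self):
        msg_id, body = self.ready.popleft()
        self.unacked[msg_id] = body
        return msg_id, body

    def ack(self, msg_id):
        self.unacked.pop(msg_id, None)

    def connection_lost(self):
        # Assumption: unacknowledged messages are requeued at the front,
        # so they are redelivered before later requests.
        for msg_id, body in reversed(list(self.unacked.items())):
            self.ready.appendleft((msg_id, body))
        self.unacked.clear()


class Caller:
    """Hypothetical RPC caller that matches replies by message id."""

    def __init__(self):
        self.waiting = set()
        self.results = {}
        self.dropped = []

    def expect(self, msg_id):
        self.waiting.add(msg_id)

    def on_reply(self, msg_id, result):
        if msg_id not in self.waiting:
            self.dropped.append(msg_id)    # duplicate or stale reply: ignore
            return
        self.waiting.discard(msg_id)
        self.results[msg_id] = result


def run_demo():
    broker, caller = ToyBroker(), Caller()
    for msg_id, body in (("req-1", 1), ("req-2", 2)):
        caller.expect(msg_id)
        broker.publish(msg_id, body)

    # The server handles req-1 and replies, but the connection fails
    # before the acknowledgement reaches the broker.
    msg_id, body = broker.deliver()
    caller.on_reply(msg_id, body * 10)
    broker.connection_lost()

    # After reconnecting, req-1 is redelivered, then req-2 follows.
    while broker.ready:
        msg_id, body = broker.deliver()
        caller.on_reply(msg_id, body * 10)
        broker.ack(msg_id)
    return caller
```

In this model the second reply to req-1 is dropped by the caller, while
req-2 is still answered, so the lost acknowledgement on its own does not
leave any call waiting.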

I'm not completely clear on what the timing is in your original problem. 
You say the timeout happens after a restart. Is it immediately after 
(i.e. could some connections still be detecting the failure)? Or long 
enough after that you are confident everything has failed over correctly?

(Obviously, a failure or restart *during* a call may well result in a
timeout; that is the expected semantics at present.)
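
As a rough illustration of that semantics, here is a minimal sketch
using only the Python standard library. It is not how oslo.messaging
implements calls; CallTimeout, wait_for_reply and demo are hypothetical
names, and a local queue.Queue stands in for the reply queue.

```python
import queue


class CallTimeout(Exception):
    """Hypothetical error raised when no reply arrives in time."""


def wait_for_reply(reply_queue, timeout):
    # Block until a reply arrives or the timeout (in seconds) expires.
    try:
        return reply_queue.get(timeout=timeout)
    except queue.Empty:
        raise CallTimeout("no reply within %.2f seconds" % timeout)


def demo():
    replies = queue.Queue()
    first = "replied"
    # Simulate a restart in the middle of a call: the reply is lost,
    # so the caller gives up once the timeout expires.
    try:
        wait_for_reply(replies, timeout=0.05)
    except CallTimeout:
        first = "timed out"
    # A later call whose reply does arrive completes normally.
    replies.put("ok")
    return first, wait_for_reply(replies, timeout=0.05)
```

Only the call that was in flight when the reply was lost times out; a
call made afterwards succeeds once replies are flowing again.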



