Hi all,

We have recently come across an issue where our metadata service stops responding. If you curl the service from within an instance, you get:

% curl http://169.254.169.254
<html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>

After some digging around on our neutron nodes, I noticed we were getting loads of RabbitMQ timeout errors while trying to process message requests:

2020-02-24 07:28:09.747 26378 ERROR neutron.common.rpc [-] Timeout in RPC method get_ports. Waiting for 26 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: MessagingTimeout: Timed out waiting for a reply to message ID a14c4a1395864cd980c1ec563a5c48aa

The servers are fairly busy, but we do not have a massive installation: more than 1500 instances and roughly 850 routers. Restarting the "neutron-metadata-agent" and "neutron-server" services fixes the issue for a while, but it ultimately comes back.

I did increase the "rpc_timeout" on the neutron nodes to 120 seconds, but that seems quite long to me. Likewise, the RabbitMQ servers are not overly busy: we see a fairly constant 40+ messages in the queue at any one time, and that can spike depending on workload.

Does anyone know of any tuning or tweaking we can do to the metadata service in either Neutron or Nova that might help? We are running OpenStack Queens, if that helps.

Many thanks,
Grant