Neutron metadata service not responding due to rabbitmq dropping messages
Grant Morley
grant at civo.com
Mon Feb 24 18:07:45 UTC 2020
Hi all,
We have recently come across an issue where our metadata service stops
responding. If you try to curl the service from within an instance you get:
% curl http://169.254.169.254
<html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>
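For anyone trying to reproduce this, a small sketch that records the status code and response time from inside an instance may help when comparing before and after a restart. The function name check_metadata and the 30 second cap are illustrative assumptions, not part of the original report:

```shell
# Hypothetical helper: print "<http_code> <seconds>" for the metadata endpoint.
# The 30 second --max-time cap is an arbitrary choice for illustration.
check_metadata() {
  url="${1:-http://169.254.169.254}"
  curl -s -o /dev/null --max-time 30 \
    -w '%{http_code} %{time_total}\n' "$url"
}
```

Calling check_metadata with no arguments queries the address shown above; a 504 with a long elapsed time would match the failure described here.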
After doing some digging around on our neutron nodes I noticed we were
getting loads of RabbitMQ timeout errors whilst trying to process
message requests:
2020-02-24 07:28:09.747 26378 ERROR neutron.common.rpc [-] Timeout in
RPC method get_ports. Waiting for 26 seconds before next attempt. If the
server is not down, consider increasing the rpc_response_timeout option
as Neutron server(s) may be overloaded and unable to respond quickly
enough.: MessagingTimeout: Timed out waiting for a reply to message ID
a14c4a1395864cd980c1ec563a5c48aa
The servers are fairly busy, but we do not have a massive installation:
more than 1,500 instances and roughly 850 routers.
If I restart the "neutron-metadata-agent" service and the
"neutron-server" service, it seems to fix the issue for a while, but
ultimately it comes back.
I did increase the "rpc_timeout" on the neutron nodes to 120 seconds,
but that seems quite long to me.
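Note that the log message above names the option as rpc_response_timeout. A quick way to confirm what is actually set on a node could look like the sketch below. The function name and the default path /etc/neutron/neutron.conf are assumptions and may differ per deployment:

```shell
# Hypothetical helper: show any rpc_response_timeout line in a neutron config.
# The default path is an assumption; pass another file as the first argument.
show_rpc_timeout() {
  conf="${1:-/etc/neutron/neutron.conf}"
  grep -E '^[[:space:]]*rpc_response_timeout[[:space:]]*=' "$conf"
}
```

If this prints nothing, the option is not set explicitly in that file.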
Likewise, the RabbitMQ servers are not overly busy. We seem to get a
constant stream of only 40+ messages in the queue at any one time,
although that can spike depending on workload.
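To see which queues are backing up, one option is to filter the output of rabbitmqctl list_queues on the RabbitMQ node. This is only a sketch: the function name busy_queues and the default threshold of 40 are assumptions chosen to match the figure above.

```shell
# Hypothetical helper: list queues holding more than N messages.
# Columns printed are queue name, message count, and consumer count.
busy_queues() {
  threshold="${1:-40}"
  rabbitmqctl -q list_queues name messages consumers |
    awk -v t="$threshold" '$2 + 0 > t + 0'
}
```

A queue that shows a growing message count with zero consumers would suggest the agent side has stopped reading, rather than the broker dropping messages.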
Does anyone know of any tuning or tweaking we can do to the metadata
service in either Neutron or Nova that might help?
We are running OpenStack Queens if that helps.
Many thanks,
Grant