Slow instance launch times due to RabbitMQ
Hi all,

We are randomly seeing slow instance launch / deletion times and it appears to be because of RabbitMQ. We are seeing a lot of these messages in the logs for Nova and Neutron:

ERROR oslo.messaging._drivers.impl_rabbit [-] [f4ab3ca0-b837-4962-95ef-dfd7d60686b6] AMQP server on 10.6.2.212:5671 is unreachable: Too many heartbeats missed. Trying again in 1 seconds. Client port: 37098: ConnectionForced: Too many heartbeats missed

The RabbitMQ cluster isn't under high load and I am not seeing any packets drop over the network when I do some tracing.

We are only running 15 compute nodes currently and have >1000 instances so it isn't a large deployment.

Are there any good configuration tweaks for RabbitMQ running on OpenStack Queens?

Many Thanks,
--
Grant Morley
Cloud Lead, Civo Ltd
www.civo.com
In reply to Grant Morley's message of Wed, Jul 31, 2019, 5:17 AM:

What distro/deployer are you using?

Donny Davis
In reply to Grant Morley's message of Wed, Jul 31, 2019, 5:19 AM:

Could you forward the output of the following commands on a controller node?

rabbitmqctl cluster_status
rabbitmqctl list_queues

You won't necessarily see a high load on a Rabbit cluster that is in a bad state.
In reply to Laurent Dumont's message of 31/07/2019 11:49:

Hi guys,

We are using Ubuntu 16 and OpenStack ansible to do our setup.

rabbitmqctl list_queues
Listing queues

(Doesn't appear to be any queues)

rabbitmqctl cluster_status
Cluster status of node 'rabbit@management-1-rabbit-mq-container-b4d7791f'
[{nodes,[{disc,['rabbit@management-1-rabbit-mq-container-b4d7791f',
                'rabbit@management-2-rabbit-mq-container-b455e77d',
                'rabbit@management-3-rabbit-mq-container-1d6ae377']}]},
 {running_nodes,['rabbit@management-3-rabbit-mq-container-1d6ae377',
                 'rabbit@management-2-rabbit-mq-container-b455e77d',
                 'rabbit@management-1-rabbit-mq-container-b4d7791f']},
 {cluster_name,<<"openstack">>},
 {partitions,[]},
 {alarms,[{'rabbit@management-3-rabbit-mq-container-1d6ae377',[]},
          {'rabbit@management-2-rabbit-mq-container-b455e77d',[]},
          {'rabbit@management-1-rabbit-mq-container-b4d7791f',[]}]}]

Regards,
Grant Morley
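One possible reason for an empty `list_queues` result, which nobody in this thread confirms, is that `rabbitmqctl list_queues` only looks at the default vhost `/` unless `-p` is given, and some deployers place each service in its own vhost. The sketch below walks every vhost. The function name `list_queues_all_vhosts` and the `RABBITMQCTL` override variable are hypothetical, introduced here for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: print "vhost<TAB>queue<TAB>messages" for every vhost.
# RABBITMQCTL is an assumed override so the function can be pointed at a
# wrapper script (for example one that enters the RabbitMQ container first).
list_queues_all_vhosts() {
    ctl="${RABBITMQCTL:-rabbitmqctl}"
    # -q suppresses the informational banner so only data rows remain.
    "$ctl" -q list_vhosts | while IFS= read -r vhost; do
        # Skip blank lines and the "name" header row some releases print.
        [ -n "$vhost" ] || continue
        [ "$vhost" = "name" ] && continue
        # </dev/null keeps the inner call from consuming the vhost list.
        "$ctl" -q list_queues -p "$vhost" name messages </dev/null |
            awk -F '\t' -v vh="$vhost" \
                '!($1 == "name" && $2 == "messages") { print vh "\t" $0 }'
    done
}

# Example:
#   list_queues_all_vhosts
```

If this prints rows for vhosts other than `/`, the empty listing above would simply mean the default vhost is unused.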
In reply to Grant Morley's message of Wed, Jul 31, 2019, 9:35 AM:

That is a bit strange, list_queues should return stuff. Couple of ideas:

- Are the Rabbit connection failure logs on the compute pointing to a specific controller?
- Are there any logs within Rabbit on the controller that would point to a transient issue?
- cluster_status is a snapshot of the cluster at the time you ran the command. If the alarms have cleared, you won't see anything.
- If you have the RabbitMQ management plugin activated, I would recommend a quick look to see the historical metrics and overall status.
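The first question above (whether the failures point at one controller) can be checked by counting the unreachable-server errors per address. This is a minimal sketch, assuming the log lines have the same shape as the error quoted at the top of the thread; `count_unreachable_by_server` is a hypothetical helper name and the log path in the comment is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: read oslo.messaging log lines on stdin and print
# "count server" pairs for every "AMQP server on HOST:PORT is unreachable"
# line, most frequent server first.
count_unreachable_by_server() {
    sed -n 's/.*AMQP server on \([^ ]*\) is unreachable.*/\1/p' |
        sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# Example (the log path is an assumption; adjust for your deployment):
#   count_unreachable_by_server < /var/log/nova/nova-compute.log
```

A heavily skewed count would suggest looking at that one RabbitMQ node first; an even spread points more towards the clients or the network path.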
In reply to Laurent Dumont's message of Wednesday, July 31, 2019, 4:20 PM:

Hi,

Are you using SSL connections? Could it be this issue?
https://bugs.launchpad.net/ubuntu/+source/oslo.messaging/+bug/1800957
In reply to Gabriele Santomaggio's message of 7/31/19 10:48 AM:

Another thing to check if you're having seemingly inexplicable messaging issues is that there isn't a notification queue filling up somewhere. If notifications are enabled somewhere but nothing is consuming them, the size of the queue will eventually grind Rabbit to a halt.

I used to check queue sizes through the Rabbit web UI, so I have to admit I'm not sure how to do it through the CLI.
In reply to Ben Nemec's message of Tue, Aug 6, 2019, 17:14, on checking queue sizes from the CLI:

You can use the following command to monitor your queues and watch their size and growth:

```
watch -c "rabbitmqctl list_queues name messages_unacknowledged"
```

Or something like this:

```
rabbitmqctl list_queues messages consumers name message_bytes messages_unacknowledged messages_ready head_message_timestamp consumer_utilisation memory state | grep reply
```
--
Hervé Beraud
Senior Software Engineer
Red Hat - Openstack Oslo
irc: hberaud
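As a complement to the commands above, a small sketch that ranks queues by depth so a growing queue stands out. It assumes tab-separated `name messages` rows as printed by `rabbitmqctl -q list_queues name messages`; `deepest_queues` is a hypothetical helper, not part of RabbitMQ.

```shell
#!/bin/sh
# Hypothetical helper: read "name<TAB>messages" rows on stdin and print the
# N queues holding the most messages (N defaults to 10).
deepest_queues() {
    n="${1:-10}"
    # Keep only rows whose second column is numeric (drops any header row),
    # then sort on that column, largest first.
    awk -F '\t' '$2 ~ /^[0-9]+$/' |
        sort -t "$(printf '\t')" -k2,2nr |
        head -n "$n"
}

# Example (add -p <vhost> if the queues are not in the default vhost):
#   rabbitmqctl -q list_queues name messages | deepest_queues 5
```

Running it under `watch`, as in the first command above, gives a rough live view of which queues are growing.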
In reply to Herve Beraud's message of Wed, Aug 7, 2019, 11:12 AM:

I am curious how your system is set up. Are you using Nova with local storage? Are you using Ceph? How long does it take to launch an instance when you are seeing this message?
In reply to Ben Nemec's message of 8/6/19 5:10 PM, on checking queue sizes from the CLI:

On the CLI.

Purging Rabbit notification queues:

    rabbitmqctl purge_queue versioned_notifications.info
    rabbitmqctl purge_queue notifications.info

Getting the total number of messages in Rabbit:

    NUM_MESSAGE=$(curl -k -uuser:pass https://192.168.0.1:15671/api/overview 2>/dev/null | jq '.["queue_totals"]["messages"]')

In the same way, you can get JSON output for all queues using this URL:

    https://192.168.0.1:15671/api/queues

and by playing with jq, you can do many things, like:

    jq '.[] | select(.name == "versioned_notifications.info") | .messages'
    jq '.[] | select(.name == "notifications.info") | .messages'
    jq '.[] | select(.name == "versioned_notifications.error") | .messages'
    jq '.[] | select(.name == "notifications.error") | .messages'

If you add up the output for the four queues above, you get the total number of notification messages. What I did is output to Graphite like this:

    echo "`hostname`.rabbitmq.notifications ${NUM_TOTAL_NOTIF} `date +%s`" \
        | nc -w 2 graphite-node-hostname 2003

for the number of notifications plus the other types of messages. Doing this every minute makes it possible to graph the number of messages in Grafana, which gives me a nice overview of what's going on with notifications and the rest.

I hope this will help someone,
Cheers,

Thomas Goirand (zigo)
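The add-up step described above can also be done without the management API. This is a sketch under assumptions: it reads tab-separated `rabbitmqctl -q list_queues name messages` output, the helper names `sum_notifications` and `graphite_line` are hypothetical, and using `NUM_TOTAL_NOTIF` for the notification sum is a guess at how that variable was filled in the original script.

```shell
#!/bin/sh
# Hypothetical helper: sum the message counts of the four notification
# queues named above, reading "name<TAB>messages" rows on stdin.
sum_notifications() {
    awk -F '\t' '
        $1 ~ /^(versioned_)?notifications\.(info|error)$/ { total += $2 }
        END { print total + 0 }
    '
}

# Hypothetical helper: format one line of the Graphite plaintext protocol,
# "<metric path> <value> <unix timestamp>".
graphite_line() {
    printf '%s.rabbitmq.notifications %s %s\n' "$1" "$2" "$3"
}

# Example wiring (graphite-node-hostname is the placeholder used above;
# add -p <vhost> if the queues are not in the default vhost):
#   NUM_TOTAL_NOTIF=$(rabbitmqctl -q list_queues name messages | sum_notifications)
#   graphite_line "$(hostname)" "$NUM_TOTAL_NOTIF" "$(date +%s)" \
#       | nc -w 2 graphite-node-hostname 2003
```

Keeping the sum and the formatting in separate functions makes each piece easy to check by hand before it is put in cron.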
participants (7)
- Ben Nemec
- Donny Davis
- Gabriele Santomaggio
- Grant Morley
- Herve Beraud
- Laurent Dumont
- Thomas Goirand