[openstack-dev] RabbitMQ Cluster Inconsistency Issue

Palanisamy, Anand apalanisamy at paypal.com
Sat Jan 19 18:19:39 UTC 2013


We recently found the following issue with the RabbitMQ cluster in our QA environment. Please let me know if anyone has come across the same problem and has a fix for it.

Steps to reproduce:
1. On a running and consistent cluster, boot 8 VMs. VMs should boot and become ACTIVE.
2. Delete newly created VMs. VMs should delete successfully.
3. Reboot 1st controller.
4. On the 2nd controller, try to retrieve the list of images using the "glance index" command. glance will hang.
5. After a minute, try again - the command should run successfully.
6. Ensure 1st controller is still rebooting.
7. Launch 5-8 VMs on the 2nd controller. Some of the VMs will fail with an RPC Timeout error. This means RabbitMQ is not functioning correctly.
8. Wait for the 1st controller to boot. After a minute, ensure (using "nova-manage service list") that all nova services are available.
9. Launch 5-8 VMs on the 2nd controller. Some of the VMs will fail with an RPC Timeout error. This means the RabbitMQ cluster is still inconsistent.
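
For anyone trying to reproduce this, the "hang" in step 4 can be told apart from an ordinary failure by running the command under a timeout. The following is only a minimal sketch in Python 3; the helper name probe and the 30-second default are my own assumptions, not part of any OpenStack tool.

```python
import subprocess


def probe(cmd, timeout_s=30):
    """Run cmd and classify the outcome.

    Returns "ok" (exit code 0), "failed" (non-zero exit code),
    "hung" (no exit within timeout_s seconds) or "missing"
    (the executable was not found). The name and the labels are
    illustrative assumptions.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this.
        return "hung"
    except FileNotFoundError:
        return "missing"
    return "ok" if result.returncode == 0 else "failed"
```

On the 2nd controller, probe(["glance", "index"]) would be expected to return "hung" during step 4 and "ok" during step 5, if the behaviour matches what is described above.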

Additional tests were performed:
- Stop rabbitmq-server using "service rabbitmq-server stop" on 1st controller. 2nd controller remains functional.
- Restart rabbitmq-server using "service rabbitmq-server restart" on both controllers after executing steps 1-8. Cluster becomes consistent after this action.
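
One possible way to make "consistent" checkable before and after the restart is to compare what each node reports as the set of running cluster nodes. The sketch below assumes those lists have already been collected by hand (for example from "rabbitmqctl cluster_status" on each controller); the function name and the node names are hypothetical, and this is only one definition of consistency, not necessarily what causes the RPC timeouts.

```python
def inconsistent_views(views):
    """Find nodes whose view of the cluster is incomplete.

    views maps each node name to the list of nodes that this node
    reports as running. Every key of views is expected to appear in
    every node's list. Returns a dict mapping each reporting node to
    the sorted list of nodes missing from its view; an empty dict
    means all views agree on the full membership.
    """
    expected = set(views)
    problems = {}
    for reporter, running in views.items():
        missing = expected - set(running)
        if missing:
            problems[reporter] = sorted(missing)
    return problems


# Hypothetical example: controller2 no longer sees controller1.
example = {
    "rabbit@controller1": ["rabbit@controller1", "rabbit@controller2"],
    "rabbit@controller2": ["rabbit@controller2"],
}
print(inconsistent_views(example))
```

An empty result after restarting rabbitmq-server on both controllers would match the observation that the cluster becomes consistent again.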

Thanks
Anand