[openstack-dev] [nova] How to deal the rpc timeout between compute and conductor?

Rui Chen chenrui.momo at gmail.com
Mon Mar 23 10:00:43 UTC 2015


Hi all:

I deployed OpenStack with the VMware driver. A single nova-compute service
connects to the VMware deployment, which contains about 3000 VMs. The
database is MySQL. InstanceList.get_by_host raises an RPC timeout error when
ComputeManager.init_host() and the _sync_power_states periodic task run.

In the nova VMware driver, one nova-compute host currently maps to the whole
VMware deployment, which may contain several clusters. When ComputeManager
calls InstanceList.get_by_host, nova-compute makes an RPC call to
nova-conductor, and nova-conductor fetches all the instances in the whole
VMware deployment at once; in my case that is 3000 instances. The
long-running SQL query can cause the RPC call from nova-compute to
nova-conductor to time out. We only see this issue with the VMware driver.

https://bugs.launchpad.net/nova/+bug/1420662
https://review.openstack.org/#/c/155676/

In the patch I split the one large RPC request into multiple small RPC
requests using a pagination mechanism to fix this issue, but sahid thinks it
looks like a hack and that we need a real pattern to handle this problem.
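
To make the pagination idea concrete, here is a minimal sketch. It is not the
code from the review above. The names iter_instances_by_host, fetch_page and
make_fake_conductor are hypothetical, and fetch_page only stands in for a
single bounded RPC call to nova-conductor that takes a limit and a marker.
An in-memory fake replaces the conductor so the sketch runs without
OpenStack.

```python
from typing import Callable, Iterator, List, Optional

# Hypothetical stand-in for one bounded RPC call to nova-conductor.
# It returns at most `limit` instance records that sort after `marker`.
FetchPage = Callable[[str, int, Optional[str]], List[dict]]


def iter_instances_by_host(fetch_page: FetchPage, host: str,
                           page_size: int = 500) -> Iterator[dict]:
    """Yield every instance on `host`, one bounded request at a time."""
    marker = None
    while True:
        page = fetch_page(host, page_size, marker)
        yield from page
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < page_size:
            return
        # Resume after the last record seen, not at a numeric offset.
        marker = page[-1]["uuid"]


def make_fake_conductor(total: int) -> FetchPage:
    """Build an in-memory fake so the sketch runs without OpenStack."""
    rows = [{"uuid": "uuid-%05d" % i, "host": "vmware-compute"}
            for i in range(total)]
    calls = []

    def fetch_page(host, limit, marker):
        calls.append((limit, marker))
        start = 0
        if marker is not None:
            start = next(i for i, r in enumerate(rows)
                         if r["uuid"] == marker) + 1
        return [r for r in rows[start:] if r["host"] == host][:limit]

    fetch_page.calls = calls  # recorded so the request count can be checked
    return fetch_page


if __name__ == "__main__":
    fake = make_fake_conductor(3000)
    instances = list(iter_instances_by_host(fake, "vmware-compute"))
    print(len(instances), "instances in", len(fake.calls), "requests")
```

Each request is bounded by page_size, so no single call has to carry all
3000 instances. The cost is more round trips, and the result is no longer a
single consistent snapshot, since instances may be created or deleted
between pages.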

If you have a better idea, please let me know.
Feel free to discuss it. Thanks.

Best Regards