[openstack-dev] [nova] How to deal the rpc timeout between compute and conductor?
Rui Chen
chenrui.momo at gmail.com
Mon Mar 23 10:00:43 UTC 2015
Hi all:
I deployed OpenStack with the VMware driver. One nova-compute service
connects to a VMware deployment that contains about 3000 VMs, and the
database is MySQL. InstanceList.get_by_host raises an rpc timeout error
when ComputeManager.init_host() and the _sync_power_states periodic task
run.
Currently, in the nova VMware driver, one nova-compute host maps to the
whole VMware deployment, which may contain several clusters. When
ComputeManager calls InstanceList.get_by_host, nova-compute makes an rpc
call to nova-conductor, and nova-conductor fetches all the instances in
the whole VMware deployment at once; in my case that is 3000 instances.
The long-running SQL query may cause the rpc call from nova-compute to
nova-conductor to time out. We only see this issue with the VMware driver.
https://bugs.launchpad.net/nova/+bug/1420662
https://review.openstack.org/#/c/155676/
In the patch I split the one large rpc request into multiple small rpc
requests using a pagination mechanism to fix this issue, but sahid thinks
it looks like a hack and that we need a real pattern to handle this
problem.
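
To make the pagination idea concrete, here is a minimal sketch of the
general limit/marker pattern. It is not the code from the review above, and
every name in it (make_fake_conductor, get_page, iter_instances_by_host,
page_size) is hypothetical; the fake conductor is just an in-memory list
standing in for one bounded rpc call and its SQL query.

```python
# Hypothetical sketch of limit/marker pagination. None of these names are
# Nova's real API; the "conductor" here is an in-memory stand-in.
from typing import Callable, Iterator, List, Optional

Instance = dict  # stand-in for a Nova instance object


def make_fake_conductor(instances: List[Instance]) -> Callable[..., List[Instance]]:
    """Return a function that behaves like one bounded rpc call."""
    ordered = sorted(instances, key=lambda inst: inst["uuid"])

    def get_page(host: str, limit: int, marker: Optional[str] = None) -> List[Instance]:
        # Return at most `limit` instances on `host` whose uuid sorts
        # after `marker` (or from the start when marker is None).
        matching = [
            inst for inst in ordered
            if inst["host"] == host and (marker is None or inst["uuid"] > marker)
        ]
        return matching[:limit]

    return get_page


def iter_instances_by_host(get_page, host: str, page_size: int = 500) -> Iterator[Instance]:
    """Yield every instance on `host`, fetching at most `page_size` per call."""
    marker = None
    while True:
        page = get_page(host, limit=page_size, marker=marker)
        if not page:
            return
        yield from page
        if len(page) < page_size:
            # A short page means there is nothing left to fetch.
            return
        # The last uuid seen becomes the starting point of the next call.
        marker = page[-1]["uuid"]
```

With 3000 instances and page_size=500, the caller makes several small
requests, each bounded in size, instead of one request that has to return
everything before the rpc timeout expires. The trade-off is more round
trips and a result set that is no longer a single consistent snapshot.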
If you have a better idea, please let me know.
Feel free to discuss it. Thanks.
Best Regards