<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body dir="auto"><div>Fwiw, we've seen this with nova-scheduler as well. I think the default pool size is too large in general. The problem I've seen stems from the fact that DB calls all block, so you can easily end up with a stack of 64 workers all waiting on DB calls. It tends to work out such that none of the rpc pool threads return until all of them have run their DB calls. This is compounded by the explicit yield we have for every DB call in nova. Anyway, this means that all of the workers are tied up for quite a while. Since nova casts to the scheduler, it doesn't impact the API much. But if you were waiting on an RPC response, you could be waiting a while.</div><div><br></div><div>Ironic makes a lot of RPC calls. I don't think we know the exact behavior in Ironic, but I'm assuming it's something similar: if all rpc pool threads are essentially stuck until roughly the same time, you end up with API hangs. But we're also seeing delays in periodic task runs; the periodic task must be getting stuck behind so many of the rpc worker threads that lowering the number of threads helps considerably.</div><div><br></div><div>Given that DB calls all block the process right now, there's really not much advantage to a larger pool size. 64 is too much, IMO. It would make more sense if there were more I/O that could be parallelized.</div><div><br></div><div>That didn't answer your question. I've been meaning to ask the same one since we discovered this. :)</div><div><br></div><div>- Chris</div><div><br>On Apr 22, 2014, at 3:54 PM, Devananda van der Veen <<a href="mailto:devananda.vdv@gmail.com">devananda.vdv@gmail.com</a>> wrote:<br><br></div><blockquote type="cite"><div><div dir="ltr">Hi!<div><br></div><div>When a project is using oslo.messaging, how can we change our default rpc_thread_pool_size?</div><div><br></div><div>-----------------------</div><div>Background<br><div><br></div><div>
Ironic has hit a bug where a flood of API requests can deplete the RPC worker pool on the other end and cause things to break in very bad ways. Apparently, nova-conductor hit something similar a while back too. There've been a few long discussions on IRC about it, tracked partially here:</div>
<div> <a href="https://bugs.launchpad.net/ironic/+bug/1308680">https://bugs.launchpad.net/ironic/+bug/1308680</a></div><div><br></div><div>tl;dr: a way we can fix this is to set the rpc_thread_pool_size very small (e.g., 4) and keep our conductor.worker_pool size near its current value (e.g., 64). I'd like these to be the default option values, rather than requiring every user to change the rpc_thread_pool_size in their local ironic.conf file.</div>
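<div><br></div><div>Without a default change, every deployment would need something roughly like the snippet below in its local ironic.conf. This is only a sketch: it assumes the option lands in the [DEFAULT] group, and the conductor's own pool option is deliberately not spelled out here since its exact name may differ.</div><div><br></div><pre>
[DEFAULT]
# Shrink the oslo.messaging RPC greenthread pool from its default of 64.
rpc_thread_pool_size = 4
# The conductor's own worker pool option (not shown here) would stay
# near its current value, e.g. 64.
</pre>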
<div><br></div><div>We're also about to switch from the RPC module in oslo-incubator to the oslo.messaging library.</div><div><br></div><div>Why are these related? Because it looks impossible for us to change the default for this option from within Ironic: the option is only registered when EventletExecutor is instantiated (rather than when the module is loaded).</div>
<div><br></div><div><a href="https://github.com/openstack/oslo.messaging/blob/master/oslo/messaging/_executors/impl_eventlet.py#L76">https://github.com/openstack/oslo.messaging/blob/master/oslo/messaging/_executors/impl_eventlet.py#L76</a></div>
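<div><br></div><div>To make the failure mode concrete, here is a minimal sketch (not actual Ironic code) of the natural way one would try to override the default. It relies on oslo.config's set_default(), which raises NoSuchOptError for an option that hasn't been registered yet:</div><div><br></div><pre>
# Minimal sketch, not Ironic code: why Ironic can't override the
# default before oslo.messaging has created an EventletExecutor.
from oslo.config import cfg

try:
    # At service startup, before any RPC server (and therefore any
    # EventletExecutor) has been instantiated...
    cfg.CONF.set_default('rpc_thread_pool_size', 4)
except cfg.NoSuchOptError:
    # ...the option doesn't exist yet: oslo.messaging only registers
    # it inside EventletExecutor's constructor.
    print("rpc_thread_pool_size is not registered yet")
</pre>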
<div><br></div><div><br></div><div>Thanks,</div><div>Devananda</div></div></div>
</div></blockquote></body></html>