<html><head></head><body>Hi,<br><br>I am not a DB expert, but OpenStack tends to open a LOT of connections to the DB. So one of the things to monitor is the number of connections you have and allow on the DB side.<br><br>Also, raising the number of RPC (and state report) workers will solve your issue.<br>The right number is not easy to calculate and depends on each deployment.<br><br>A good approach is a try/improve loop.<br><br>Cheers,<br>Arnaud.<br><br><br><div class="gmail_quote">On 25 January 2022 09:52:26 GMT+01:00, "Md. Hejbul Tawhid MUNNA" <munnaeebd@gmail.com> wrote:<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">

<div dir="ltr">Hello Arnaud, <div><br></div><div>Thank you for your valuable reply. </div><div><br></div><div>We did not modify the default RPC worker configuration. </div><div><br></div><div>/etc/neutron/neutron.conf<br></div><div><br></div><div># Number of separate API worker processes for service. If not specified, the<br># default is equal to the number of CPUs available for best performance.<br># (integer value)<br>#api_workers = <None><br><br># Number of RPC worker processes for service. (integer value)<br>#rpc_workers = 1<br><br># Number of RPC worker processes dedicated to state reports queue. (integer<br># value)<br>#rpc_state_report_workers = 1<br><br>How can we check the load on the database? RAM, CPU, and disk I/O utilization are low on the database server. </div><div><br></div><div>Please guide us further.</div><div><br></div><div>Regards,</div><div>Munna</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Jan 24, 2022 at 6:29 PM Arnaud <<a href="mailto:arnaud.morin@gmail.com">arnaud.morin@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>Hi,<br><br>I would also check the number of RPC workers you have in neutron.conf; it may be better to increase this before changing the connection pool parameters.<br><br>Also, check your database: is it under load?<br>Updating agent state should not take long.<br><br>Cheers,<br>Arnaud<br><br><br><br><div class="gmail_quote">On 24 January 2022 10:42:00 GMT+01:00, "Md. Hejbul Tawhid MUNNA" <<a href="mailto:munnaeebd@gmail.com" target="_blank">munnaeebd@gmail.com</a>> wrote:<blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<div dir="ltr">Hi,<div><br></div><div>We currently have 500+ VMs running and 383 networks in total, including HA networks. </div><div><br></div><div>Can you advise an appropriate value, and is there any chance of service impact? </div><div><br></div><div>Should we change the configuration in neutron.conf on the controller node?</div><div><br></div><div>Regards,</div><div>Munna</div><div><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Jan 24, 2022 at 2:47 PM Slawek Kaplonski <<a href="mailto:skaplons@redhat.com" target="_blank">skaplons@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi,<br>

<br>

On Monday, 24 January 2022 09:09:10 CET Md. Hejbul Tawhid MUNNA wrote:<br>

> Hi,<br>

> <br>

> We suddenly observed a few VMs down, then found that some agents are<br>

> going down (XXX); agents are going up and down randomly. Please check<br>

> the attachment.<br>

> <br>

> <br>

/////////////////////////////////////////////////////////////////////////////<br>

> /////////////////////// /sqlalchemy/pool.py", line 788, in _checkout\n   <br>

> fairy =<br>

> _ConnectionRecord.checkout(pool)\n', u'  File<br>

> "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 532, in<br>

> checkout\n    rec = pool._do_get()\n', u'  File<br>

> "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 1186, in<br>

> _do_get\n    (self.size(), self.overflow(), self._timeout),<br>

> code="3o7r")\n', u'TimeoutError: QueuePool limit of size 5 overflow 50<br>

> reached, connection timed out, timeout 30 (Background on this error at:<br>

> <a href="http://sqlalche.me/e/3o7r" rel="noreferrer" target="_blank">http://sqlalche.me/e/3o7r</a>)\n'].<br>

> 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent Traceback (most<br>

> recent call last):<br>

> 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent   File<br>

> "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 837, in<br>

> _report_state<br>

> 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent     True)<br>

> 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent   File<br>

> "/usr/lib/python2.7/site-packages/neutron/agent/rpc.py", line 97, in<br>

> report_state<br>

> 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent     return<br>

> method(context, 'report_state', **kwargs)<br>

> 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent   File<br>

> "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 179,<br>

> in call<br>

> 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent<br>

> retry=self.retry)<br>

> 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent   File<br>

> "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 133,<br>

> in _send<br>

> 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent     retry=retry)<br>

> 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent   File<br>

> "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py",<br>

> line 645, in send<br>

> 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent<br>

> call_monitor_timeout, retry=retry)<br>

> 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent   File<br>

> "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py",<br>

> line 636, in _send<br>

> 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent     raise result<br>

> 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent RemoteError:<br>

> Remote error: TimeoutError QueuePool limit of size 5 overflow 50 reached,<br>

> connection timed out, timeout 30 (Background on this error at:<br>

> <a href="http://sqlalche.me/e/3o7r" rel="noreferrer" target="_blank">http://sqlalche.me/e/3o7r</a>)<br>

> 2022-01-24 01:05:39.592 302841 ERROR neutron.agent.l3.agent [u'Traceback<br>

> (most recent call last):\n', u'  File<br>

> "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 163,<br>

> in _process_incoming\n    res = self.dispatcher.dispatch(message)\n', u'<br>

>  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py",<br>

> line 265, in dispatch\n    return self._do_dispatch(endpoint, method, ctxt,<br>

> args)\n', u'  File<br>

> "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line<br>

> 194, in _do_dispatch\n<br>

> <br>

> <br>

/////////////////////////////////////////////////////////////////////////////<br>

> ////<br>

> <br>

> Is this related to the following default configuration?<br>

> <br>

> /etc/neutron/neutron.conf<br>

> #max_pool_size = 5<br>

> #max_overflow = 50<br>

<br>

Yes. You probably have a busy environment and need to increase those values <br>

to allow more connections from the neutron server to the database.<br>

<br>
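
As a rough way to reason about how many connections those values can add up to, here is a minimal sketch. It assumes each neutron-server worker process keeps its own SQLAlchemy pool, so the worst case per controller is roughly (number of worker processes) x (max_pool_size + max_overflow). The function name and the 8-CPU controller are hypothetical, and the estimate ignores any other process that talks to the database.<br>

<pre>
```python
# Rough upper bound on DB connections opened by neutron-server on one controller.
# Assumption: every worker process keeps its own SQLAlchemy QueuePool.

def max_db_connections(api_workers, rpc_workers, rpc_state_report_workers,
                       max_pool_size=5, max_overflow=50):
    """Return the worst-case connection count for one neutron-server host."""
    per_process = max_pool_size + max_overflow
    processes = api_workers + rpc_workers + rpc_state_report_workers
    return processes * per_process


# Defaults quoted in this thread, on a hypothetical 8-CPU controller
# (api_workers defaults to the number of CPUs).
print(max_db_connections(api_workers=8, rpc_workers=1,
                         rpc_state_report_workers=1))  # 550

# Hypothetical tuned values, for illustration only.
print(max_db_connections(api_workers=8, rpc_workers=4,
                         rpc_state_report_workers=2,
                         max_pool_size=10))  # 840
```
</pre>

Compare the result, summed over all controllers and the other services using the same database, with the connection limit configured on the database server before raising the worker or pool values.<br>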

> <br>

> regards,<br>

> Munna<br>

<br>

<br>

<br>

-- <br>

Slawek Kaplonski<br>

Principal Software Engineer<br>

Red Hat</blockquote></div>

</blockquote></div></div></blockquote></div>

</blockquote></div></body></html>