<div dir="ltr">Hi Erik,<div><br></div><div>Here is the lsof output for one of the uwsgi API processes: <a href="https://paste.xinu.at/5YUWf/">https://paste.xinu.at/5YUWf/</a></div><div><br></div><div>I have kubernetes (the Python client) 12.0.1 installed in the env.</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Jan 3, 2021 at 3:06 AM Erik Olof Gunnar Andersson <<a href="mailto:eandersson@blizzard.com">eandersson@blizzard.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">




<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Maybe something similar to this?<br>
<a href="https://github.com/kubernetes-client/python/issues/1158" id="gmail-m_2047728040938236488LPlnk" target="_blank">https://github.com/kubernetes-client/python/issues/1158</a><br>
<br>
What does lsof say?<br>
<div></div>
<br>
</div>
<div id="gmail-m_2047728040938236488appendonsend"></div>
<hr style="display:inline-block;width:98%">
<div id="gmail-m_2047728040938236488divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Erik Olof Gunnar Andersson <<a href="mailto:eandersson@blizzard.com" target="_blank">eandersson@blizzard.com</a>><br>
<b>Sent:</b> Saturday, January 2, 2021 4:54 PM<br>
<b>To:</b> Ionut Biru <<a href="mailto:ionut@fleio.com" target="_blank">ionut@fleio.com</a>>; feilong <<a href="mailto:feilong@catalyst.net.nz" target="_blank">feilong@catalyst.net.nz</a>><br>
<b>Cc:</b> openstack-discuss <<a href="mailto:openstack-discuss@lists.openstack.org" target="_blank">openstack-discuss@lists.openstack.org</a>><br>
<b>Subject:</b> Re: [magnum][api] Error system library fopen too many open files with magnum-auto-healer</font>
<div> </div>
</div>

<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Are you sure you aren't just looking at the connection pool expanding? Each worker has a maximum number of connections it can use. Maybe look at lowering <span style="color:rgb(3,47,98);font-family:SFMono-Regular,Consolas,'Liberation Mono',Menlo,monospace;font-size:12px;background-color:rgb(255,255,255);display:inline">rpc_conn_pool_size<span style="color:rgb(0,0,0);font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16px;background-color:rgb(255,255,255);display:inline">. By default, I believe each worker might create a pool of up to 30 connections.</span></span></div>
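<div>A minimal Python sketch of the pool arithmetic, assuming the worst case where every worker fills its own pool. The worker count of 8 is a made-up example, and 30 is only the default mentioned above, not a verified value.</div>
<pre>
```python
# Rough upper bound on AMQP connections held by the RPC connection pools.
# Assumption: every worker process fills its own pool completely.


def max_pool_connections(workers: int, rpc_conn_pool_size: int = 30) -> int:
    """Worst-case number of pooled connections across all workers."""
    return workers * rpc_conn_pool_size


if __name__ == "__main__":
    # Hypothetical deployment with 8 uwsgi workers.
    print(max_pool_connections(8))      # 240
    # The same deployment after lowering the pool size to 10.
    print(max_pool_connections(8, 10))  # 80
```
</pre>
<div>If the observed connection count keeps growing past such a bound, pool expansion alone would not explain it.</div>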
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="color:rgb(3,47,98);font-family:SFMono-Regular,Consolas,'Liberation Mono',Menlo,monospace;font-size:12px;background-color:rgb(255,255,255);display:inline"><span style="color:rgb(0,0,0);font-family:Calibri,Arial,Helvetica,sans-serif;font-size:16px"><br>
Looking at the code, it could also have something to do with the k8s client, since it creates a new instance each time it does a health check. What version of the k8s client do you have installed?</span></span></div>
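<div>One way to see what kind of descriptors a process is accumulating, as a rough stand-in for lsof: a Python sketch that groups the open file descriptors of a process by type. It is Linux only (it reads /proc), and the function name fd_summary is made up for this example.</div>
<pre>
```python
import os
from collections import Counter


def fd_summary(pid="self"):
    """Count the open file descriptors of a process, grouped by kind.

    Linux only: follows the symlinks under /proc/PID/fd.
    """
    fd_dir = os.path.join("/proc", str(pid), "fd")
    kinds = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed while we were listing.
            continue
        if target.startswith("socket:"):
            kinds["socket"] += 1
        elif target.startswith("pipe:"):
            kinds["pipe"] += 1
        elif target.startswith("anon_inode:"):
            kinds["anon_inode"] += 1
        else:
            kinds["file"] += 1
    return kinds


if __name__ == "__main__":
    summary = fd_summary()  # pass a worker PID to inspect another process
    print(sum(summary.values()), dict(summary))
```
</pre>
<div>Running it against a worker PID a few minutes apart would show whether sockets or regular files are the ones growing. Reading another user's process needs matching privileges.</div>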
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<div>
</div>
<br>
<br>
</div>
<div id="gmail-m_2047728040938236488x_appendonsend"></div>
<hr style="display:inline-block;width:98%">
<div id="gmail-m_2047728040938236488x_divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Ionut Biru <<a href="mailto:ionut@fleio.com" target="_blank">ionut@fleio.com</a>><br>
<b>Sent:</b> Tuesday, December 29, 2020 2:20 PM<br>
<b>To:</b> feilong <<a href="mailto:feilong@catalyst.net.nz" target="_blank">feilong@catalyst.net.nz</a>><br>
<b>Cc:</b> openstack-discuss <<a href="mailto:openstack-discuss@lists.openstack.org" target="_blank">openstack-discuss@lists.openstack.org</a>><br>
<b>Subject:</b> Re: [magnum][api] Error system library fopen too many open files with magnum-auto-healer</font>
<div> </div>
</div>
<div>
<div dir="ltr">Hi,
<div><br>
</div>
<div>I'm not sure my suspicion is right, since my understanding of oslo is nonexistent, but I think a new notifier is prepared and used for each update without the connection ever being closed.</div>
<div><br>
</div>
<div><a href="https://opendev.org/openstack/magnum/src/branch/master/magnum/conductor/utils.py#L147" target="_blank">https://opendev.org/openstack/magnum/src/branch/master/magnum/conductor/utils.py#L147</a><br>
</div>
<div><a href="https://opendev.org/openstack/magnum/src/branch/master/magnum/common/rpc.py#L173" target="_blank">https://opendev.org/openstack/magnum/src/branch/master/magnum/common/rpc.py#L173</a><br>
</div>
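<div>To illustrate the suspicion, here is a generic Python sketch of the two patterns: building a new object on every call versus building it once and reusing it. FakeTransport is a hypothetical stand-in that only counts instances. It is not oslo.messaging code and does not claim to show what Magnum actually does.</div>
<pre>
```python
import functools


class FakeTransport:
    """Hypothetical stand-in for a messaging transport; counts instances."""

    created = 0

    def __init__(self, url):
        FakeTransport.created += 1
        self.url = url


def notifier_per_call(url):
    # Suspected pattern: a fresh transport for every update.
    return FakeTransport(url)


@functools.lru_cache(maxsize=None)
def shared_notifier(url):
    # Alternative: build once per URL, then hand back the same object.
    return FakeTransport(url)


if __name__ == "__main__":
    for _ in range(5):
        notifier_per_call("rabbit://example")
    print(FakeTransport.created)  # 5

    for _ in range(5):
        shared_notifier("rabbit://example")
    print(FakeTransport.created)  # 6
```
</pre>
<div>If each instance held a real connection, the first pattern would open one per update while the second would stay at one.</div>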
</div>
<br>
<div>
<div dir="ltr">On Tue, Dec 29, 2020 at 11:52 PM Ionut Biru <<a href="mailto:ionut@fleio.com" target="_blank">ionut@fleio.com</a>> wrote:<br>
</div>
<blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">Hi Feilong,
<div><br>
</div>
<div>I found out that each time the update_health_status periodic task runs, a new connection (for each uwsgi) is made to RabbitMQ.</div>
<div><br>
</div>
<div>root@ctrl1cj-magnum-container-7a7a412a:~# netstat -npt | grep 5672 | wc -l<br>
229<br>
root@ctrl1cj-magnum-container-7a7a412a:~# netstat -npt | grep 5672 | wc -l<br>
234<br>
</div>
<div>root@ctrl1cj-magnum-container-7a7a412a:~# netstat -npt | grep 5672 | wc -l<br>
238<br>
</div>
<div>root@ctrl1cj-magnum-container-7a7a412a:~# netstat -npt | grep 5672 | wc -l<br>
241<br>
root@ctrl1cj-magnum-container-7a7a412a:~# netstat -npt | grep 5672 | wc -l<br>
244<br>
</div>
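<div>A small Python sketch that turns the five netstat counts above into per-sample growth. The counts are copied from the output; the interval between runs is not stated, so no rate per second is derived.</div>
<pre>
```python
# Connection counts copied from the five netstat runs above.
samples = [229, 234, 238, 241, 244]


def growth(counts):
    """Difference between each sample and the one before it."""
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


if __name__ == "__main__":
    deltas = growth(samples)
    print(deltas)       # [5, 4, 3, 3]
    print(sum(deltas))  # 15
```
</pre>
<div>Every delta is positive, so across these samples the count only went up.</div>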
<div><br>
</div>
<div>Not sure </div>
<div><br>
</div>
<div>Dec 29 21:51:22 ctrl1cj-magnum-container-7a7a412a magnum-conductor[262800]: 2020-12-29 21:51:22.024 262800 DEBUG magnum.service.periodic [req-3b495326-cf80-481e-b3c6-c741f05b7f0e - - - - -]
<br>
Dec 29 21:51:22 ctrl1cj-magnum-container-7a7a412a magnum-conductor[262800]: 2020-12-29 21:51:22.024 262800 DEBUG oslo_service.periodic_task [-] Running periodic task MagnumPeriodicTasks.sync<br>
Dec 29 21:51:16 ctrl1cj-magnum-container-7a7a412a magnum-conductor[262804]: 2020-12-29 21:51:16.462 262804 DEBUG magnum.conductor.handlers.cluster_conductor [req-284ac12b-d76a-4e50-8e74-5bfb<br>
Dec 29 21:51:15 ctrl1cj-magnum-container-7a7a412a magnum-conductor[262800]: 2020-12-29 21:51:15.573 262800 DEBUG magnum.service.periodic [-] Status for cluster 118 updated to HEALTHY ({'api'<br>
Dec 29 21:51:15 ctrl1cj-magnum-container-7a7a412a magnum-conductor[262805]: 2020-12-29 21:51:15.572 262805 DEBUG magnum.conductor.handlers.cluster_conductor [req-3fc29ee9-4051-42e7-ae19-3a49<br>
Dec 29 21:51:15 ctrl1cj-magnum-container-7a7a412a magnum-conductor[262800]: 2020-12-29 21:51:15.572 262800 DEBUG magnum.service.periodic [-] Status for cluster 121 updated to HEALTHY ({'api'<br>
Dec 29 21:51:15 ctrl1cj-magnum-container-7a7a412a magnum-conductor[262800]: 2020-12-29 21:51:15.572 262800 DEBUG magnum.service.periodic [-] Status for cluster 122 updated to HEALTHY ({'api'<br>
Dec 29 21:51:15 ctrl1cj-magnum-container-7a7a412a magnum-conductor[262800]: 2020-12-29 21:51:15.553 262800 DEBUG magnum.service.periodic [-] Updating health status for cluster 122 update_hea<br>
Dec 29 21:51:15 ctrl1cj-magnum-container-7a7a412a magnum-conductor[262800]: 2020-12-29 21:51:15.544 262800 DEBUG magnum.service.periodic [-] Updating health status for cluster 121 update_hea<br>
Dec 29 21:51:15 ctrl1cj-magnum-container-7a7a412a magnum-conductor[262800]: 2020-12-29 21:51:15.535 262800 DEBUG magnum.service.periodic [-] Updating health status for cluster 118 update_hea<br>
Dec 29 21:51:15 ctrl1cj-magnum-container-7a7a412a magnum-conductor[262800]: 2020-12-29 21:51:15.494 262800 DEBUG magnum.service.periodic [req-405b1fed-0b8a-4a60-b6ae-834f548b21d1 - - - <br>
</div>
<div><br>
</div>
<div><br>
</div>
<div>2020-12-29 21:51:14.082 [info] &lt;0.953.1293&gt; accepting AMQP connection &lt;0.953.1293&gt; (172.29.93.14:48474 -&gt; 172.29.95.38:5672)<br>
2020-12-29 21:51:14.083 [info] &lt;0.953.1293&gt; Connection &lt;0.953.1293&gt; (172.29.93.14:48474 -&gt; 172.29.95.38:5672) has a client-provided name: uwsgi:262739:f86c0570-8739-4b74-8102-76b5357acd71<br>
2020-12-29 21:51:14.084 [info] &lt;0.953.1293&gt; connection &lt;0.953.1293&gt; (172.29.93.14:48474 -&gt; 172.29.95.38:5672 - uwsgi:262739:f86c0570-8739-4b74-8102-76b5357acd71): user 'magnum' authenticated and granted access to vhost '/magnum'<br>
2020-12-29 21:51:15.560 [info] &lt;0.1656.1283&gt; accepting AMQP connection &lt;0.1656.1283&gt; (172.29.93.14:48548 -&gt; 172.29.95.38:5672)<br>
2020-12-29 21:51:15.561 [info] &lt;0.1656.1283&gt; Connection &lt;0.1656.1283&gt; (172.29.93.14:48548 -&gt; 172.29.95.38:5672) has a client-provided name: uwsgi:262744:2c9792ab-9198-493a-970c-f6ccfd9947d3<br>
2020-12-29 21:51:15.561 [info] &lt;0.1656.1283&gt; connection &lt;0.1656.1283&gt; (172.29.93.14:48548 -&gt; 172.29.95.38:5672 - uwsgi:262744:2c9792ab-9198-493a-970c-f6ccfd9947d3): user 'magnum' authenticated and granted access to vhost '/magnum'<br>
</div>
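<div>The client-provided names in the RabbitMQ log carry the program name and PID, so the new connections can be attributed to processes. A Python sketch, assuming the name format program:pid:uuid seen above; the sample lines are shortened copies of the two log entries.</div>
<pre>
```python
import re
from collections import Counter

# Matches "client-provided name: program:pid:uuid" as seen in the log above.
NAME_RE = re.compile(r"client-provided name: (\w+):(\d+):([0-9a-f-]+)")


def connections_per_process(lines):
    """Count accepted connections per (program, pid) pair."""
    counts = Counter()
    for line in lines:
        match = NAME_RE.search(line)
        if match:
            program, pid, _conn_id = match.groups()
            counts[(program, int(pid))] += 1
    return counts


if __name__ == "__main__":
    # Shortened copies of the two "client-provided name" lines above.
    log = [
        "has a client-provided name: uwsgi:262739:f86c0570-8739-4b74-8102-76b5357acd71",
        "has a client-provided name: uwsgi:262744:2c9792ab-9198-493a-970c-f6ccfd9947d3",
    ]
    print(dict(connections_per_process(log)))
    # {('uwsgi', 262739): 1, ('uwsgi', 262744): 1}
```
</pre>
<div>Fed with a longer slice of the broker log, this would show whether the growth comes from a few uwsgi workers or from all of them.</div>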
</div>
<br>
<div>
<div dir="ltr">On Tue, Dec 22, 2020 at 4:12 AM feilong <<a href="mailto:feilong@catalyst.net.nz" target="_blank">feilong@catalyst.net.nz</a>> wrote:<br>
</div>
<blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Hi Ionut,</p>
<p>I haven't seen this before in our production environment. Magnum auto healer simply sends a POST request to the Magnum API to update the health status, so I would suggest first writing a small script, or even using curl, to see if you can reproduce this.
<br>
</p>
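<div>A possible shape for such a reproduction script, as a Python sketch: run an action repeatedly and report how many file descriptors the target process gained. It is Linux only (it reads /proc/PID/fd). The names fd_growth and action are made up here, and the request that action would send to the Magnum API is deliberately left out because its exact URL and payload are not given in this thread.</div>
<pre>
```python
import os


def count_fds(pid):
    """Number of open file descriptors of a process (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_growth(pid, action, repeats=10):
    """Run action() several times and return the change in open fds of pid."""
    before = count_fds(pid)
    for _ in range(repeats):
        action()
    return count_fds(pid) - before


if __name__ == "__main__":
    # Demo against this very process: each call opens a file and keeps it.
    held = []
    leaked = fd_growth(os.getpid(), lambda: held.append(open(os.devnull)), 5)
    print(leaked)  # 5
    for handle in held:
        handle.close()
```
</pre>
<div>For the real test, pid would be a uwsgi worker and action a function that sends one health status update. A result that grows with repeats would confirm the leak is triggered by those requests.</div>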
<p><br>
</p>
<div>On 19/12/20 2:27 am, Ionut Biru wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Hi again,
<div><br>
I failed to mention that this is stable/victoria with a couple of patches from review. Ignore the fact that the venv path in the logs shows version 19.1.4.</div>
</div>
<br>
<div>
<div dir="ltr">On Fri, Dec 18, 2020 at 3:22 PM Ionut Biru <<a href="mailto:ionut@fleio.com" target="_blank">ionut@fleio.com</a>> wrote:<br>
</div>
<blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">Hi guys,
<div><br>
</div>
<div>I have an issue with magnum api returning an error after a while:</div>
<div><code style="white-space:pre-wrap"><span>Server-side error: "[('system library', 'fopen', 'Too many open files'), ('BIO routines', 'BIO_new_file', 'system lib'), ('x509 certificate routines', 'X509_load_cert_crl_file', 'system lib')]"</span></code><br>
</div>
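<div>Since the error is "Too many open files", it helps to know the limit the process runs under. A Python sketch, Linux only, that reads the "Max open files" row from /proc/PID/limits; the function name nofile_limits is made up for this example.</div>
<pre>
```python
def nofile_limits(pid="self"):
    """Return the (soft, hard) 'Max open files' limits of a process.

    Linux only: parses /proc/PID/limits. Values are returned as strings
    because either one may be the word 'unlimited'.
    """
    with open(f"/proc/{pid}/limits") as handle:
        for line in handle:
            if line.startswith("Max open files"):
                fields = line.split()
                return fields[3], fields[4]
    return None


if __name__ == "__main__":
    # Pass the PID of a uwsgi worker to check the API process instead.
    print(nofile_limits())
```
</pre>
<div>Comparing the soft limit with the number of descriptors the API process holds shows how close it is to failing again.</div>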
<div><br>
</div>
<div>Log file: <a href="https://paste.xinu.at/6djE/" target="_blank">https://paste.xinu.at/6djE/</a></div>
<div><br>
This started to appear after I enabled auto_healing_controller = magnum-auto-healer and magnum_auto_healer_tag = v1.19.0 in the template.</div>
<div><br>
</div>
<div>Currently, I only have 4 clusters.</div>
<div><br>
</div>
<div>After that, the API is in an error state and doesn't work until I restart it.</div>
<div><br>
</div>
<div><br>
</div>
<div>-- <br>
<div dir="ltr">
<div dir="ltr">Ionut Biru - <a href="https://urldefense.com/v3/__https://fleio.com__;!!Ci6f514n9QsL8ck!3b_NgWO8HXsOoUOdTUZp4KEzKcx9zpWomeb2yGJ4RRqkS1QI159_zwjwVnKfpV6EIg$" target="_blank">
https://fleio.com</a><br>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
</blockquote>
<pre cols="72">-- 
Cheers & Best regards,
Feilong Wang (王飞龙)
------------------------------------------------------
Senior Cloud Software Engineer
Tel: +64-48032246
Email: <a href="mailto:flwang@catalyst.net.nz" target="_blank">flwang@catalyst.net.nz</a>
Catalyst IT Limited
Level 6, Catalyst House, 150 Willis Street, Wellington
------------------------------------------------------ </pre>
</div>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
</div>
</div>
</div>

</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr">Ionut Biru - <a href="https://fleio.com" target="_blank">https://fleio.com</a><br></div></div>