Thanks Ionut.

If you are able could you test this patch instead. I think I better understand what the issue was now. We were not only creating a new RPC Client for each HTTP request, but also a brand-new transport for each request.

https://review.opendev.org/c/openstack/magnum/+/770707

________________________________
From: Ionut Biru <ionut@fleio.com>
Sent: Tuesday, January 12, 2021 3:17 AM
To: Erik Olof Gunnar Andersson <eandersson@blizzard.com>
Cc: Spyros Trigazis <strigazi@gmail.com>; feilong <feilong@catalyst.net.nz>; openstack-discuss <openstack-discuss@lists.openstack.org>
Subject: Re: [magnum][api] Error system library fopen too many open files with magnum-auto-healer

Hi Erik,

Here it is: https://paste.xinu.at/LgH8dT/

On Mon, Jan 11, 2021 at 10:45 PM Erik Olof Gunnar Andersson <eandersson@blizzard.com> wrote:

Thanks I added it to the commit. Could you share your uwsgi config as well.

Best Regards, Erik Olof Gunnar Andersson
Technical Lead, Senior Cloud Engineer

From: Ionut Biru <ionut@fleio.com>
Sent: Tuesday, January 5, 2021 1:51 AM
To: Erik Olof Gunnar Andersson <eandersson@blizzard.com>
Cc: Spyros Trigazis <strigazi@gmail.com>; feilong <feilong@catalyst.net.nz>; openstack-discuss <openstack-discuss@lists.openstack.org>
Subject: Re: [magnum][api] Error system library fopen too many open files with magnum-auto-healer

Hi,

Here is my config. maybe something is fishy.

I did have around 300 messages in the queue in notification.info and notification.err and I purged them.
https://paste.xinu.at/woMt/

On Tue, Jan 5, 2021 at 11:23 AM Erik Olof Gunnar Andersson <eandersson@blizzard.com> wrote:

Yea - tested locally as well and wasn't able to reproduce it either. I changed the health service job to run every second and maxed out at about 42 connections to RabbitMQ with two conductor workers.

/etc/magnum/magnum.conf

[conductor]
workers = 2

________________________________
From: Spyros Trigazis <strigazi@gmail.com>
Sent: Tuesday, January 5, 2021 12:59 AM
To: Ionut Biru <ionut@fleio.com>
Cc: Erik Olof Gunnar Andersson <eandersson@blizzard.com>; feilong <feilong@catalyst.net.nz>; openstack-discuss <openstack-discuss@lists.openstack.org>
Subject: Re: [magnum][api] Error system library fopen too many open files with magnum-auto-healer

On Tue, Jan 5, 2021 at 9:36 AM Ionut Biru <ionut@fleio.com> wrote:

Hi,

I tried with process=1 and it reached 1016 connections to rabbitmq.

lsof https://paste.xinu.at/jGg/

I think it goes into error when it reaches 1024 file descriptors.

I'm out of ideas of how to resolve this. I only have 3 clusters available and it's kinda weird and it doesn't scale.

No issues here with 100s of clusters. Not sure what doesn't scale.

* Maybe your rabbit is flooded with notifications that are not consumed?
* You can use way more than 1024 file descriptors, maybe 2^10?

Spyros

On Mon, Jan 4, 2021 at 9:53 PM Erik Olof Gunnar Andersson <eandersson@blizzard.com> wrote:

Sure looks like RabbitMQ.
How many workers do you have configured? Could you try changing the uwsgi configuration to workers=1 (or processes=1) and then see if it goes beyond 30 connections to amqp.

From: Ionut Biru <ionut@fleio.com>
Sent: Monday, January 4, 2021 4:07 AM
To: Erik Olof Gunnar Andersson <eandersson@blizzard.com>
Cc: feilong <feilong@catalyst.net.nz>; openstack-discuss <openstack-discuss@lists.openstack.org>
Subject: Re: [magnum][api] Error system library fopen too many open files with magnum-auto-healer

Hi Erik,

Here is lsof of one uwsgi api. https://paste.xinu.at/5YUWf/

I have kubernetes 12.0.1 installed in env.

On Sun, Jan 3, 2021 at 3:06 AM Erik Olof Gunnar Andersson <eandersson@blizzard.com> wrote:

Maybe something similar to this? https://github.com/kubernetes-client/python/issues/1158

What does lsof say?

--
Ionut Biru - https://fleio.com
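
For anyone reading the thread later, here is a rough sketch of the failure mode described at the top (a new transport for every HTTP request) compared with reusing one transport per process. It is only an illustration, not the code from the linked patch and not the oslo.messaging API: FakeTransport, handle_request_leaky, handle_request_shared, get_shared_transport and simulate are all made-up names, and the URL is a placeholder. Each FakeTransport instance stands in for one open connection to RabbitMQ.

```python
import functools


class FakeTransport:
    """Hypothetical stand-in for a messaging transport.

    Each instance models one open connection (one file descriptor).
    """

    open_count = 0  # number of transports created since the last reset

    def __init__(self, url):
        self.url = url
        FakeTransport.open_count += 1


def handle_request_leaky(url):
    # Assumed buggy pattern: every request builds its own transport
    # and nothing ever closes it.
    return FakeTransport(url)


@functools.lru_cache(maxsize=None)
def get_shared_transport(url):
    # Assumed fixed pattern: build the transport once per process and URL,
    # then hand the same object to every request.
    return FakeTransport(url)


def handle_request_shared(url):
    return get_shared_transport(url)


def simulate(handler, requests=1000, url="rabbit://localhost:5672/"):
    """Run `requests` fake HTTP requests and return how many transports exist."""
    FakeTransport.open_count = 0
    get_shared_transport.cache_clear()
    # Keep references so the transports stay alive, as a leak would.
    live = [handler(url) for _ in range(requests)]
    assert len(live) == requests
    return FakeTransport.open_count


if __name__ == "__main__":
    print("per-request transports:", simulate(handle_request_leaky))
    print("shared transport:", simulate(handle_request_shared))
```

In this model the per-request handler ends up with one connection per request, so a worker limited to 1024 file descriptors would run out after roughly a thousand requests no matter how few clusters exist, while the shared handler stays at a single connection.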