[magnum][api] Error system library fopen too many open files with magnum-auto-healer
Erik Olof Gunnar Andersson
eandersson at blizzard.com
Thu Jan 7 03:12:10 UTC 2021
Glad it helped. Going to work with the magnum team to get it merged.
Would it be possible for you to document the issue and create a bug here? https://storyboard.openstack.org/#!/project/openstack/magnum
________________________________
From: Ionut Biru <ionut at fleio.com>
Sent: Wednesday, January 6, 2021 3:37 AM
To: Erik Olof Gunnar Andersson <eandersson at blizzard.com>
Cc: Spyros Trigazis <strigazi at gmail.com>; feilong <feilong at catalyst.net.nz>; openstack-discuss <openstack-discuss at lists.openstack.org>
Subject: Re: [magnum][api] Error system library fopen too many open files with magnum-auto-healer
Hi Erik,
Thanks a lot for the patch. Indeed 769471 fixes my problem at first glance.
I'll let it run for a couple of days.
On Wed, Jan 6, 2021 at 12:23 PM Erik Olof Gunnar Andersson <eandersson at blizzard.com> wrote:
I pushed a couple of patches that you can try out.
This is the most likely culprit.
https://review.opendev.org/c/openstack/magnum/+/769471 - Re-use rpc client
I also created this one, but I doubt this is an issue, as the implementation here is the same one I use in Designate.
https://review.opendev.org/c/openstack/magnum/+/769457 - [WIP] Singleton notifier
Finally, I also created a PR to add magnum-api testing using uwsgi.
https://review.opendev.org/c/openstack/magnum/+/769450
Let me know if any of these patches help!
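[Editor's illustration] The "Re-use rpc client" idea can be shown with a minimal sketch. It is not the code from 769471: FakeRpcClient, get_client and the two handler functions are hypothetical names, and the class only counts how many times a "connection" would be opened. The sketch contrasts building a client on every request with sharing one client per process.

```python
import threading


class FakeRpcClient:
    """Hypothetical stand-in for an RPC client that opens one connection."""

    connections_opened = 0

    def __init__(self):
        # A real client would open a socket to the message broker here.
        FakeRpcClient.connections_opened += 1


_client = None
_lock = threading.Lock()


def get_client():
    """Return one shared client per process, creating it on first use."""
    global _client
    if _client is None:
        with _lock:
            # Re-check under the lock so two threads cannot both create one.
            if _client is None:
                _client = FakeRpcClient()
    return _client


def handle_request_leaky():
    """Anti-pattern: every request builds (and may leak) a new client."""
    return FakeRpcClient()


def handle_request_shared():
    """Every request re-uses the same client, so connections stay flat."""
    return get_client()
```

With the shared variant the number of connections does not grow with the number of requests, which is the behaviour one would want from a long-running API worker.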
________________________________
From: Ionut Biru <ionut at fleio.com>
Sent: Tuesday, January 5, 2021 8:36 AM
To: Erik Olof Gunnar Andersson <eandersson at blizzard.com>
Cc: Spyros Trigazis <strigazi at gmail.com>; feilong <feilong at catalyst.net.nz>; openstack-discuss <openstack-discuss at lists.openstack.org>
Subject: Re: [magnum][api] Error system library fopen too many open files with magnum-auto-healer
Hi,
I found this story: https://storyboard.openstack.org/#!/story/2008308 regarding disabling cluster update notifications in rabbitmq.
I think this will help me.
On Tue, Jan 5, 2021 at 12:21 PM Erik Olof Gunnar Andersson <eandersson at blizzard.com> wrote:
Sorry, being repetitive here, but maybe try adding this to your magnum config as well. If you have A LOT of cores it could add up to a crazy amount of connections.
[conductor]
workers = 2
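[Editor's illustration] A rough back-of-the-envelope sketch of why the worker count matters. It assumes about 21 RabbitMQ connections per conductor worker, extrapolated from the 42 connections with two workers reported in this thread, and it assumes the worker count follows the number of CPU cores when it is not set explicitly, as the remark about cores suggests. Both are assumptions, and estimate_connections is a hypothetical helper, not part of Magnum.

```python
import os

# Assumption: ~21 connections per conductor worker, extrapolated from the
# 42 connections observed with two workers elsewhere in this thread.
CONNECTIONS_PER_WORKER = 21


def estimate_connections(workers=None, per_worker=CONNECTIONS_PER_WORKER):
    """Estimate broker connections, assuming they scale linearly with workers."""
    if workers is None:
        # Assumption: an unset worker count falls back to the CPU core count.
        workers = os.cpu_count() or 1
    return workers * per_worker


if __name__ == "__main__":
    for workers in (2, 16, 64):
        print(f"{workers} workers -> ~{estimate_connections(workers)} connections")
```

Under these assumptions a host with many cores could plausibly exceed a 1024 file descriptor limit from conductor connections alone, which is why pinning the worker count is worth trying.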
________________________________
From: Ionut Biru <ionut at fleio.com>
Sent: Tuesday, January 5, 2021 1:50 AM
To: Erik Olof Gunnar Andersson <eandersson at blizzard.com>
Cc: Spyros Trigazis <strigazi at gmail.com>; feilong <feilong at catalyst.net.nz>; openstack-discuss <openstack-discuss at lists.openstack.org>
Subject: Re: [magnum][api] Error system library fopen too many open files with magnum-auto-healer
Hi,
Here is my config. Maybe something is fishy.
I did have around 300 messages in the notification.info and notification.err queues, and I purged them.
https://paste.xinu.at/woMt/
On Tue, Jan 5, 2021 at 11:23 AM Erik Olof Gunnar Andersson <eandersson at blizzard.com> wrote:
Yea - tested locally as well and wasn't able to reproduce it either. I changed the health service job to run every second and maxed out at about 42 connections to RabbitMQ with two conductor workers.
/etc/magnum/magnum.conf
[conductor]
workers = 2
________________________________
From: Spyros Trigazis <strigazi at gmail.com>
Sent: Tuesday, January 5, 2021 12:59 AM
To: Ionut Biru <ionut at fleio.com>
Cc: Erik Olof Gunnar Andersson <eandersson at blizzard.com>; feilong <feilong at catalyst.net.nz>; openstack-discuss <openstack-discuss at lists.openstack.org>
Subject: Re: [magnum][api] Error system library fopen too many open files with magnum-auto-healer
On Tue, Jan 5, 2021 at 9:36 AM Ionut Biru <ionut at fleio.com> wrote:
Hi,
I tried with processes=1 and it reached 1016 connections to rabbitmq.
lsof
https://paste.xinu.at/jGg/
I think it goes into error when it reaches 1024 file descriptors.
I'm out of ideas on how to resolve this. I only have 3 clusters available, so it's kind of weird, and it doesn't scale.
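[Editor's illustration] The 1024 figure matches a common default soft limit on open file descriptors. A minimal, Linux-only sketch for checking how close a process is to that limit follows. fd_usage and count_sockets are hypothetical helper names; they rely on the standard resource module and on the /proc/<pid>/fd directory, and reading another process's entries may require matching permissions.

```python
import os
import resource


def fd_usage():
    """Return (open_fds, soft_limit, hard_limit) for the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # /proc/self/fd has one entry per open descriptor (Linux only). The
    # count includes the descriptor briefly used to list the directory.
    open_fds = len(os.listdir("/proc/self/fd"))
    return open_fds, soft, hard


def count_sockets(pid="self"):
    """Count descriptors of a process that point at sockets."""
    fd_dir = f"/proc/{pid}/fd"
    count = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if target.startswith("socket:"):
            count += 1
    return count


if __name__ == "__main__":
    used, soft, hard = fd_usage()
    print(f"open fds: {used}, soft limit: {soft}, hard limit: {hard}")
    print(f"sockets: {count_sockets()}")
```

Passing the PID of a uwsgi worker to count_sockets gives a quick number to compare against the soft limit, similar to counting socket lines in lsof output.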
No issues here with 100s of clusters. Not sure what doesn't scale.
* Maybe your rabbit is flooded with notifications that are not consumed?
* You can use way more than 1024 file descriptors, maybe 2^10?
Spyros
On Mon, Jan 4, 2021 at 9:53 PM Erik Olof Gunnar Andersson <eandersson at blizzard.com> wrote:
Sure looks like RabbitMQ. How many workers do you have configured?
Could you try changing the uwsgi configuration to workers=1 (or processes=1) and then see if it goes beyond 30 connections to amqp?
From: Ionut Biru <ionut at fleio.com>
Sent: Monday, January 4, 2021 4:07 AM
To: Erik Olof Gunnar Andersson <eandersson at blizzard.com>
Cc: feilong <feilong at catalyst.net.nz>; openstack-discuss <openstack-discuss at lists.openstack.org>
Subject: Re: [magnum][api] Error system library fopen too many open files with magnum-auto-healer
Hi Erik,
Here is the lsof output of one uwsgi API process: https://paste.xinu.at/5YUWf/
I have kubernetes 12.0.1 installed in the env.
On Sun, Jan 3, 2021 at 3:06 AM Erik Olof Gunnar Andersson <eandersson at blizzard.com> wrote:
Maybe something similar to this?
https://github.com/kubernetes-client/python/issues/1158
What does lsof say?
--
Ionut Biru - https://fleio.com