Thanks Ionut.

If you are able, could you test this patch instead? I think I understand the issue better now. We were creating not only a new RPC client for each HTTP request, but also a brand-new transport for each request.
https://review.opendev.org/c/openstack/magnum/+/770707
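
For illustration only, here is a minimal sketch of the pattern described above. The `Transport` and `RPCClient` classes are hypothetical stand-ins, not the oslo.messaging or Magnum API, and the URL is an assumed example. Each `Transport` instance represents one set of broker connections, so counting instances shows how a per-request transport multiplies open sockets while a cached one does not.

```python
import functools


class Transport:
    """Hypothetical stand-in for a messaging transport (owns broker sockets)."""

    created = 0  # counts how many transports have been built

    def __init__(self, url):
        Transport.created += 1
        self.url = url


class RPCClient:
    """Hypothetical stand-in for an RPC client bound to a transport."""

    def __init__(self, transport):
        self.transport = transport


def handle_request_leaky(url):
    # Anti-pattern: every HTTP request builds its own transport and client.
    return RPCClient(Transport(url))


@functools.lru_cache(maxsize=None)
def get_transport(url):
    # Build the transport once per URL and reuse it on later calls.
    return Transport(url)


def handle_request_shared(url):
    # The client is still built per request here, but the transport is shared.
    return RPCClient(get_transport(url))


URL = "rabbit://guest:guest@localhost:5672/"  # assumed example URL

for _ in range(100):
    handle_request_leaky(URL)
leaky_count = Transport.created

Transport.created = 0
for _ in range(100):
    handle_request_shared(URL)
shared_count = Transport.created

print(leaky_count, shared_count)
```

In this sketch the cache is keyed on the URL; whether the real fix caches per process, per worker, or some other way is not something the sketch claims.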


From: Ionut Biru <ionut@fleio.com>
Sent: Tuesday, January 12, 2021 3:17 AM
To: Erik Olof Gunnar Andersson <eandersson@blizzard.com>
Cc: Spyros Trigazis <strigazi@gmail.com>; feilong <feilong@catalyst.net.nz>; openstack-discuss <openstack-discuss@lists.openstack.org>
Subject: Re: [magnum][api] Error system library fopen too many open files with magnum-auto-healer
 
Hi Erik,

Here it is: https://paste.xinu.at/LgH8dT/

On Mon, Jan 11, 2021 at 10:45 PM Erik Olof Gunnar Andersson <eandersson@blizzard.com> wrote:

Thanks, I added it to the commit.

Could you share your uwsgi config as well?

 

Best Regards, Erik Olof Gunnar Andersson

Technical Lead, Senior Cloud Engineer

 

From: Ionut Biru <ionut@fleio.com>
Sent: Tuesday, January 5, 2021 1:51 AM
To: Erik Olof Gunnar Andersson <eandersson@blizzard.com>
Cc: Spyros Trigazis <strigazi@gmail.com>; feilong <feilong@catalyst.net.nz>; openstack-discuss <openstack-discuss@lists.openstack.org>
Subject: Re: [magnum][api] Error system library fopen too many open files with magnum-auto-healer

 

Hi,

 

Here is my config. Maybe something is fishy.

I had around 300 messages queued in notification.info and notification.err, and I purged them.

 

 

 

 

On Tue, Jan 5, 2021 at 11:23 AM Erik Olof Gunnar Andersson <eandersson@blizzard.com> wrote:

Yeah, I tested locally as well and wasn't able to reproduce it either. I changed the health service job to run every second, and it maxed out at about 42 connections to RabbitMQ with two conductor workers.

/etc/magnum/magnum.conf

[conductor]
workers = 2

 


From: Spyros Trigazis <strigazi@gmail.com>
Sent: Tuesday, January 5, 2021 12:59 AM
To: Ionut Biru <ionut@fleio.com>
Cc: Erik Olof Gunnar Andersson <eandersson@blizzard.com>; feilong <feilong@catalyst.net.nz>; openstack-discuss <openstack-discuss@lists.openstack.org>
Subject: Re: [magnum][api] Error system library fopen too many open files with magnum-auto-healer

 

 

 

On Tue, Jan 5, 2021 at 9:36 AM Ionut Biru <ionut@fleio.com> wrote:

Hi,


I tried with processes=1 and it reached 1016 connections to RabbitMQ.

lsof

 

I think it starts failing once it reaches 1024 file descriptors.
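
One way to check how close a process is to its descriptor limit is sketched below. It uses only the Python standard library; the `count_open_fds` helper is a name made up for this example, and the `/proc` lookup assumes Linux. The count is approximate, since listing the directory briefly opens a descriptor of its own.

```python
import os
import resource

# The soft limit is the one the process hits first; the hard limit is the ceiling.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)


def count_open_fds(pid="self"):
    """Count open descriptors via /proc (Linux only); None if unavailable."""
    try:
        return len(os.listdir(f"/proc/{pid}/fd"))
    except OSError:
        return None


open_fds = count_open_fds()
print(f"soft={soft} hard={hard} open={open_fds}")
```

Passing the PID of a uwsgi worker instead of `"self"` would inspect that process, provided the caller has permission to read its `/proc` entry.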

 

I'm out of ideas on how to resolve this. I only have 3 clusters available, which is kind of weird, and it doesn't scale.

 

No issues here with 100s of clusters. Not sure what doesn't scale.

 

* Maybe your rabbit is flooded with notifications that are not consumed? 

* You can use way more than 1024 file descriptors, maybe 2^10?
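
As a rough sketch of the second point, a process can raise its own soft descriptor limit up to the hard limit with the standard `resource` module. The helper name `raise_nofile_soft_limit` and the target of 4096 are assumptions for this example, not recommended values, and a service would more commonly get its limit from whatever starts it (for example `LimitNOFILE=` in a systemd unit).

```python
import resource


def raise_nofile_soft_limit(target):
    """Raise the soft RLIMIT_NOFILE toward target, capped at the hard limit.

    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft  # already unlimited or high enough
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    except (ValueError, OSError):
        return soft  # the platform refused; keep the old limit
    return target


new_soft = raise_nofile_soft_limit(4096)  # 4096 is an arbitrary example target
print(new_soft)
```

Raising the limit only postpones the failure if connections keep accumulating, so it complements rather than replaces finding out why they are opened.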

 

Spyros

 

On Mon, Jan 4, 2021 at 9:53 PM Erik Olof Gunnar Andersson <eandersson@blizzard.com> wrote:

Sure looks like RabbitMQ. How many workers do you have configured?

 

Could you try changing the uwsgi configuration to workers=1 (or processes=1) and then see if it goes beyond 30 connections to AMQP?
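
A small sketch for counting those connections from saved lsof output follows. It assumes the output was produced with `lsof -n -P` (numeric hosts and ports) and that the broker listens on 5672, the default AMQP port. The `count_connections` helper and the sample lines are made up for illustration.

```python
import re

# Matches the remote side of an lsof TCP line and captures the remote port.
_REMOTE_PORT = re.compile(r"->\S+:(\d+)\s+\(ESTABLISHED\)")


def count_connections(lsof_text, port=5672):
    """Count ESTABLISHED TCP connections to the given remote port."""
    count = 0
    for line in lsof_text.splitlines():
        match = _REMOTE_PORT.search(line)
        if match and int(match.group(1)) == port:
            count += 1
    return count


# Made-up sample lines in the shape of `lsof -n -P -i` output.
SAMPLE = """\
uwsgi 4242 magnum 10u IPv4 111 0t0 TCP 10.0.0.5:41000->10.0.0.9:5672 (ESTABLISHED)
uwsgi 4242 magnum 11u IPv4 112 0t0 TCP 10.0.0.5:41002->10.0.0.9:5672 (ESTABLISHED)
uwsgi 4242 magnum 12u IPv4 113 0t0 TCP 10.0.0.5:41004->10.0.0.9:3306 (ESTABLISHED)
uwsgi 4242 magnum 13u IPv4 114 0t0 TCP *:9511 (LISTEN)
"""

print(count_connections(SAMPLE))
```

Running it against output captured a few minutes apart would show whether the count levels off or keeps growing.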

 

From: Ionut Biru <ionut@fleio.com>
Sent: Monday, January 4, 2021 4:07 AM
To: Erik Olof Gunnar Andersson <eandersson@blizzard.com>
Cc: feilong <feilong@catalyst.net.nz>; openstack-discuss <openstack-discuss@lists.openstack.org>
Subject: Re: [magnum][api] Error system library fopen too many open files with magnum-auto-healer

 

Hi Erik,

 

Here is the lsof output of one uwsgi API process: https://paste.xinu.at/5YUWf/

I have kubernetes 12.0.1 installed in the env.

 

 

On Sun, Jan 3, 2021 at 3:06 AM Erik Olof Gunnar Andersson <eandersson@blizzard.com> wrote:

Maybe something similar to this?
https://github.com/kubernetes-client/python/issues/1158

What does lsof say?

 

 


 

--

Ionut Biru - https://fleio.com


 
