[magnum][api] Error system library fopen too many open files with magnum-auto-healer

Ionut Biru ionut at fleio.com
Tue Dec 29 22:20:25 UTC 2020


Hi,

I'm not sure my suspicion is correct, but I think a new notifier is prepared
and used for each update without the connection ever being closed. That said,
my understanding of oslo is nonexistent.

https://opendev.org/openstack/magnum/src/branch/master/magnum/conductor/utils.py#L147
https://opendev.org/openstack/magnum/src/branch/master/magnum/common/rpc.py#L173
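
To illustrate the suspected pattern, here is a minimal, self-contained sketch.
It does not reproduce Magnum's code or the oslo.messaging API: FakeConnection,
FakeNotifier, notify_leaky, get_notifier and notify_cached are hypothetical
names made up for this example, and whether Magnum really behaves like the
"leaky" variant is still only my guess. It just contrasts building a notifier
on every call with building it once per process and reusing it.

```python
import functools


class FakeConnection:
    """Hypothetical stand-in for a broker connection; tracks how many are open."""

    open_count = 0

    def __init__(self):
        FakeConnection.open_count += 1

    def close(self):
        FakeConnection.open_count -= 1


class FakeNotifier:
    """Hypothetical notifier that owns exactly one connection."""

    def __init__(self):
        self.connection = FakeConnection()

    def info(self, event_type, payload):
        # A real notifier would publish to the broker; here we just echo.
        return (event_type, payload)


def notify_leaky(event_type, payload):
    # Suspected pattern: a new notifier (and connection) on every call,
    # and nothing ever closes it.
    return FakeNotifier().info(event_type, payload)


@functools.lru_cache(maxsize=None)
def get_notifier():
    # Alternative: build the notifier once per process and reuse it.
    return FakeNotifier()


def notify_cached(event_type, payload):
    return get_notifier().info(event_type, payload)


if __name__ == "__main__":
    for _ in range(5):
        notify_leaky("cluster.update", {"id": 118})
    print("open after leaky calls:", FakeConnection.open_count)

    before = FakeConnection.open_count
    for _ in range(5):
        notify_cached("cluster.update", {"id": 118})
    print("opened by cached calls:", FakeConnection.open_count - before)
```

If the real code follows the first variant, the open connection count would
grow with every health status update, which would match the netstat numbers
quoted below.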

On Tue, Dec 29, 2020 at 11:52 PM Ionut Biru <ionut at fleio.com> wrote:

> Hi Feilong,
>
> I found out that each time the update_health_status periodic task runs,
> a new connection (for each uwsgi process) is made to RabbitMQ.
>
> root@ctrl1cj-magnum-container-7a7a412a:~# netstat -npt | grep 5672 | wc -l
> 229
> root@ctrl1cj-magnum-container-7a7a412a:~# netstat -npt | grep 5672 | wc -l
> 234
> root@ctrl1cj-magnum-container-7a7a412a:~# netstat -npt | grep 5672 | wc -l
> 238
> root@ctrl1cj-magnum-container-7a7a412a:~# netstat -npt | grep 5672 | wc -l
> 241
> root@ctrl1cj-magnum-container-7a7a412a:~# netstat -npt | grep 5672 | wc -l
> 244
>
> Not sure
>
> Dec 29 21:51:22 ctrl1cj-magnum-container-7a7a412a
> magnum-conductor[262800]: 2020-12-29 21:51:22.024 262800 DEBUG
> magnum.service.periodic [req-3b495326-cf80-481e-b3c6-c741f05b7f0e - - - -
> -]
> Dec 29 21:51:22 ctrl1cj-magnum-container-7a7a412a
> magnum-conductor[262800]: 2020-12-29 21:51:22.024 262800 DEBUG
> oslo_service.periodic_task [-] Running periodic task
> MagnumPeriodicTasks.sync
> Dec 29 21:51:16 ctrl1cj-magnum-container-7a7a412a
> magnum-conductor[262804]: 2020-12-29 21:51:16.462 262804 DEBUG
> magnum.conductor.handlers.cluster_conductor
> [req-284ac12b-d76a-4e50-8e74-5bfb
> Dec 29 21:51:15 ctrl1cj-magnum-container-7a7a412a
> magnum-conductor[262800]: 2020-12-29 21:51:15.573 262800 DEBUG
> magnum.service.periodic [-] Status for cluster 118 updated to HEALTHY
> ({'api'
> Dec 29 21:51:15 ctrl1cj-magnum-container-7a7a412a
> magnum-conductor[262805]: 2020-12-29 21:51:15.572 262805 DEBUG
> magnum.conductor.handlers.cluster_conductor
> [req-3fc29ee9-4051-42e7-ae19-3a49
> Dec 29 21:51:15 ctrl1cj-magnum-container-7a7a412a
> magnum-conductor[262800]: 2020-12-29 21:51:15.572 262800 DEBUG
> magnum.service.periodic [-] Status for cluster 121 updated to HEALTHY
> ({'api'
> Dec 29 21:51:15 ctrl1cj-magnum-container-7a7a412a
> magnum-conductor[262800]: 2020-12-29 21:51:15.572 262800 DEBUG
> magnum.service.periodic [-] Status for cluster 122 updated to HEALTHY
> ({'api'
> Dec 29 21:51:15 ctrl1cj-magnum-container-7a7a412a
> magnum-conductor[262800]: 2020-12-29 21:51:15.553 262800 DEBUG
> magnum.service.periodic [-] Updating health status for cluster 122
> update_hea
> Dec 29 21:51:15 ctrl1cj-magnum-container-7a7a412a
> magnum-conductor[262800]: 2020-12-29 21:51:15.544 262800 DEBUG
> magnum.service.periodic [-] Updating health status for cluster 121
> update_hea
> Dec 29 21:51:15 ctrl1cj-magnum-container-7a7a412a
> magnum-conductor[262800]: 2020-12-29 21:51:15.535 262800 DEBUG
> magnum.service.periodic [-] Updating health status for cluster 118
> update_hea
> Dec 29 21:51:15 ctrl1cj-magnum-container-7a7a412a
> magnum-conductor[262800]: 2020-12-29 21:51:15.494 262800 DEBUG
> magnum.service.periodic [req-405b1fed-0b8a-4a60-b6ae-834f548b21d1 - - -
>
>
> 2020-12-29 21:51:14.082 [info] <0.953.1293> accepting AMQP connection
> <0.953.1293> (172.29.93.14:48474 -> 172.29.95.38:5672)
> 2020-12-29 21:51:14.083 [info] <0.953.1293> Connection <0.953.1293> (
> 172.29.93.14:48474 -> 172.29.95.38:5672) has a client-provided name:
> uwsgi:262739:f86c0570-8739-4b74-8102-76b5357acd71
> 2020-12-29 21:51:14.084 [info] <0.953.1293> connection <0.953.1293> (
> 172.29.93.14:48474 -> 172.29.95.38:5672 -
> uwsgi:262739:f86c0570-8739-4b74-8102-76b5357acd71): user 'magnum'
> authenticated and granted access to vhost '/magnum'
> 2020-12-29 21:51:15.560 [info] <0.1656.1283> accepting AMQP connection
> <0.1656.1283> (172.29.93.14:48548 -> 172.29.95.38:5672)
> 2020-12-29 21:51:15.561 [info] <0.1656.1283> Connection <0.1656.1283> (
> 172.29.93.14:48548 -> 172.29.95.38:5672) has a client-provided name:
> uwsgi:262744:2c9792ab-9198-493a-970c-f6ccfd9947d3
> 2020-12-29 21:51:15.561 [info] <0.1656.1283> connection <0.1656.1283> (
> 172.29.93.14:48548 -> 172.29.95.38:5672 -
> uwsgi:262744:2c9792ab-9198-493a-970c-f6ccfd9947d3): user 'magnum'
> authenticated and granted access to vhost '/magnum'
>
> On Tue, Dec 22, 2020 at 4:12 AM feilong <feilong at catalyst.net.nz> wrote:
>
>> Hi Ionut,
>>
>> I haven't seen this before in our production. Magnum auto healer simply
>> sends a POST request to the Magnum API to update the health status, so I
>> would suggest first writing a small script, or even using curl, to see if
>> you can reproduce this.
>>
>>
>> On 19/12/20 2:27 am, Ionut Biru wrote:
>>
>> Hi again,
>>
>> I failed to mention that this is stable/victoria with a couple of patches
>> from review. Ignore the fact that the logs show version 19.1.4 in the venv
>> path.
>>
>> On Fri, Dec 18, 2020 at 3:22 PM Ionut Biru <ionut at fleio.com> wrote:
>>
>>> Hi guys,
>>>
>>> I have an issue with magnum api returning an error after a while:
>>> Server-side error: "[('system library', 'fopen', 'Too many open files'),
>>> ('BIO routines', 'BIO_new_file', 'system lib'), ('x509 certificate
>>> routines', 'X509_load_cert_crl_file', 'system lib')]"
>>>
>>> Log file: https://paste.xinu.at/6djE/
>>>
>>> This started to appear after I enabled the following in the
>>> template: auto_healing_controller = magnum-auto-healer,
>>> magnum_auto_healer_tag = v1.19.0.
>>>
>>> Currently, I only have 4 clusters.
>>>
>>> After that, the API stays in an error state and doesn't work until I
>>> restart it.
>>>
>>>
>>> --
>>> Ionut Biru - https://fleio.com
>>>
>>
>>
>> --
>> Ionut Biru - https://fleio.com
>>
>> --
>> Cheers & Best regards,
>> Feilong Wang (王飞龙)
>> ------------------------------------------------------
>> Senior Cloud Software Engineer
>> Tel: +64-48032246
>> Email: flwang at catalyst.net.nz
>> Catalyst IT Limited
>> Level 6, Catalyst House, 150 Willis Street, Wellington
>> ------------------------------------------------------
>>
>>
>
> --
> Ionut Biru - https://fleio.com
>
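
For anyone who wants to watch this on their own deployment, the repeated
netstat runs quoted above can be approximated with a small script that samples
the number of open file descriptors of one process. This is only a rough
sketch: it assumes Linux with /proc mounted and enough permissions to read
/proc/<pid>/fd, and the function names are hypothetical. It counts all
descriptors, not only RabbitMQ sockets.

```python
import os
import time


def count_open_fds(pid):
    """Return the number of open file descriptors of pid (Linux /proc only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def sample_fd_counts(pid, samples=5, interval=1.0):
    """Collect `samples` descriptor counts, sleeping `interval` seconds between them."""
    counts = []
    for i in range(samples):
        counts.append(count_open_fds(pid))
        if i < samples - 1:
            time.sleep(interval)
    return counts


def is_growing(counts):
    """True if the counts never decrease and the last one exceeds the first."""
    if len(counts) < 2:
        return False
    never_drops = all(b >= a for a, b in zip(counts, counts[1:]))
    return never_drops and counts[-1] > counts[0]


if __name__ == "__main__":
    # Sample the current process as a harmless demonstration.
    demo = sample_fd_counts(os.getpid(), samples=3, interval=0.1)
    print(demo, "growing" if is_growing(demo) else "not growing")
```

Pointing sample_fd_counts at one of the uwsgi PIDs (262739 in the RabbitMQ log
above, for example) with an interval close to the periodic task period should
show whether the count keeps climbing toward the "Too many open files" limit.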


-- 
Ionut Biru - https://fleio.com