I increased SessionsMax to 16384 on one of the nodes, and again rabbitmq uses almost all available sessions:

control03:~ # loginctl list-sessions | grep -c rabbit
16325

But everything seems to be working okay; it's just filling up the logs. And it looks as if all new sessions are closed properly:

control03:~ # journalctl --since 2024-08-14 | grep -c "session opened for user rabbitmq"
7679
control03:~ # journalctl --since 2024-08-14 | grep -c "session closed for user rabbitmq"
7679

So I guess I'll just leave it as it is and not worry about those systemd messages. What I'm still wondering about is why only two of the three control nodes reach the SessionsMax limit, while the third (which joined the cluster later) has only 2 rabbitmq sessions. I seem to be overlooking something, but I don't know what it is yet. And I'm curious whether this is working "as designed". This is a cluster with 3 control nodes and 36 compute nodes. What do other operators see in their HA clouds regarding rabbitmq?

Thanks!
Eugen

Zitat von Eugen Block <eblock@nde.ag>:
On a test cluster I tried to kill one of the sessions in "closing" status; that was immediately noticed by pacemaker, which spawned a new process. So neither the "closing" status nor the session timestamps are an indication of unused sessions. I'm still not sure whether enable-linger or simply increasing the systemd-logind value for "SessionsMax" is the better solution.
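In case anyone wants to poke at this themselves, below is roughly the kind of thing I did on the test cluster, plus the logind drop-in variant. It's only a sketch: the awk field assumes the user is the third column of loginctl list-sessions, the drop-in file name is arbitrary, and option availability differs between systemd versions.

---snip---
# show the state of all rabbitmq sessions; on older systemd the leaked
# ones just sit in "closing" forever
for s in $(loginctl list-sessions --no-legend | awk '$3 == "rabbitmq" {print $1}'); do
    echo "$s: $(loginctl show-session "$s" -p State)"
done

# killing one of them is what pacemaker reacted to immediately
loginctl kill-session <session-id>

# the other option: raise the limit with a logind drop-in
# (logind's built-in default for SessionsMax is 8192)
mkdir -p /etc/systemd/logind.conf.d
cat > /etc/systemd/logind.conf.d/99-sessionsmax.conf <<EOF
[Login]
SessionsMax=16384
EOF
systemctl restart systemd-logind
---snip---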
Zitat von Eugen Block <eblock@nde.ag>:
Hi *,
I can't seem to find much about rabbitmq and logind, so I wanted to ask the list if anyone has encountered the same and if so, how they dealt with it. We're supporting a Victoria cluster (installed with our own deployment method) mostly controlled by pacemaker. And on the rabbit master node I see this warning constantly:
---snip---
2024-07-29T14:09:23.552576+02:00 control01 su: pam_unix(su:session): session opened for user rabbitmq by (uid=0)
2024-07-29T14:09:24.450657+02:00 control01 su: pam_unix(su:session): session closed for user rabbitmq
2024-07-29T14:09:24.500356+02:00 control01 su: (to rabbitmq) root on none
2024-07-29T14:09:24.502370+02:00 control01 su: pam_systemd(su:session): Failed to create session: Maximum number of sessions (8192) reached, refusing further sessions.
2024-07-29T14:09:24.502681+02:00 control01 su: pam_unix(su:session): session opened for user rabbitmq by (uid=0)
2024-07-29T14:09:25.565203+02:00 control01 su: pam_unix(su:session): session closed for user rabbitmq
2024-07-29T14:09:25.609613+02:00 control01 su: (to rabbitmq) root on none
---snip---
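To see how close a node already is to the limit, something like this should do (the awk column again assumes USER is the third field of loginctl list-sessions; the 8192 default matches the number in the error above):

---snip---
# logind sessions per user, highest first
loginctl list-sessions --no-legend | awk '{print $3}' | sort | uniq -c | sort -rn

# configured limit, if any (unset means the default of 8192)
grep -rs SessionsMax /etc/systemd/logind.conf /etc/systemd/logind.conf.d/
---snip---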
Looking at loginctl list-sessions, almost all sessions belong to rabbitmq and they have a very old timestamp (2023). I'm aware that older systemd versions can't handle closing sessions correctly [0], but we can't upgrade at this time. Would enabling "linger" for the rabbitmq user (loginctl enable-linger rabbitmq) fix this after a reboot? During the next maintenance window I would reboot control01 and watch how the other control nodes behave wrt rabbitmq login sessions. But I'm wondering if somebody has already dealt with this?
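If linger turns out to be the way to go, the commands would be something like the below. As far as I understand, lingering only keeps a persistent user manager for rabbitmq running independently of login sessions, so whether it actually stops the session churn is exactly what I'd like to verify after the reboot.

---snip---
loginctl enable-linger rabbitmq
loginctl show-user rabbitmq -p Linger   # should report Linger=yes
---snip---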
Thanks for any pointers!
Eugen