On 10/1/24 23:31, Arnaud Morin wrote:
Hey,
I totally agree that heartbeat_in_pthread and the oslo.log PipeMutex are technical debt that we need to get rid of, as well as eventlet.
However, although it seems purely cosmetic from your side, we believe it's not. I can't prove or reproduce the issue on a small infrastructure, but at large scale, having those TCP connections dropped by RabbitMQ and recreated in a loop by agents definitely affects the cluster.
I know all the pain these settings caused in the past, but I feel we are now in a stable situation, which is why I am surprised to see heartbeat_in_pthread deprecated now.
Can we at least make sure we keep all of this until we switch off eventlet? In other words, can we get rid of eventlet first and then remove these parameters, and not the other way around?
That's the plan. We deprecated the parameter because it will no longer be useful *ONCE* we get rid of eventlet completely. The parameter will be removed ONLY AFTER the eventlet removal is done.
Regards,
Arnaud
On 01.10.24 - 11:38, smooney@redhat.com wrote:
I'm glad you managed to make it work, but from a Nova perspective we do not recommend using heartbeat_in_pthread=true with nova-compute, to the point that I would consider that configuration unsupported.
We also don't recommend using it with nova-api, even when running it under a WSGI server such as mod_wsgi or uWSGI.
The only thing this has ever done is remove a cosmetic warning in the RabbitMQ/Nova logs caused by the heartbeat timing out. It has never fixed any functional bug that we are aware of, but it has resulted in several real bugs.
The most recent one we hit was bug 1983863 (https://launchpad.net/bugs/1983863), which was mitigated by https://review.opendev.org/c/openstack/oslo.log/+/852443. However, that fix relies on an unsafe eventlet debug option: eventlet.debug.hub_prevent_multiple_readers(False).
While you may be able to make heartbeat_in_pthread work with a lot of effort, as Takashi noted, it will eventually go away when we remove eventlet. To enable that removal we need to replace the PipeMutex that currently makes logging work from a native thread, so heartbeat_in_pthread is part of the technical debt we need to remove before we can move away from eventlet entirely.
On Tue, 2024-10-01 at 09:13 +0000, Arnaud Morin wrote:
Yes, I agree that it used to be broken, but since the bug was reported, we merged the following fixes:
https://review.opendev.org/c/openstack/oslo.messaging/+/894731
https://review.opendev.org/c/openstack/oslo.messaging/+/875615
https://review.opendev.org/c/openstack/oslo.messaging/+/876318
That's why I believe everything should be fine now :)
On 01.10.24 - 17:20, Takashi Kajinami wrote:
I was too fast to push Send button.
It's still interesting to see that you enabled the feature for eventlet services, such as nova-compute. In the past we hit a few bugs caused by that feature, which eventually made us revert the default value to False:
https://bugs.launchpad.net/oslo.messaging/+bug/1934937
https://bugs.launchpad.net/oslo.messaging/+bug/1949964
You might want to check whether the reported problems reproduce in your environment.
On 10/1/24 17:15, Takashi Kajinami wrote:
Enabling heartbeat_in_pthread is known to break services using eventlet, so it SHOULD NOT be enabled by default. We tried to enable it by default in the past but eventually reverted that after seeing multiple problems.
You can selectively enable it for services not using eventlet (API services run under httpd + mod_wsgi or uWSGI), but you should keep it False for the other services.
Once we get rid of eventlet, we will no longer use an eventlet thread for the heartbeat, so we will no longer need that option (the behavior would be equivalent to heartbeat_in_pthread=True). Until then we can't change the default, unless someone is willing to dig into the past problems and make the feature work fully with eventlet (which I don't think is worth the effort at this stage).
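To illustrate the idea behind heartbeat_in_pthread=True: the heartbeat runs in a native OS thread, so it keeps firing even while the main loop is blocked. The following is a minimal sketch under that assumption only. NativeHeartbeat and send_heartbeat are hypothetical names, and this is not how oslo.messaging actually implements the option.

```python
import threading
import time


class NativeHeartbeat:
    """Hypothetical helper (not oslo.messaging code): calls `send_heartbeat`
    every `interval` seconds from a native OS thread, independently of what
    the main thread is doing."""

    def __init__(self, send_heartbeat, interval):
        self._send = send_heartbeat
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait() returns False on timeout (time for a heartbeat)
        # and True once stop() has set the event (time to exit).
        while not self._stop.wait(self._interval):
            self._send()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


if __name__ == "__main__":
    beats = []
    hb = NativeHeartbeat(lambda: beats.append(time.monotonic()), interval=0.05)
    hb.start()
    # Simulate a main thread stuck in a blocking call for 0.5 s; the native
    # heartbeat thread keeps running in the meantime.
    time.sleep(0.5)
    hb.stop()
    print(f"heartbeats sent while main thread was blocked: {len(beats)}")
```

Event.wait(interval) serves as both the sleep between heartbeats and a prompt shutdown signal. With eventlet, by contrast, a green-thread heartbeat only runs when the hub regains control, so a service that blocks without yielding can miss heartbeats.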
On 10/1/24 16:34, Arnaud Morin wrote:
Hello,
I completely missed the deprecation of heartbeat_in_pthread in oslo.messaging [1].
We heavily rely on this parameter downstream and our opinion is that it should be set to True by default. We use it for both wsgi services and agents (nova-compute, neutron agents, etc.).
I understand that eventlet will be dropped in the future, but should we set heartbeat_in_pthread to True by default until then?
Regards,
Arnaud.
[1] https://review.opendev.org/c/openstack/oslo.messaging/+/925778