[all][tc] Eventlet migration: Call to action
Hi all,
As you may have read in Goutham's TC summary, the existing eventlet goal[0] is unlikely to gain consensus to merge. This appears to be due to the difference in needs between OpenStack projects, and belief that it's unlikely a single solution would serve them all.
Our current situation is:
1) Eventlet usage must be discontinued. The temporary maintainers are just that -- temporary -- and we should not expect eventlet to continue to be a valid path moving forward.
2) We have basic agreement that shared tooling, such as oslo libraries, should not dictate a specific threading model.
In lieu of specific technical leadership and guidance from the TC, this leaves it to individual projects to decide what threading model to use instead of eventlet monkey patching. The unmerged spec[0] may be a good starting point if you're wondering where to start.
Importantly: do not let this work wait! It is a significant change and will take a long time to complete.
Thanks,
Jay Faulkner
0: https://review.opendev.org/c/openstack/governance/+/902585
Thanks Jay for this thread and for summarizing the TC's sentiment. The following is not directly addressed to you (Jay), but to the other TC members.
On Tue, 14 May 2024 at 18:28, Jay Faulkner <jay@gr-oss.io> wrote:
Hi all,
As you may have read in Goutham's TC summary, the existing eventlet goal[0] is unlikely to gain consensus to merge. This appears to be due to the difference in needs between OpenStack projects, and belief that it's unlikely a single solution would serve them all.
I might agree that it's unlikely a single solution would serve them all. I was going to update a sentence to make this optionality stronger: https://review.opendev.org/c/openstack/governance/+/902585/comment/bb363bda_...
Also, I was going to propose some adaptations to introduce futurist and executors: https://review.opendev.org/c/openstack/governance/+/902585/comment/6b23377c_...
But rather than speaking of "a single solution", I'd suggest speaking of "a default solution".
Indeed, many comments made on this proposal strongly adopt Nova's perspective. Nova is a major piece of OpenStack, but OpenStack is not only Nova. By updating that sentence (see the previous links), we would make this proposal optional for teams that have the capacity to design their own alternatives, like Nova.
Teams like Nova have resources that other teams may not have. TC members should consider this difference in resources between teams. TC members are elected to adopt a global point of view, not a team-based one: https://governance.openstack.org/tc/reference/principles.html#openstack-firs...
Otherwise, without an optional default solution, teams with fewer resources will be left without any solution...
Teams should not consider the use of their surplus as legitimate when other teams are deprived of what is necessary.
As a community member, I expect this kind of mindset from OpenStack governance.
Our current situation is: 1) Eventlet usage must be discontinued. The temporary maintainers are just that -- temporary -- and we should not expect eventlet to continue to be a valid path moving forward. 2) We have basic agreement that shared tooling, such as oslo libraries, should not dictate a specific threading model.
If by "should not dictate a specific threading model" you mean that Oslo would force the use of async/await, then I STRONGLY disagree with 2)... Please show me where, in the proposed goal, oslo libraries dictate a specific threading model: https://review.opendev.org/c/openstack/governance/+/902585
FYI, the libs migration is detailed in the "How to migrate a library" section (around line 646 of the proposed goal). And a living example of how the libs would work is available at https://github.com/4383/snippets/blob/main/python/facade/facade.py
To conclude, is it still worth it to spend time updating this proposal again, or is your decision definitive?
--
Hervé Beraud
Senior Software Engineer at Red Hat
irc: hberaud
https://github.com/4383/
On Thu, 2024-05-16 at 10:27 +0200, Herve Beraud wrote:
Thanks Jay for this thread and for summarizing the TC's sentiment. The following is not directly addressed to you (Jay), but to the other TC members.
On Tue, 14 May 2024 at 18:28, Jay Faulkner <jay@gr-oss.io> wrote:
Hi all,
As you may have read in Goutham's TC summary, the existing eventlet goal[0] is unlikely to gain consensus to merge. This appears to be due to the difference in needs between OpenStack projects, and belief that it's unlikely a single solution would serve them all.
I might agree that it's unlikely a single solution would serve them all. I was going to update a sentence to make this optionality stronger: https://review.opendev.org/c/openstack/governance/+/902585/comment/bb363bda_...
Also, I was going to propose some adaptations to introduce futurist and executors: https://review.opendev.org/c/openstack/governance/+/902585/comment/6b23377c_...
Speaking of executors, for Nova we have approved a specless blueprint to address some of the low-hanging fruit, https://blueprints.launchpad.net/nova/+spec/eventlet-removal-part-1 , which is implemented here: https://review.opendev.org/q/topic:%22eventlet-removal-part-1%22 , although I will be revising that at least once more before it's ready to merge. The last two patches in the series remove eventlet.tpool and replace its use with a futurist thread pool executor instead. For our specific usage, https://review.opendev.org/c/openstack/nova/+/917962/4/nova/storage/rbd_util... , that's a relatively simple transformation. There are one or two other uses of eventlet.tpool in Nova. I will also address the usage in the guestfs module, but I will likely not touch its usage in the libvirt driver, as we have rather complex code dedicated to how we interact with libvirt that needs careful thought to untangle.
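A minimal sketch of the tpool-to-executor transformation described above, assuming the blocking call can simply run in a worker thread while the caller waits for its result. It uses the stdlib `concurrent.futures.ThreadPoolExecutor` so it runs without extra dependencies; futurist's `ThreadPoolExecutor`, which the Nova patches use, offers the same `submit()`/`shutdown()` style of interface. `slow_rbd_call` and `run_blocking` are hypothetical names, not Nova code.

```python
import concurrent.futures
import time


# Hypothetical stand-in for a blocking C-extension call (e.g. an RBD
# operation) that previously ran via eventlet.tpool.execute(...).
def slow_rbd_call(pool: str, image: str) -> str:
    time.sleep(0.01)  # simulate blocking I/O
    return f"{pool}/{image}"


# One shared pool for the process. A futurist.ThreadPoolExecutor could be
# swapped in here, since it exposes the same submit()/shutdown() methods.
_EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def run_blocking(fn, *args, **kwargs):
    """Run fn in a worker thread and block the caller until it finishes.

    Roughly the role eventlet.tpool.execute(fn, *args, **kwargs) played,
    without any green-thread cooperation.
    """
    future = _EXECUTOR.submit(fn, *args, **kwargs)
    return future.result()


if __name__ == "__main__":
    print(run_blocking(slow_rbd_call, "vms", "disk0"))  # prints vms/disk0
```

Unlike a green thread, the worker here is a real OS thread, so the offloaded function must not depend on eventlet-patched modules or on unsynchronized shared state.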
But rather than speaking of "a single solution", I'd suggest speaking of "a default solution".
Indeed, many comments made on this proposal strongly adopt Nova's perspective. Nova is a major piece of OpenStack, but OpenStack is not only Nova. By updating that sentence (see the previous links), we would make this proposal optional for teams that have the capacity to design their own alternatives, like Nova.
Teams like Nova have resources that other teams may not have.
Yes and no. The Nova team's capacity is fairly limited these days. We have between 5 and 10 active contributors, depending on how you look at it. Narrowing that down to the people who understand our use of eventlet, can review the code, have the time to rewrite the relevant code, and are upstream consistently enough to actually make progress, that probably means about 3-5 people could actually do this. At least 2 of those people are actively working on finishing multi-cycle efforts, and the rest, including myself, are heavily engaged in downstream work or work in other projects that limits our upstream review and development time for at least the next 3-6 months. Nova is trying to be pragmatic: we do not have enough contributors to rewrite the relevant parts of Nova for asyncio without heavily impacting the project's viability. So we are looking at using futurist executors instead, because it's a lot less work to achieve the goal of not using eventlet, while not prohibiting us from exploring asyncio or other options eventually if it makes sense. If we had to replace eventlet with asyncio, I'm not confident we could complete that for Nova in the next 12-18 months, not without effectively stopping all other development. We just don't have the review or code-writing capacity to do something that large and complex, while also doing anything else, and still ensure we don't introduce regressions.
TC members should consider this difference in resources between teams.
TC members are elected to adopt a global point of view, not a team-based one: https://governance.openstack.org/tc/reference/principles.html#openstack-firs...
Otherwise, without an optional default solution, teams with fewer resources will be left without any solution...
Teams should not consider the use of their surplus as legitimate when other teams are deprived of what is necessary.
As a community member, I expect this kind of mindset from OpenStack governance.
Our current situation is: 1) Eventlet usage must be discontinued. The temporary maintainers are just that -- temporary -- and we should not expect eventlet to continue to be a valid path moving forward. 2) We have basic agreement that shared tooling, such as oslo libraries, should not dictate a specific threading model.
If by "should not dictate a specific threading model" you mean that Oslo would force the use of async/await, then I STRONGLY disagree with 2)... Please show me where, in the proposed goal, oslo libraries dictate a specific threading model: https://review.opendev.org/c/openstack/governance/+/902585
That's more a reference to not forcing async/await than to the internal implementations, i.e. don't require your users to use asyncio.
FYI the libs migration is detailed in the "How to migrate a library" section (around line 646 of the proposed goal). And a living example of how the libs would work is available at https://github.com/4383/snippets/blob/main/python/facade/facade.py
That snippet seems to assume it's OK to create an asyncio event loop on the fly in any given invocation, which is probably going to be too slow for many use cases, i.e. if oslo.log did that every time we logged, that would be a problem. I have not measured the startup time of the asyncio event loop, but I expect it's more expensive than a trivial Python function call. If that loop was reused between calls to the lib, or even kicked out into its own thread, perhaps with a graceful shutdown if nothing is run for an extended period of time (like privsep), then I can see that variation of the facade pattern working. We would have to measure the overhead of _iter_coroutine. https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.get_running... does appear to be thread-safe, so in principle it should not cause any cross-thread interactions; that was one of my concerns when looking at the snippet initially.
To conclude, is it still worth it to spend time updating this proposal again or is your decision definitive?
On Thu, 16 May 2024 at 13:31, <smooney@redhat.com> wrote:
On Thu, 2024-05-16 at 10:27 +0200, Herve Beraud wrote:
Thanks Jay for this thread and for summarizing the TC's sentiment. The following is not directly addressed to you (Jay), but to the other TC members.
On Tue, 14 May 2024 at 18:28, Jay Faulkner <jay@gr-oss.io> wrote:
Hi all,
As you may have read in Goutham's TC summary, the existing eventlet goal[0] is unlikely to gain consensus to merge. This appears to be due to the difference in needs between OpenStack projects, and belief that it's unlikely a single solution would serve them all.
I might agree that it's unlikely a single solution would serve them all. I was going to update a sentence to make this optionality stronger:
https://review.opendev.org/c/openstack/governance/+/902585/comment/bb363bda_...
Also, I was going to propose some adaptations to introduce futurist and executors:
https://review.opendev.org/c/openstack/governance/+/902585/comment/6b23377c_...
Speaking of executors, for Nova we have approved a specless blueprint to address some of the low-hanging fruit: https://blueprints.launchpad.net/nova/+spec/eventlet-removal-part-1
Nice to see this initiative; maybe we could reference it (in the proposal or elsewhere) and give additional details to offer a path forward to others. Again, I'm not married to asyncio. To initiate this topic we had to choose a solution, and I chose asyncio, but any proposed solution is welcome, as long as it helps us get out of the eventlet dead end.
which is implemented here: https://review.opendev.org/q/topic:%22eventlet-removal-part-1%22 , although I will be revising that at least once more before it's ready to merge.
The last two patches in the series remove eventlet.tpool and replace its use with a futurist thread pool executor instead.
For our specific usage, https://review.opendev.org/c/openstack/nova/+/917962/4/nova/storage/rbd_util... , that's a relatively simple transformation.
There are one or two other uses of eventlet.tpool in Nova. I will also address the usage in the guestfs module, but I will likely not touch its usage in the libvirt driver, as we have rather complex code dedicated to how we interact with libvirt that needs careful thought to untangle.
But rather than speaking of "a single solution", I'd suggest speaking of "a default solution".
Indeed, many comments made on this proposal strongly adopt Nova's perspective. Nova is a major piece of OpenStack, but OpenStack is not only Nova. By updating that sentence (see the previous links), we would make this proposal optional for teams that have the capacity to design their own alternatives, like Nova.
Teams like Nova have resources that other teams may not have.
Yes and no. The Nova team's capacity is fairly limited these days. We have between 5 and 10 active contributors, depending on how you look at it. Narrowing that down to the people who understand our use of eventlet, can review the code, have the time to rewrite the relevant code, and are upstream consistently enough to actually make progress, that probably means about 3-5 people could actually do this.
At least 2 of those people are actively working on finishing multi-cycle efforts, and the rest, including myself, are heavily engaged in downstream work or work in other projects that limits our upstream review and development time for at least the next 3-6 months.
Nova is trying to be pragmatic: we do not have enough contributors to rewrite the relevant parts of Nova for asyncio without heavily impacting the project's viability.
So we are looking at using futurist executors instead, because it's a lot less work to achieve the goal of not using eventlet, while not prohibiting us from exploring asyncio or other options eventually if it makes sense.
If we had to replace eventlet with asyncio, I'm not confident we could complete that for Nova in the next 12-18 months, not without effectively stopping all other development. We just don't have the review or code-writing capacity to do something that large and complex, while also doing anything else, and still ensure we don't introduce regressions.
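To illustrate why an executor-first port need not rule out asyncio later, here is a sketch assuming the blocking helpers are kept as plain functions: the same helper and executor serve thread-based callers today and can be awaited from a coroutine later through `loop.run_in_executor()`. The names are hypothetical; the stdlib executor keeps the sketch self-contained, and whether a futurist executor can be passed to `run_in_executor()` unchanged is an assumption worth checking.

```python
import asyncio
import concurrent.futures
import time


# Hypothetical blocking helper, written once for the thread-based design.
def resize_image(name: str, size_gb: int) -> str:
    time.sleep(0.01)  # simulate blocking work
    return f"{name}:{size_gb}G"


# Shared executor (assumption: a futurist executor could be used instead).
EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=2)


# Thread-based caller: what an executor-first port looks like today.
def resize_sync(name: str, size_gb: int) -> str:
    return EXECUTOR.submit(resize_image, name, size_gb).result()


# asyncio caller: the same helper and executor, reused from a coroutine
# without rewriting the helper itself.
async def resize_async(name: str, size_gb: int) -> str:
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(EXECUTOR, resize_image, name, size_gb)


if __name__ == "__main__":
    print(resize_sync("disk0", 10))                # prints disk0:10G
    print(asyncio.run(resize_async("disk1", 20)))  # prints disk1:20G
```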
TC members should consider this difference in resources between teams.
TC members are elected to adopt a global point of view, not a team-based one:
https://governance.openstack.org/tc/reference/principles.html#openstack-firs...
Otherwise, without an optional default solution, teams with fewer resources will be left without any solution...
Teams should not consider the use of their surplus as legitimate when other teams are deprived of what is necessary.
As a community member, I expect this kind of mindset from OpenStack governance.
Our current situation is: 1) Eventlet usage must be discontinued. The temporary maintainers are just that -- temporary -- and we should not expect eventlet to continue to be a valid path moving forward. 2) We have basic agreement that shared tooling, such as oslo libraries, should not dictate a specific threading model.
If by "should not dictate a specific threading model" you mean that Oslo would force the use of async/await, then I STRONGLY disagree with 2)... Please show me where, in the proposed goal, oslo libraries dictate a specific threading model: https://review.opendev.org/c/openstack/governance/+/902585
That's more a reference to not forcing async/await than to the internal implementations, i.e. don't require your users to use asyncio.
FYI the libs migration is detailed in the "How to migrate a library" section (around line 646 of the proposed goal). And a living example of how the libs would work is available at https://github.com/4383/snippets/blob/main/python/facade/facade.py
That snippet seems to assume it's OK to create an asyncio event loop on the fly in any given invocation, which is probably going to be too slow for many use cases,
i.e. if oslo.log did that every time we logged, that would be a problem. I have not measured the startup time of the asyncio event loop, but I expect it's more expensive than a trivial Python function call.
If that loop was reused between calls to the lib, or even kicked out into its own thread, perhaps with a graceful shutdown if nothing is run for an extended period of time (like privsep), then I can see that variation of the facade pattern working. We would have to measure the overhead of _iter_coroutine.
https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.get_running... does appear to be thread-safe, so in principle it should not cause any cross-thread interactions; that was one of my concerns when looking at the snippet initially.
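The per-call loop overhead questioned above can be measured directly. This rough micro-benchmark sketch compares `asyncio.run()` on a trivial coroutine (a new loop per call) against submitting the same coroutine to one long-lived loop in a helper thread. It does not measure the snippet's `_iter_coroutine` helper itself, and results depend on the machine and Python version, so no numbers are claimed here.

```python
import asyncio
import threading
import time


async def noop() -> None:
    """Trivial coroutine, so the measured cost is mostly loop overhead."""
    return None


def per_call_loop(n: int) -> float:
    """Time n calls that each create, run and close a new event loop."""
    start = time.perf_counter()
    for _ in range(n):
        asyncio.run(noop())
    return time.perf_counter() - start


def shared_loop(n: int) -> float:
    """Time n calls submitted to one loop running in a helper thread."""
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    try:
        start = time.perf_counter()
        for _ in range(n):
            asyncio.run_coroutine_threadsafe(noop(), loop).result()
        return time.perf_counter() - start
    finally:
        loop.call_soon_threadsafe(loop.stop)
        thread.join()
        loop.close()


if __name__ == "__main__":
    n = 1000
    print(f"new loop per call: {per_call_loop(n) / n * 1e6:.1f} us/call")
    print(f"shared loop:       {shared_loop(n) / n * 1e6:.1f} us/call")
```

Note that the shared-loop path pays for a cross-thread handoff on every call, so the comparison is between two kinds of overhead rather than against a plain function call.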
Another option for libraries would be to implement, in various oslo libraries, new drivers/backends fully based on third-party asyncio-based libraries, while continuing to maintain the existing drivers/backends in parallel. Mike Bayer suggested that oslo.db may provide an asyncio interface for applications that want to make use of it. To that end, since the main working interface in oslo.db is oslo_db.sqlalchemy.enginefacade, we would propose a layer on top called oslo_db.sqlalchemy.asyncio_enginefacade or similar. It would provide the async versions of SQLAlchemy objects as well as asyncio context managers. The existing interface would remain in parallel. oslo.messaging may implement a new RabbitMQ driver based on https://github.com/Polyconseil/aioamqp and keep the existing driver around. I agree we should not close the door to alternatives like threads and futurist. If futurist saves us from wasting too much time removing eventlet, then let's go with this kind of option, but in all cases I think we should not close the door to asyncio either. Using asyncio may make sense in many scenarios. If libraries do not provide async-based interfaces in parallel with the sync-based interfaces, then I think we more or less permanently close the door to using asyncio in OpenStack, or at least we severely limit its introduction.
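One way to keep sync and async interfaces side by side, as suggested above, is to put the protocol logic in an I/O-free core (the "sans-I/O" pattern) and wrap it in two thin drivers. Everything here is hypothetical (`encode_message`, `SyncNotifier`, `AsyncNotifier`, the in-memory transports); it is not oslo.messaging's driver API, and a real async driver would sit on an asyncio AMQP client instead of a list.

```python
import asyncio
import json
from typing import List


# Pure, I/O-free core shared by both drivers (hypothetical wire format).
def encode_message(event: str, payload: dict) -> bytes:
    body = {"event": event, "payload": payload}
    return json.dumps(body, sort_keys=True).encode("utf-8")


class SyncNotifier:
    """Existing-style driver: blocking transport, no asyncio required."""

    def __init__(self, transport) -> None:
        self._transport = transport  # must offer a blocking send(bytes)

    def notify(self, event: str, payload: dict) -> None:
        self._transport.send(encode_message(event, payload))


class AsyncNotifier:
    """Parallel driver for asyncio callers, sharing the same core."""

    def __init__(self, transport) -> None:
        self._transport = transport  # must offer an awaitable send(bytes)

    async def notify(self, event: str, payload: dict) -> None:
        await self._transport.send(encode_message(event, payload))


# In-memory fake transports so the sketch runs without a broker.
class ListTransport:
    def __init__(self) -> None:
        self.sent: List[bytes] = []

    def send(self, data: bytes) -> None:
        self.sent.append(data)


class AsyncListTransport:
    def __init__(self) -> None:
        self.sent: List[bytes] = []

    async def send(self, data: bytes) -> None:
        await asyncio.sleep(0)  # yield to the loop like real async I/O
        self.sent.append(data)


if __name__ == "__main__":
    t = ListTransport()
    SyncNotifier(t).notify("compute.start", {"id": 1})
    at = AsyncListTransport()
    asyncio.run(AsyncNotifier(at).notify("compute.start", {"id": 1}))
    print(t.sent == at.sent)  # prints True
```

Because only the thin shells differ, a library following this shape can keep its existing sync driver untouched while adding the async one.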
To conclude, is it still worth it to spend time updating this proposal again or is your decision definitive?
Thanks Sean
--
Hervé Beraud
Senior Software Engineer at Red Hat
irc: hberaud
https://github.com/4383/
On 2024-05-16 12:31:15 +0100 (+0100), smooney@redhat.com wrote: [...]
Yes and no. The Nova team's capacity is fairly limited these days. We have between 5 and 10 active contributors, depending on how you look at it. Narrowing that down to the people who understand our use of eventlet, can review the code, have the time to rewrite the relevant code, and are upstream consistently enough to actually make progress, that probably means about 3-5 people could actually do this.
At least 2 of those people are actively working on finishing multi-cycle efforts, and the rest, including myself, are heavily engaged in downstream work or work in other projects that limits our upstream review and development time for at least the next 3-6 months. [...]
Not to trivialize the Nova team's struggles, but that's a luxurious number of active contributors compared to other teams (and yes, Nova is larger and more complex than almost anything else in OpenStack, so it's not as many people as it sounds like, I get that). I do still think that Hervé is right about Nova being in a better situation than, say, some service limping along on one or two part-time contributors for whom the thread model they've inherited is largely a black box they're afraid to touch because they don't even know how to debug it effectively. Then again, maybe forcing everyone to move off Eventlet will help us identify more services that aren't sufficiently maintained any longer, and retire them.
--
Jeremy Stanley
To conclude, is it still worth it to spend time updating this proposal again, or is your decision definitive?
Given the absence of a response from the TC to my previous question (^), and given that our decisions are based on lazy consensus [1], I take this absence of response and of objection as a desire from the community to continue the discussion and the search for solutions to our problem. For this reason, I just submitted a new version of the goal proposal: https://review.opendev.org/c/openstack/governance/+/902585
This is almost a complete rewrite, which aims to bring more pragmatism to the proposed solution.
Thanks for your attention, and thanks to all the people who already replied to this thread.
Cheers
[1] https://governance.openstack.org/tc/reference/opens.html#open-community
--
Hervé Beraud
Senior Software Engineer at Red Hat
irc: hberaud
https://github.com/4383/
Hi Jay, thank you for your reminder.
Given the lack of both consensus and resources, I wonder if the scaled-down path forward would be to just switch to normal threading. Then whichever projects face issues with that (I don't expect Ironic to be one, for instance) can consider asyncio or whatever. Otherwise, we're going to get stuck.
Dmitry
On 5/14/24 18:20, Jay Faulkner wrote:
Hi all,
As you may have read in Goutham's TC summary, the existing eventlet goal[0] is unlikely to gain consensus to merge. This appears to be due to the difference in needs between OpenStack projects, and belief that it's unlikely a single solution would serve them all.
Our current situation is: 1) Eventlet usage must be discontinued. The temporary maintainers are just that -- temporary -- and we should not expect eventlet to continue to be a valid path moving forward. 2) We have basic agreement that shared tooling, such as oslo libraries, should not dictate a specific threading model.
In lieu of specific technical leadership and guidance from the TC, this leaves it to individual projects to decide what threading model to use instead of eventlet monkey patching. The unmerged spec[0] may be a good starting point if you're wondering where to start.
Importantly: do not let this work wait! It is a significant change and will take a long time to complete.
Thanks, Jay Faulkner
0: https://review.opendev.org/c/openstack/governance/+/902585
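For the "just switch to normal threading" route proposed above, a typical change is replacing a green-thread loop of the form `while True: task(); eventlet.sleep(interval)` with a plain OS thread. This sketch assumes a simple periodic task; `PeriodicWorker` is a hypothetical name, and `threading.Event.wait()` serves as an interruptible sleep so the worker can be stopped cleanly.

```python
import threading
import time
from typing import Callable


class PeriodicWorker:
    """Run a task every `interval` seconds in a plain OS thread."""

    def __init__(self, task: Callable[[], None], interval: float) -> None:
        self._task = task
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def _loop(self) -> None:
        # Event.wait() returns True as soon as stop() sets the event,
        # so the worker exits promptly instead of sleeping out the interval.
        while not self._stop.is_set():
            self._task()
            if self._stop.wait(self._interval):
                break

    def start(self) -> None:
        self._thread.start()

    def stop(self, timeout: float = 5.0) -> None:
        self._stop.set()
        self._thread.join(timeout)


if __name__ == "__main__":
    calls = []
    worker = PeriodicWorker(lambda: calls.append(time.monotonic()), 0.01)
    worker.start()
    time.sleep(0.05)
    worker.stop()
    print(f"task ran {len(calls)} times")
```

One difference to keep in mind: green threads only switch at I/O or explicit yields, whereas OS threads can be preempted at almost any point, so state shared between the task and other threads may need a `threading.Lock`.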
Just to note my plans in this area: I intend to create an asyncio version of the oslo.db enginefacade, which all OpenStack projects that use a database are now using. The API will look just like that of https://github.com/openstack/oslo.db/blob/master/oslo_db/sqlalchemy/enginefa... , except that the context managers will be async, and the sessions/connections passed around will be the SQLAlchemy async versions documented at https://docs.sqlalchemy.org/en/20/orm/extensions/asyncio.html . Projects that wish to port portions of their code, or their entire code, to plain asyncio will be able to make use of oslo.db this way; the only other requirement is that asyncio DB drivers would have to be used with this code. For SQLite there's aiosqlite, https://pypi.org/project/aiosqlite/ , which internally uses a thread per connection to create an asyncio effect, and for MySQL the driver we do best with is https://pypi.org/project/asyncmy/ ; we also get good results these days from aiomysql, https://pypi.org/project/aiomysql/ , though it's had some maintenance bumps. To support applications that use oslo.db in both sync and async ways, I will likely introduce a new async-only database URL parameter, which, if present, will be used for the asyncio versions of things. If it's not present, it can fall back to the main database URL, which would then need to refer to an asyncio driver.
On Tue, May 14, 2024, at 12:20 PM, Jay Faulkner wrote:
Hi all,
As you may have read in Goutham's TC summary, the existing eventlet goal[0] is unlikely to gain consensus to merge. This appears to be due to the difference in needs between OpenStack projects, and belief that it's unlikely a single solution would serve them all.
Our current situation is: 1) Eventlet usage must be discontinued. The temporary maintainers are just that -- temporary -- and we should not expect eventlet to continue to be a valid path moving forward. 2) We have basic agreement that shared tooling, such as oslo libraries, should not dictate a specific threading model.
In lieu of specific technical leadership and guidance from the TC, this leaves it to individual projects to decide what threading model to use instead of eventlet monkey patching. The unmerged spec[0] may be a good starting point if you're wondering where to start.
Importantly: do not let this work wait! It is a significant change and will take a long time to complete.
Thanks, Jay Faulkner
0: https://review.opendev.org/c/openstack/governance/+/902585
participants (6)
- Dmitry Tantsur
- Herve Beraud
- Jay Faulkner
- Jeremy Stanley
- Mike Bayer
- smooney@redhat.com