[all][oslo] revisiting the proposal to deprecate eventlet from oslo.service
Hi - We've come to an impasse regarding our efforts to create a new backend selection system for oslo.service which will allow applications based on eventlet to have an incremental migration path where they can select for threads instead of eventlet [1] [2]. Briefly, the change organizes oslo.service into two different "backends", known as "eventlet" and "threaded". A calling application may choose to use either backend exclusively. Applications that use oslo.service right now are essentially using the "eventlet" backend. As these applications seek to migrate away from eventlet, they'd select the "threaded" backend and tune their applications to work under this new mode, where eventually the "eventlet" backend would be removed from oslo.service and ultimately from all of openstack entirely. All of that falls under things that have been agreed upon. The current impasse is among three options of how this "backend" system in oslo.service may be informed of which backend should be used: 1. being selected at runtime via CONF from the .cfg file , which is what was originally agreed upon 2. at start up time via environment variable 3. as a fixed setting in source code. Personally, as someone who has worked for decades on multiple switchable backend-oriented systems, I'm fully in the camp of choice 3, which is that switching the backend is just the beginning of a whole series of changes and tuning that a codebase would need in order to accommodate this backend change, and having the ability to "switch the backend back to eventlet" in CONF or runtime is simply not something anyone would want to do, unless a considerable amount of additional effort were made to support the codebase being reliable under both idioms. Which I would argue is a waste of effort since eventlet is going away. More concretely, if you took nova, which were using oslo.service with eventlet and decided to try the "threaded" backend, that's naturally going to have a lot of implications for other parts of the code. For example, if your code is setting eventlet monkey patching, that has to be turned off when this backend selection is made. If your code has other explicit use of eventlet, this also needs to be changed. Finally, threaded concurrency is naturally going to "feel" different than coroutine-based concurrency - context switches happen at different places and with different frequencies. Typically in a large codebase, the places that are prone to race conditions can shift; areas that had no races under coroutines now have threaded concurrency problems, areas that were sensitive to coroutine issues like being CPU bound are suddenly smooth with threading. The overall number of "working routines" can change, whereas eventlet can spin up 1000 coroutines like popcorn, under threads we now have to work with pools of threads typically a few dozen or so threads large. Overall, if I had my whole application passing under tempest under "threading", I would have had to make changes to get there, which may very well be mutually exclusive vs. test resiliency under eventlet. More generally, getting my application to run under threads represents a change in implementation that I expect to be permanent. If there are continued issues running under threaded mode, those are bugs that have to be fixed in any case. In the discussion at [1], Takashi Kajinami writes that a PTG discussion (for which I dont have any link or background) explicitly decided against the above idea:
In the initial design we considered that developer may make the decision to migrate from eventlet to the new mechanism, and didn't intend to provide any options to switch back to the original eventlet model.
However there was some strong push back during the previous PTG and people were concerned with having no mechanism to switch back to "previously worked well" eventlet model and to address the concern we agreed to introduce the option for operators to switch the backend.
If we don't allow users to configure the option then IMO we should not expose the option and we should use an internal flag instead (though that's basically what was disagreed with during the PTG, AFAIR)
So first off, we have already realized that going with option 1. , use CONF, is not tenable, because the CONF collection is not typically in its final populated state when imports occur, and since the "backends" are an import time selection, there's a chicken/egg problem that cannot be resolved unless applications either highly modify the way in which their applications consume CONF vs. do their basic imports, or the way we are writing our backends in [1] needs to be modified so that imports can proceed using proxy objects which then change their loaded implementation when CONF is set up. Both of these options seem deeply complex and overarchitected in order to support the feature of "I want to change the backend in CONF" which I maintain is not going to be a real world use case. As a compromise, I suggested option 2. using an environment variable. This allows the backend selection to remain something that can be theoretically changed from the outside without modifying source code. However, current deployment practices suggest this is not really useful, as again per Takashi the way in which we deploy mod_wsgi does not allow environment variables to be local per virtual host; per [3] it looks like you need a hardcoded python script anyway that sets the env the way you want which makes it not too useful. As we were proceeding on option 2. we received some more pushback on the "env" idea from Arnaud Morin which I don't disagree with, and still more pushback and arguments leading in favor of 3, here from Jay Faulkner:
Using environment variables to set backends means that operators can change what backend is used. This is an easy thing for them to screw up and will end up creating a large amount of operator pain and bug churn. I am -1 in the strongest terms to the use of operator-adjustable values to enable/disable backends.
The above was surprising considering that this was supposed to be a decision that had already been made at PTG, yet there still seems to be disagreement as though the decision were not actually made in any final way. So as I am tasked along with Daniel Bengtsson and Herve Beraud with getting [1] merged and moving onto building out the threaded backend, I would like to ask the group here to give me some background on the concerns raised at PTG and if we can all just here revisit the whole issue and hopefully decide that at least to start, let's get this merged without any CONF/env variable process (option 3); an application that's gone through the effort to transition to threads with the help of this backend selector should be assumed to be moving forward with that implementation, and if it has problems, that's just an ordinary bug like any other. thanks for reading! [1] https://review.opendev.org/c/openstack/oslo.service/+/935783 [2] https://review.opendev.org/c/openstack/oslo-specs/+/927503 [3] https://gist.github.com/GrahamDumpleton/b380652b768e81a7f60c
On 1/21/25 1:05 PM, Mike Bayer wrote:
Hi -
We've come to an impasse regarding our efforts to create a new backend selection system for oslo.service which will allow applications based on eventlet to have an incremental migration path where they can select for threads instead of eventlet [1] [2]. Briefly, the change organizes oslo.service into two different "backends", known as "eventlet" and "threaded". A calling application may choose to use either backend exclusively. Applications that use oslo.service right now are essentially using the "eventlet" backend. As these applications seek to migrate away from eventlet, they'd select the "threaded" backend and tune their applications to work under this new mode, where eventually the "eventlet" backend would be removed from oslo.service and ultimately from all of openstack entirely.
All of that falls under things that have been agreed upon. The current impasse is among three options of how this "backend" system in oslo.service may be informed of which backend should be used:
1. being selected at runtime via CONF from the .cfg file , which is what was originally agreed upon
2. at start up time via environment variable
3. as a fixed setting in source code.
Personally, as someone who has worked for decades on multiple switchable backend-oriented systems, I'm fully in the camp of choice 3, which is that switching the backend is just the beginning of a whole series of changes and tuning that a codebase would need in order to accommodate this backend change, and having the ability to "switch the backend back to eventlet" in CONF or runtime is simply not something anyone would want to do, unless a considerable amount of additional effort were made to support the codebase being reliable under both idioms. Which I would argue is a waste of effort since eventlet is going away.
More concretely, if you took nova, which were using oslo.service with eventlet and decided to try the "threaded" backend, that's naturally going to have a lot of implications for other parts of the code. For example, if your code is setting eventlet monkey patching, that has to be turned off when this backend selection is made. If your code has other explicit use of eventlet, this also needs to be changed. Finally, threaded concurrency is naturally going to "feel" different than coroutine-based concurrency - context switches happen at different places and with different frequencies. Typically in a large codebase, the places that are prone to race conditions can shift; areas that had no races under coroutines now have threaded concurrency problems, areas that were sensitive to coroutine issues like being CPU bound are suddenly smooth with threading. The overall number of "working routines" can change, whereas eventlet can spin up 1000 coroutines like popcorn, under threads we now have to work with pools of threads typically a few dozen or so threads large. Overall, if I had my whole application passing under tempest under "threading", I would have had to make changes to get there, which may very well be mutually exclusive vs. test resiliency under eventlet. More generally, getting my application to run under threads represents a change in implementation that I expect to be permanent. If there are continued issues running under threaded mode, those are bugs that have to be fixed in any case.
In the discussion at [1], Takashi Kajinami writes that a PTG discussion (for which I dont have any link or background) explicitly decided against the above idea:
In the initial design we considered that developer may make the decision to migrate from eventlet to the new mechanism, and didn't intend to provide any options to switch back to the original eventlet model. However there was some strong push back during the previous PTG and people were concerned with having no mechanism to switch back to "previously worked well" eventlet model and to address the concern we agreed to introduce the option for operators to switch the backend. If we don't allow users to configure the option then IMO we should not expose the option and we should use an internal flag instead (though that's basically what was disagreed with during the PTG, AFAIR) So first off, we have already realized that going with option 1. , use CONF, is not tenable, because the CONF collection is not typically in its final populated state when imports occur, and since the "backends" are an import time selection, there's a chicken/egg problem that cannot be resolved unless applications either highly modify the way in which their applications consume CONF vs. do their basic imports, or the way we are writing our backends in [1] needs to be modified so that imports can proceed using proxy objects which then change their loaded implementation when CONF is set up. Both of these options seem deeply complex and overarchitected in order to support the feature of "I want to change the backend in CONF" which I maintain is not going to be a real world use case.
As a compromise, I suggested option 2. using an environment variable. This allows the backend selection to remain something that can be theoretically changed from the outside without modifying source code. However, current deployment practices suggest this is not really useful, as again per Takashi the way in which we deploy mod_wsgi does not allow environment variables to be local per virtual host; per [3] it looks like you need a hardcoded python script anyway that sets the env the way you want which makes it not too useful.
As we were proceeding on option 2. we received some more pushback on the "env" idea from Arnaud Morin which I don't disagree with, and still more pushback and arguments leading in favor of 3, here from Jay Faulkner:
Using environment variables to set backends means that operators can change what backend is used. This is an easy thing for them to screw up and will end up creating a large amount of operator pain and bug churn. I am -1 in the strongest terms to the use of operator-adjustable values to enable/disable backends. The above was surprising considering that this was supposed to be a decision that had already been made at PTG, yet there still seems to be disagreement as though the decision were not actually made in any final way.
I don't mean to re-litigate things that may have been decided; if this discussion happened and has been settled that's fine (I'm not a core on oslo anyway) -- but when reviewing the code, I found no reason an operator would want to change this value -- and with my Ironic hat on -- I explicitly do not want operators choosing implementation details of my software -- such as if it's using an eventlet or threading-based backend for service management. If option #1 or #2 were to proceed, I'd propose a change to Ironic which would cause us to refuse to startup if a different backend than requested was enabled. The last thing we need in OpenStack is more items added to our testing matrix; which intentionally or not is what making the backend operator-controllable does. - Jay Faulkner
So as I am tasked along with Daniel Bengtsson and Herve Beraud with getting [1] merged and moving onto building out the threaded backend, I would like to ask the group here to give me some background on the concerns raised at PTG and if we can all just here revisit the whole issue and hopefully decide that at least to start, let's get this merged without any CONF/env variable process (option 3); an application that's gone through the effort to transition to threads with the help of this backend selector should be assumed to be moving forward with that implementation, and if it has problems, that's just an ordinary bug like any other.
thanks for reading!
[1] https://review.opendev.org/c/openstack/oslo.service/+/935783 [2] https://review.opendev.org/c/openstack/oslo-specs/+/927503 [3] https://gist.github.com/GrahamDumpleton/b380652b768e81a7f60c
Hi, Dnia wtorek, 21 stycznia 2025 22:59:10 czas środkowoeuropejski standardowy Jay Faulkner pisze:
On 1/21/25 1:05 PM, Mike Bayer wrote:
Hi -
We've come to an impasse regarding our efforts to create a new backend selection system for oslo.service which will allow applications based on eventlet to have an incremental migration path where they can select for threads instead of eventlet [1] [2]. Briefly, the change organizes oslo.service into two different "backends", known as "eventlet" and "threaded". A calling application may choose to use either backend exclusively. Applications that use oslo.service right now are essentially using the "eventlet" backend. As these applications seek to migrate away from eventlet, they'd select the "threaded" backend and tune their applications to work under this new mode, where eventually the "eventlet" backend would be removed from oslo.service and ultimately from all of openstack entirely.
All of that falls under things that have been agreed upon. The current impasse is among three options of how this "backend" system in oslo.service may be informed of which backend should be used:
1. being selected at runtime via CONF from the .cfg file , which is what was originally agreed upon
2. at start up time via environment variable
3. as a fixed setting in source code.
Personally, as someone who has worked for decades on multiple switchable backend-oriented systems, I'm fully in the camp of choice 3, which is that switching the backend is just the beginning of a whole series of changes and tuning that a codebase would need in order to accommodate this backend change, and having the ability to "switch the backend back to eventlet" in CONF or runtime is simply not something anyone would want to do, unless a considerable amount of additional effort were made to support the codebase being reliable under both idioms. Which I would argue is a waste of effort since eventlet is going away.
I am also for option 3 - I don't think there is any reason to give choice about that for the operators. This IMO should be internal decision of the project's devs team what backend is used by their project and how project is working.
More concretely, if you took nova, which were using oslo.service with eventlet and decided to try the "threaded" backend, that's naturally going to have a lot of implications for other parts of the code. For example, if your code is setting eventlet monkey patching, that has to be turned off when this backend selection is made. If your code has other explicit use of eventlet, this also needs to be changed. Finally, threaded concurrency is naturally going to "feel" different than coroutine-based concurrency - context switches happen at different places and with different frequencies. Typically in a large codebase, the places that are prone to race conditions can shift; areas that had no races under coroutines now have threaded concurrency problems, areas that were sensitive to coroutine issues like being CPU bound are suddenly smooth with threading. The overall number of "working routines" can change, whereas eventlet can spin up 1000 coroutines like popcorn, under threads we now have to work with pools of threads typically a few dozen or so threads large. Overall, if I had my whole application passing under tempest under "threading", I would have had to make changes to get there, which may very well be mutually exclusive vs. test resiliency under eventlet. More generally, getting my application to run under threads represents a change in implementation that I expect to be permanent. If there are continued issues running under threaded mode, those are bugs that have to be fixed in any case.
In the discussion at [1], Takashi Kajinami writes that a PTG discussion (for which I dont have any link or background) explicitly decided against the above idea:
In the initial design we considered that developer may make the decision to migrate from eventlet to the new mechanism, and didn't intend to provide any options to switch back to the original eventlet model. However there was some strong push back during the previous PTG and people were concerned with having no mechanism to switch back to "previously worked well" eventlet model and to address the concern we agreed to introduce the option for operators to switch the backend.
I don't really agree with that "previously worked well" thing. Even if the team will make an effort to maintain two possibilities, there may be changes in the code which may unintentionally break this "previously worked well" eventlet model. There is no guarantee it will be always exactly the same as it was previously.
If we don't allow users to configure the option then IMO we should not expose the option and we should use an internal flag instead (though that's basically what was disagreed with during the PTG, AFAIR) So first off, we have already realized that going with option 1. , use CONF, is not tenable, because the CONF collection is not typically in its final populated state when imports occur, and since the "backends" are an import time selection, there's a chicken/egg problem that cannot be resolved unless applications either highly modify the way in which their applications consume CONF vs. do their basic imports, or the way we are writing our backends in [1] needs to be modified so that imports can proceed using proxy objects which then change their loaded implementation when CONF is set up. Both of these options seem deeply complex and overarchitected in order to support the feature of "I want to change the backend in CONF" which I maintain is not going to be a real world use case.
As a compromise, I suggested option 2. using an environment variable. This allows the backend selection to remain something that can be theoretically changed from the outside without modifying source code. However, current deployment practices suggest this is not really useful, as again per Takashi the way in which we deploy mod_wsgi does not allow environment variables to be local per virtual host; per [3] it looks like you need a hardcoded python script anyway that sets the env the way you want which makes it not too useful.
As we were proceeding on option 2. we received some more pushback on the "env" idea from Arnaud Morin which I don't disagree with, and still more pushback and arguments leading in favor of 3, here from Jay Faulkner:
Using environment variables to set backends means that operators can change what backend is used. This is an easy thing for them to screw up and will end up creating a large amount of operator pain and bug churn. I am -1 in the strongest terms to the use of operator-adjustable values to enable/disable backends. The above was surprising considering that this was supposed to be a decision that had already been made at PTG, yet there still seems to be disagreement as though the decision were not actually made in any final way.
I don't mean to re-litigate things that may have been decided; if this discussion happened and has been settled that's fine (I'm not a core on oslo anyway) -- but when reviewing the code, I found no reason an operator would want to change this value -- and with my Ironic hat on -- I explicitly do not want operators choosing implementation details of my software -- such as if it's using an eventlet or threading-based backend for service management. If option #1 or #2 were to proceed, I'd propose a change to Ironic which would cause us to refuse to startup if a different backend than requested was enabled. The last thing we need in OpenStack is more items added to our testing matrix; which intentionally or not is what making the backend operator-controllable does.
+1
- Jay Faulkner
So as I am tasked along with Daniel Bengtsson and Herve Beraud with getting [1] merged and moving onto building out the threaded backend, I would like to ask the group here to give me some background on the concerns raised at PTG and if we can all just here revisit the whole issue and hopefully decide that at least to start, let's get this merged without any CONF/env variable process (option 3); an application that's gone through the effort to transition to threads with the help of this backend selector should be assumed to be moving forward with that implementation, and if it has problems, that's just an ordinary bug like any other.
thanks for reading!
[1] https://review.opendev.org/c/openstack/oslo.service/+/935783 [2] https://review.opendev.org/c/openstack/oslo-specs/+/927503 [3] https://gist.github.com/GrahamDumpleton/b380652b768e81a7f60c
-- Slawek Kaplonski Principal Software Engineer Red Hat
+1 for option 3 Le mer. 22 janv. 2025 à 09:52, Sławek Kapłoński <skaplons@redhat.com> a écrit :
Hi,
Dnia wtorek, 21 stycznia 2025 22:59:10 czas środkowoeuropejski standardowy Jay Faulkner pisze:
On 1/21/25 1:05 PM, Mike Bayer wrote:
Hi -
We've come to an impasse regarding our efforts to create a new backend
All of that falls under things that have been agreed upon. The
current impasse is among three options of how this "backend" system in oslo.service may be informed of which backend should be used:
1. being selected at runtime via CONF from the .cfg file , which is
what was originally agreed upon
2. at start up time via environment variable
3. as a fixed setting in source code.
Personally, as someone who has worked for decades on multiple
switchable backend-oriented systems, I'm fully in the camp of choice 3, which is that switching the backend is just the beginning of a whole series of changes and tuning that a codebase would need in order to accommodate
selection system for oslo.service which will allow applications based on eventlet to have an incremental migration path where they can select for threads instead of eventlet [1] [2]. Briefly, the change organizes oslo.service into two different "backends", known as "eventlet" and "threaded". A calling application may choose to use either backend exclusively. Applications that use oslo.service right now are essentially using the "eventlet" backend. As these applications seek to migrate away from eventlet, they'd select the "threaded" backend and tune their applications to work under this new mode, where eventually the "eventlet" backend would be removed from oslo.service and ultimately from all of openstack entirely. this backend change, and having the ability to "switch the backend back to eventlet" in CONF or runtime is simply not something anyone would want to do, unless a considerable amount of additional effort were made to support the codebase being reliable under both idioms. Which I would argue is a waste of effort since eventlet is going away.
I am also for option 3 - I don't think there is any reason to give choice about that for the operators. This IMO should be internal decision of the project's devs team what backend is used by their project and how project is working.
More concretely, if you took nova, which were using oslo.service with
eventlet and decided to try the "threaded" backend, that's naturally going to have a lot of implications for other parts of the code. For example, if your code is setting eventlet monkey patching, that has to be turned off when this backend selection is made. If your code has other explicit use of eventlet, this also needs to be changed. Finally, threaded concurrency is naturally going to "feel" different than coroutine-based concurrency - context switches happen at different places and with different frequencies. Typically in a large codebase, the places that are prone to race conditions can shift; areas that had no races under coroutines now have threaded concurrency problems, areas that were sensitive to coroutine issues like being CPU bound are suddenly smooth with threading. The overall number of "working routines" can change, whereas eventlet can spin up 1000 coroutines like popcorn, under threads we now have to work with pools of threads typically a few dozen or so threads large. Overall, if I had my whole application passing under tempest under "threading", I would have had to make changes to get there, which may very well be mutually exclusive vs. test resiliency under eventlet. More generally, getting my application to run under threads represents a change in implementation that I expect to be permanent. If there are continued issues running under threaded mode, those are bugs that have to be fixed in any case.
In the discussion at [1], Takashi Kajinami writes that a PTG
discussion (for which I dont have any link or background) explicitly decided against the above idea:
In the initial design we considered that developer may make the
However there was some strong push back during the previous PTG and
decision to migrate from eventlet to the new mechanism, and didn't intend to provide any options to switch back to the original eventlet model. people were concerned with having no mechanism to switch back to "previously worked well" eventlet model and to address the concern we agreed to introduce the option for operators to switch the backend.
I don't really agree with that "previously worked well" thing. Even if the team will make an effort to maintain two possibilities, there may be changes in the code which may unintentionally break this "previously worked well" eventlet model. There is no guarantee it will be always exactly the same as it was previously.
If we don't allow users to configure the option then IMO we should not expose the option and we should use an internal flag instead (though that's basically what was disagreed with during the PTG, AFAIR) So first off, we have already realized that going with option 1. , use CONF, is not tenable, because the CONF collection is not typically in its final populated state when imports occur, and since the "backends" are an import time selection, there's a chicken/egg problem that cannot be resolved unless applications either highly modify the way in which their applications consume CONF vs. do their basic imports, or the way we are writing our backends in [1] needs to be modified so that imports can proceed using proxy objects which then change their loaded implementation when CONF is set up. Both of these options seem deeply complex and overarchitected in order to support the feature of "I want to change the backend in CONF" which I maintain is not going to be a real world use case.
As a compromise, I suggested option 2. using an environment variable. This allows the backend selection to remain something that can be theoretically changed from the outside without modifying source code. However, current deployment practices suggest this is not really useful, as again per Takashi the way in which we deploy mod_wsgi does not allow environment variables to be local per virtual host; per [3] it looks like you need a hardcoded python script anyway that sets the env the way you want which makes it not too useful.
As we were proceeding on option 2. we received some more pushback on the "env" idea from Arnaud Morin which I don't disagree with, and still more pushback and arguments leading in favor of 3, here from Jay Faulkner:
Using environment variables to set backends means that operators can change what backend is used. This is an easy thing for them to screw up and will end up creating a large amount of operator pain and bug churn. I am -1 in the strongest terms to the use of operator-adjustable values to enable/disable backends. The above was surprising considering that this was supposed to be a decision that had already been made at PTG, yet there still seems to be disagreement as though the decision were not actually made in any final way.
I don't mean to re-litigate things that may have been decided; if this discussion happened and has been settled that's fine (I'm not a core on oslo anyway) -- but when reviewing the code, I found no reason an operator would want to change this value -- and with my Ironic hat on -- I explicitly do not want operators choosing implementation details of my software -- such as if it's using an eventlet or threading-based backend for service management. If option #1 or #2 were to proceed, I'd propose a change to Ironic which would cause us to refuse to startup if a different backend than requested was enabled. The last thing we need in OpenStack is more items added to our testing matrix; which intentionally or not is what making the backend operator-controllable does.
+1
- Jay Faulkner
So as I am tasked along with Daniel Bengtsson and Herve Beraud with
getting [1] merged and moving onto building out the threaded backend, I would like to ask the group here to give me some background on the concerns raised at PTG and if we can all just here revisit the whole issue and hopefully decide that at least to start, let's get this merged without any CONF/env variable process (option 3); an application that's gone through the effort to transition to threads with the help of this backend selector should be assumed to be moving forward with that implementation, and if it has problems, that's just an ordinary bug like any other.
thanks for reading!
[1] https://review.opendev.org/c/openstack/oslo.service/+/935783 [2] https://review.opendev.org/c/openstack/oslo-specs/+/927503 [3] https://gist.github.com/GrahamDumpleton/b380652b768e81a7f60c
-- Slawek Kaplonski Principal Software Engineer Red Hat
-- Hervé Beraud Senior Software Engineer at Red Hat irc: hberaud https://github.com/4383/
+1 for option3 as well Developers know best when project is mature to change the default and when they are ready to commit fixing any unexpected bugs due to such a change, communicated clearly in doc/release notes and operators should adapt to that decision. On 22 Jan 2025, at 10:34, Herve Beraud <hberaud@redhat.com> wrote: You don't often get email from hberaud@redhat.com. Learn why this is important<https://aka.ms/LearnAboutSenderIdentification> +1 for option 3 Le mer. 22 janv. 2025 à 09:52, Sławek Kapłoński <skaplons@redhat.com<mailto:skaplons@redhat.com>> a écrit : Hi, Dnia wtorek, 21 stycznia 2025 22:59:10 czas środkowoeuropejski standardowy Jay Faulkner pisze:
On 1/21/25 1:05 PM, Mike Bayer wrote:
Hi -
We've come to an impasse regarding our efforts to create a new backend selection system for oslo.service which will allow applications based on eventlet to have an incremental migration path where they can select for threads instead of eventlet [1] [2]. Briefly, the change organizes oslo.service into two different "backends", known as "eventlet" and "threaded". A calling application may choose to use either backend exclusively. Applications that use oslo.service right now are essentially using the "eventlet" backend. As these applications seek to migrate away from eventlet, they'd select the "threaded" backend and tune their applications to work under this new mode, where eventually the "eventlet" backend would be removed from oslo.service and ultimately from all of openstack entirely.
All of that falls under things that have been agreed upon. The current impasse is among three options of how this "backend" system in oslo.service may be informed of which backend should be used:
1. being selected at runtime via CONF from the .cfg file , which is what was originally agreed upon
2. at start up time via environment variable
3. as a fixed setting in source code.
Personally, as someone who has worked for decades on multiple switchable backend-oriented systems, I'm fully in the camp of choice 3, which is that switching the backend is just the beginning of a whole series of changes and tuning that a codebase would need in order to accommodate this backend change, and having the ability to "switch the backend back to eventlet" in CONF or runtime is simply not something anyone would want to do, unless a considerable amount of additional effort were made to support the codebase being reliable under both idioms. Which I would argue is a waste of effort since eventlet is going away.
I am also for option 3 - I don't think there is any reason to give choice about that for the operators. This IMO should be internal decision of the project's devs team what backend is used by their project and how project is working.
More concretely, if you took nova, which were using oslo.service with eventlet and decided to try the "threaded" backend, that's naturally going to have a lot of implications for other parts of the code. For example, if your code is setting eventlet monkey patching, that has to be turned off when this backend selection is made. If your code has other explicit use of eventlet, this also needs to be changed. Finally, threaded concurrency is naturally going to "feel" different than coroutine-based concurrency - context switches happen at different places and with different frequencies. Typically in a large codebase, the places that are prone to race conditions can shift; areas that had no races under coroutines now have threaded concurrency problems, areas that were sensitive to coroutine issues like being CPU bound are suddenly smooth with threading. The overall number of "working routines" can change, whereas eventlet can spin up 1000 coroutines like popcorn, under threads we now have to work with pools of threads typically a few dozen or so threads large. Overall, if I had my whole application passing under tempest under "threading", I would have had to make changes to get there, which may very well be mutually exclusive vs. test resiliency under eventlet. More generally, getting my application to run under threads represents a change in implementation that I expect to be permanent. If there are continued issues running under threaded mode, those are bugs that have to be fixed in any case.
In the discussion at [1], Takashi Kajinami writes that a PTG discussion (for which I dont have any link or background) explicitly decided against the above idea:
In the initial design we considered that developer may make the decision to migrate from eventlet to the new mechanism, and didn't intend to provide any options to switch back to the original eventlet model. However there was some strong push back during the previous PTG and people were concerned with having no mechanism to switch back to "previously worked well" eventlet model and to address the concern we agreed to introduce the option for operators to switch the backend.
I don't really agree with that "previously worked well" thing. Even if the team will make an effort to maintain two possibilities, there may be changes in the code which may unintentionally break this "previously worked well" eventlet model. There is no guarantee it will be always exactly the same as it was previously.
If we don't allow users to configure the option then IMO we should not expose the option and we should use an internal flag instead (though that's basically what was disagreed with during the PTG, AFAIR) So first off, we have already realized that going with option 1. , use CONF, is not tenable, because the CONF collection is not typically in its final populated state when imports occur, and since the "backends" are an import time selection, there's a chicken/egg problem that cannot be resolved unless applications either highly modify the way in which their applications consume CONF vs. do their basic imports, or the way we are writing our backends in [1] needs to be modified so that imports can proceed using proxy objects which then change their loaded implementation when CONF is set up. Both of these options seem deeply complex and overarchitected in order to support the feature of "I want to change the backend in CONF" which I maintain is not going to be a real world use case.
As a compromise, I suggested option 2. using an environment variable. This allows the backend selection to remain something that can be theoretically changed from the outside without modifying source code. However, current deployment practices suggest this is not really useful, as again per Takashi the way in which we deploy mod_wsgi does not allow environment variables to be local per virtual host; per [3] it looks like you need a hardcoded python script anyway that sets the env the way you want which makes it not too useful.
As we were proceeding on option 2. we received some more pushback on the "env" idea from Arnaud Morin which I don't disagree with, and still more pushback and arguments leading in favor of 3, here from Jay Faulkner:
Using environment variables to set backends means that operators can change what backend is used. This is an easy thing for them to screw up and will end up creating a large amount of operator pain and bug churn. I am -1 in the strongest terms to the use of operator-adjustable values to enable/disable backends. The above was surprising considering that this was supposed to be a decision that had already been made at PTG, yet there still seems to be disagreement as though the decision were not actually made in any final way.
I don't mean to re-litigate things that may have been decided; if this discussion happened and has been settled that's fine (I'm not a core on oslo anyway) -- but when reviewing the code, I found no reason an operator would want to change this value -- and with my Ironic hat on -- I explicitly do not want operators choosing implementation details of my software -- such as if it's using an eventlet or threading-based backend for service management. If option #1 or #2 were to proceed, I'd propose a change to Ironic which would cause us to refuse to startup if a different backend than requested was enabled. The last thing we need in OpenStack is more items added to our testing matrix; which intentionally or not is what making the backend operator-controllable does.
+1
- Jay Faulkner
So as I am tasked along with Daniel Bengtsson and Herve Beraud with getting [1] merged and moving onto building out the threaded backend, I would like to ask the group here to give me some background on the concerns raised at PTG and if we can all just here revisit the whole issue and hopefully decide that at least to start, let's get this merged without any CONF/env variable process (option 3); an application that's gone through the effort to transition to threads with the help of this backend selector should be assumed to be moving forward with that implementation, and if it has problems, that's just an ordinary bug like any other.
thanks for reading!
[1] https://review.opendev.org/c/openstack/oslo.service/+/935783 [2] https://review.opendev.org/c/openstack/oslo-specs/+/927503 [3] https://gist.github.com/GrahamDumpleton/b380652b768e81a7f60c
-- Slawek Kaplonski Principal Software Engineer Red Hat -- Hervé Beraud Senior Software Engineer at Red Hat irc: hberaud https://github.com/4383/
To me, the CONF path is about risk mitigation. I.e. if an issue pops up with the threaded backend we would have a quick way to switch back. That said, one of my comments on the spec was that I would like to see the implementation not even import eventlet if it's running on the threaded backend as there are many gremlins in libraries that make code path decisions based on if eventlet is imported (though that is not the right way to check if it is in use). Not to mention it is package bloat. I also agree that having both backend options doubles the (not fun) testing burden with limited benefit. So, I am good with option #3. This does bring up the question of should we bother messing with oslo.service as it is, but instead just lift and shift cotyledon. I.e. option #3 comes down to the imported library. Michael On Wed, Jan 22, 2025 at 6:46 AM Tobias Urdin - Binero IT < tobias.urdin@binero.com> wrote:
+1 for option3 as well
Developers know best when project is mature to change the default and when they are ready to commit fixing any unexpected bugs due to such a change, communicated clearly in doc/release notes and operators should adapt to that decision.
On 22 Jan 2025, at 10:34, Herve Beraud <hberaud@redhat.com> wrote:
You don't often get email from hberaud@redhat.com. Learn why this is important <https://aka.ms/LearnAboutSenderIdentification> +1 for option 3
Le mer. 22 janv. 2025 à 09:52, Sławek Kapłoński <skaplons@redhat.com> a écrit :
Hi,
Dnia wtorek, 21 stycznia 2025 22:59:10 czas środkowoeuropejski standardowy Jay Faulkner pisze:
On 1/21/25 1:05 PM, Mike Bayer wrote:
Hi -
We've come to an impasse regarding our efforts to create a new
All of that falls under things that have been agreed upon. The
current impasse is among three options of how this "backend" system in oslo.service may be informed of which backend should be used:
1. being selected at runtime via CONF from the .cfg file , which is
what was originally agreed upon
2. at start up time via environment variable
3. as a fixed setting in source code.
Personally, as someone who has worked for decades on multiple
switchable backend-oriented systems, I'm fully in the camp of choice 3, which is that switching the backend is just the beginning of a whole series of changes and tuning that a codebase would need in order to accommodate
backend selection system for oslo.service which will allow applications based on eventlet to have an incremental migration path where they can select for threads instead of eventlet [1] [2]. Briefly, the change organizes oslo.service into two different "backends", known as "eventlet" and "threaded". A calling application may choose to use either backend exclusively. Applications that use oslo.service right now are essentially using the "eventlet" backend. As these applications seek to migrate away from eventlet, they'd select the "threaded" backend and tune their applications to work under this new mode, where eventually the "eventlet" backend would be removed from oslo.service and ultimately from all of openstack entirely. this backend change, and having the ability to "switch the backend back to eventlet" in CONF or runtime is simply not something anyone would want to do, unless a considerable amount of additional effort were made to support the codebase being reliable under both idioms. Which I would argue is a waste of effort since eventlet is going away.
I am also for option 3 - I don't think there is any reason to give choice about that for the operators. This IMO should be internal decision of the project's devs team what backend is used by their project and how project is working.
More concretely, if you took nova, which were using oslo.service with
eventlet and decided to try the "threaded" backend, that's naturally going to have a lot of implications for other parts of the code. For example, if your code is setting eventlet monkey patching, that has to be turned off when this backend selection is made. If your code has other explicit use of eventlet, this also needs to be changed. Finally, threaded concurrency is naturally going to "feel" different than coroutine-based concurrency - context switches happen at different places and with different frequencies. Typically in a large codebase, the places that are prone to race conditions can shift; areas that had no races under coroutines now have threaded concurrency problems, areas that were sensitive to coroutine issues like being CPU bound are suddenly smooth with threading. The overall number of "working routines" can change, whereas eventlet can spin up 1000 coroutines like popcorn, under threads we now have to work with pools of threads typically a few dozen or so threads large. Overall, if I had my whole application passing under tempest under "threading", I would have had to make changes to get there, which may very well be mutually exclusive vs. test resiliency under eventlet. More generally, getting my application to run under threads represents a change in implementation that I expect to be permanent. If there are continued issues running under threaded mode, those are bugs that have to be fixed in any case.
In the discussion at [1], Takashi Kajinami writes that a PTG
discussion (for which I dont have any link or background) explicitly decided against the above idea:
In the initial design we considered that developer may make the
However there was some strong push back during the previous PTG and
decision to migrate from eventlet to the new mechanism, and didn't intend to provide any options to switch back to the original eventlet model. people were concerned with having no mechanism to switch back to "previously worked well" eventlet model and to address the concern we agreed to introduce the option for operators to switch the backend.
I don't really agree with that "previously worked well" thing. Even if the team will make an effort to maintain two possibilities, there may be changes in the code which may unintentionally break this "previously worked well" eventlet model. There is no guarantee it will be always exactly the same as it was previously.
If we don't allow users to configure the option then IMO we should not expose the option and we should use an internal flag instead (though that's basically what was disagreed with during the PTG, AFAIR) So first off, we have already realized that going with option 1. , use CONF, is not tenable, because the CONF collection is not typically in its final populated state when imports occur, and since the "backends" are an import time selection, there's a chicken/egg problem that cannot be resolved unless applications either highly modify the way in which their applications consume CONF vs. do their basic imports, or the way we are writing our backends in [1] needs to be modified so that imports can proceed using proxy objects which then change their loaded implementation when CONF is set up. Both of these options seem deeply complex and overarchitected in order to support the feature of "I want to change the backend in CONF" which I maintain is not going to be a real world use case.
As a compromise, I suggested option 2. using an environment variable. This allows the backend selection to remain something that can be theoretically changed from the outside without modifying source code. However, current deployment practices suggest this is not really useful, as again per Takashi the way in which we deploy mod_wsgi does not allow environment variables to be local per virtual host; per [3] it looks like you need a hardcoded python script anyway that sets the env the way you want which makes it not too useful.
As we were proceeding on option 2. we received some more pushback on the "env" idea from Arnaud Morin which I don't disagree with, and still more pushback and arguments leading in favor of 3, here from Jay Faulkner:
Using environment variables to set backends means that operators can change what backend is used. This is an easy thing for them to screw up and will end up creating a large amount of operator pain and bug churn. I am -1 in the strongest terms to the use of operator-adjustable values to enable/disable backends. The above was surprising considering that this was supposed to be a decision that had already been made at PTG, yet there still seems to be disagreement as though the decision were not actually made in any final way.
I don't mean to re-litigate things that may have been decided; if this discussion happened and has been settled that's fine (I'm not a core on oslo anyway) -- but when reviewing the code, I found no reason an operator would want to change this value -- and with my Ironic hat on -- I explicitly do not want operators choosing implementation details of my software -- such as if it's using an eventlet or threading-based backend for service management. If option #1 or #2 were to proceed, I'd propose a change to Ironic which would cause us to refuse to startup if a different backend than requested was enabled. The last thing we need in OpenStack is more items added to our testing matrix; which intentionally or not is what making the backend operator-controllable does.
+1
- Jay Faulkner
So as I am tasked along with Daniel Bengtsson and Herve Beraud with
getting [1] merged and moving onto building out the threaded backend, I would like to ask the group here to give me some background on the concerns raised at PTG and if we can all just here revisit the whole issue and hopefully decide that at least to start, let's get this merged without any CONF/env variable process (option 3); an application that's gone through the effort to transition to threads with the help of this backend selector should be assumed to be moving forward with that implementation, and if it has problems, that's just an ordinary bug like any other.
thanks for reading!
[1] https://review.opendev.org/c/openstack/oslo.service/+/935783 [2] https://review.opendev.org/c/openstack/oslo-specs/+/927503 [3] https://gist.github.com/GrahamDumpleton/b380652b768e81a7f60c
-- Slawek Kaplonski Principal Software Engineer Red Hat
-- Hervé Beraud Senior Software Engineer at Red Hat irc: hberaud https://github.com/4383/
On 22/01/2025 16:34, Michael Johnson wrote:
To me, the CONF path is about risk mitigation. I.e. if an issue pops up with the threaded backend we would have a quick way to switch back.
That said, one of my comments on the spec was that I would like to see the implementation not even import eventlet if it's running on the threaded backend as there are many gremlins in libraries that make code path decisions based on if eventlet is imported (though that is not the right way to check if it is in use). Not to mention it is package bloat.
I also agree that having both backend options doubles the (not fun) testing burden with limited benefit.
So, I am good with option #3.
This does bring up the question of should we bother messing with oslo.service as it is, but instead just lift and shift cotyledon. I.e. option #3 comes down to the imported library.
if we do that it makes it much hard for project to migrate. so yes we shoudl do the oslo service changes as we currently plan to.
Michael
On Wed, Jan 22, 2025 at 6:46 AM Tobias Urdin - Binero IT <tobias.urdin@binero.com> wrote:
+1 for option3 as well
Developers know best when project is mature to change the default and when they are ready to commit fixing any unexpected bugs due to such a change, communicated clearly in doc/release notes and operators should adapt to that decision.
On 22 Jan 2025, at 10:34, Herve Beraud <hberaud@redhat.com> wrote:
You don't often get email from hberaud@redhat.com. Learn why this is important <https://aka.ms/LearnAboutSenderIdentification>
+1 for option 3
Le mer. 22 janv. 2025 à 09:52, Sławek Kapłoński <skaplons@redhat.com> a écrit :
Hi,
Dnia wtorek, 21 stycznia 2025 22:59:10 czas środkowoeuropejski standardowy Jay Faulkner pisze: > > On 1/21/25 1:05 PM, Mike Bayer wrote: > > Hi - > > > > We've come to an impasse regarding our efforts to create a new backend selection system for oslo.service which will allow applications based on eventlet to have an incremental migration path where they can select for threads instead of eventlet [1] [2]. Briefly, the change organizes oslo.service into two different "backends", known as "eventlet" and "threaded". A calling application may choose to use either backend exclusively. Applications that use oslo.service right now are essentially using the "eventlet" backend. As these applications seek to migrate away from eventlet, they'd select the "threaded" backend and tune their applications to work under this new mode, where eventually the "eventlet" backend would be removed from oslo.service and ultimately from all of openstack entirely. > > > > All of that falls under things that have been agreed upon. The current impasse is among three options of how this "backend" system in oslo.service may be informed of which backend should be used: > > > > 1. being selected at runtime via CONF from the .cfg file , which is what was originally agreed upon > > > > 2. at start up time via environment variable > > > > 3. as a fixed setting in source code. > > > > Personally, as someone who has worked for decades on multiple switchable backend-oriented systems, I'm fully in the camp of choice 3, which is that switching the backend is just the beginning of a whole series of changes and tuning that a codebase would need in order to accommodate this backend change, and having the ability to "switch the backend back to eventlet" in CONF or runtime is simply not something anyone would want to do, unless a considerable amount of additional effort were made to support the codebase being reliable under both idioms. Which I would argue is a waste of effort since eventlet is going away.
I am also for option 3 - I don't think there is any reason to give choice about that for the operators. This IMO should be internal decision of the project's devs team what backend is used by their project and how project is working.
> > > > More concretely, if you took nova, which were using oslo.service with eventlet and decided to try the "threaded" backend, that's naturally going to have a lot of implications for other parts of the code. For example, if your code is setting eventlet monkey patching, that has to be turned off when this backend selection is made. If your code has other explicit use of eventlet, this also needs to be changed. Finally, threaded concurrency is naturally going to "feel" different than coroutine-based concurrency - context switches happen at different places and with different frequencies. Typically in a large codebase, the places that are prone to race conditions can shift; areas that had no races under coroutines now have threaded concurrency problems, areas that were sensitive to coroutine issues like being CPU bound are suddenly smooth with threading. The overall number of "working routines" can change, whereas eventlet can spin up 1000 coroutines like popcorn, under threads we now have to work with pools of threads typically a few dozen or so threads large. Overall, if I had my whole application passing under tempest under "threading", I would have had to make changes to get there, which may very well be mutually exclusive vs. test resiliency under eventlet. More generally, getting my application to run under threads represents a change in implementation that I expect to be permanent. If there are continued issues running under threaded mode, those are bugs that have to be fixed in any case. > > > > In the discussion at [1], Takashi Kajinami writes that a PTG discussion (for which I dont have any link or background) explicitly decided against the above idea: > > > >> In the initial design we considered that developer may make the decision to migrate from eventlet to the new mechanism, and didn't intend to provide any options to switch back to the original eventlet model. > >> However there was some strong push back during the previous PTG and people were concerned with having no mechanism to switch back to "previously worked well" eventlet model and to address the concern we agreed to introduce the option for operators to switch the backend.
I don't really agree with that "previously worked well" thing. Even if the team will make an effort to maintain two possibilities, there may be changes in the code which may unintentionally break this "previously worked well" eventlet model. There is no guarantee it will be always exactly the same as it was previously.
> >> If we don't allow users to configure the option then IMO we should not expose the option and we should use an internal flag instead (though that's basically what was disagreed with during the PTG, AFAIR) > > So first off, we have already realized that going with option 1. , use CONF, is not tenable, because the CONF collection is not typically in its final populated state when imports occur, and since the "backends" are an import time selection, there's a chicken/egg problem that cannot be resolved unless applications either highly modify the way in which their applications consume CONF vs. do their basic imports, or the way we are writing our backends in [1] needs to be modified so that imports can proceed using proxy objects which then change their loaded implementation when CONF is set up. Both of these options seem deeply complex and overarchitected in order to support the feature of "I want to change the backend in CONF" which I maintain is not going to be a real world use case. > > > > As a compromise, I suggested option 2. using an environment variable. This allows the backend selection to remain something that can be theoretically changed from the outside without modifying source code. However, current deployment practices suggest this is not really useful, as again per Takashi the way in which we deploy mod_wsgi does not allow environment variables to be local per virtual host; per [3] it looks like you need a hardcoded python script anyway that sets the env the way you want which makes it not too useful. > > > > As we were proceeding on option 2. we received some more pushback on the "env" idea from Arnaud Morin which I don't disagree with, and still more pushback and arguments leading in favor of 3, here from Jay Faulkner: > > > >> Using environment variables to set backends means that operators can change what backend is used. This is an easy thing for them to screw up and will end up creating a large amount of operator pain and bug churn. I am -1 in the strongest terms to the use of operator-adjustable values to enable/disable backends. > > The above was surprising considering that this was supposed to be a decision that had already been made at PTG, yet there still seems to be disagreement as though the decision were not actually made in any final way. > > I don't mean to re-litigate things that may have been decided; if this > discussion happened and has been settled that's fine (I'm not a core on > oslo anyway) -- but when reviewing the code, I found no reason an > operator would want to change this value -- and with my Ironic hat on -- > I explicitly do not want operators choosing implementation details of my > software -- such as if it's using an eventlet or threading-based backend > for service management. If option #1 or #2 were to proceed, I'd propose > a change to Ironic which would cause us to refuse to startup if a > different backend than requested was enabled. The last thing we need in > OpenStack is more items added to our testing matrix; which intentionally > or not is what making the backend operator-controllable does.
+1
> > > - > Jay Faulkner > > > > > So as I am tasked along with Daniel Bengtsson and Herve Beraud with getting [1] merged and moving onto building out the threaded backend, I would like to ask the group here to give me some background on the concerns raised at PTG and if we can all just here revisit the whole issue and hopefully decide that at least to start, let's get this merged without any CONF/env variable process (option 3); an application that's gone through the effort to transition to threads with the help of this backend selector should be assumed to be moving forward with that implementation, and if it has problems, that's just an ordinary bug like any other. > > > > thanks for reading! > > > > > > > > [1] https://review.opendev.org/c/openstack/oslo.service/+/935783 > > [2] https://review.opendev.org/c/openstack/oslo-specs/+/927503 > > [3] https://gist.github.com/GrahamDumpleton/b380652b768e81a7f60c >
-- Slawek Kaplonski Principal Software Engineer Red Hat
-- Hervé Beraud Senior Software Engineer at Red Hat irc: hberaud https://github.com/4383/
On 1/22/25 6:05 AM, Mike Bayer wrote:
Hi -
We've come to an impasse regarding our efforts to create a new backend selection system for oslo.service which will allow applications based on eventlet to have an incremental migration path where they can select for threads instead of eventlet [1] [2]. Briefly, the change organizes oslo.service into two different "backends", known as "eventlet" and "threaded". A calling application may choose to use either backend exclusively. Applications that use oslo.service right now are essentially using the "eventlet" backend. As these applications seek to migrate away from eventlet, they'd select the "threaded" backend and tune their applications to work under this new mode, where eventually the "eventlet" backend would be removed from oslo.service and ultimately from all of openstack entirely.
All of that falls under things that have been agreed upon. The current impasse is among three options of how this "backend" system in oslo.service may be informed of which backend should be used:
1. being selected at runtime via CONF from the .cfg file , which is what was originally agreed upon
2. at start up time via environment variable
3. as a fixed setting in source code.
Personally, as someone who has worked for decades on multiple switchable backend-oriented systems, I'm fully in the camp of choice 3, which is that switching the backend is just the beginning of a whole series of changes and tuning that a codebase would need in order to accommodate this backend change, and having the ability to "switch the backend back to eventlet" in CONF or runtime is simply not something anyone would want to do, unless a considerable amount of additional effort were made to support the codebase being reliable under both idioms. Which I would argue is a waste of effort since eventlet is going away.
More concretely, if you took nova, which were using oslo.service with eventlet and decided to try the "threaded" backend, that's naturally going to have a lot of implications for other parts of the code. For example, if your code is setting eventlet monkey patching, that has to be turned off when this backend selection is made. If your code has other explicit use of eventlet, this also needs to be changed. Finally, threaded concurrency is naturally going to "feel" different than coroutine-based concurrency - context switches happen at different places and with different frequencies. Typically in a large codebase, the places that are prone to race conditions can shift; areas that had no races under coroutines now have threaded concurrency problems, areas that were sensitive to coroutine issues like being CPU bound are suddenly smooth with threading. The overall number of "working routines" can change, whereas eventlet can spin up 1000 coroutines like popcorn, under threads we now have to work with pools of threads typically a few dozen or so threads large. Overall, if I had my whole application passing under tempest under "threading", I would have had to make changes to get there, which may very well be mutually exclusive vs. test resiliency under eventlet. More generally, getting my application to run under threads represents a change in implementation that I expect to be permanent. If there are continued issues running under threaded mode, those are bugs that have to be fixed in any case.
In the discussion at [1], Takashi Kajinami writes that a PTG discussion (for which I dont have any link or background) explicitly decided against the above idea:
We discussed this during the cross project session on Wednesday during the discussion about its upgrade impact and I remember John Garbutt raised his concern mainly about performance impact of migration from eventlet and the possible switching back to eventlet was proposed to mitigate the performance impact until all performance tests/improvements are done. Unfortunately the details of the discussion is not captured in the etherpad but we can see the related discussion which happened later in nova PTG[1]. [1] https://etherpad.opendev.org/p/nova-2025.1-ptg#L625 However, at that time we've not yet clearly evaluated the complexity of keeping backends selectable and as mentioned in the above etherpad, we agreed that we may re-evaluate it once we complete investigation. I think we now clearly understand it is too much complicated to support both eventlet and threading (some work has been done already in neutron and re-implementing the possibility to switch back to eventlet there might not be feasible, from my view) and am in favor of not providing options, which we planned initially.
In the initial design we considered that developer may make the decision to migrate from eventlet to the new mechanism, and didn't intend to provide any options to switch back to the original eventlet model.
However there was some strong push back during the previous PTG and people were concerned with having no mechanism to switch back to "previously worked well" eventlet model and to address the concern we agreed to introduce the option for operators to switch the backend.
If we don't allow users to configure the option then IMO we should not expose the option and we should use an internal flag instead (though that's basically what was disagreed with during the PTG, AFAIR)
So first off, we have already realized that going with option 1. , use CONF, is not tenable, because the CONF collection is not typically in its final populated state when imports occur, and since the "backends" are an import time selection, there's a chicken/egg problem that cannot be resolved unless applications either highly modify the way in which their applications consume CONF vs. do their basic imports, or the way we are writing our backends in [1] needs to be modified so that imports can proceed using proxy objects which then change their loaded implementation when CONF is set up. Both of these options seem deeply complex and overarchitected in order to support the feature of "I want to change the backend in CONF" which I maintain is not going to be a real world use case.
As a compromise, I suggested option 2. using an environment variable. This allows the backend selection to remain something that can be theoretically changed from the outside without modifying source code. However, current deployment practices suggest this is not really useful, as again per Takashi the way in which we deploy mod_wsgi does not allow environment variables to be local per virtual host; per [3] it looks like you need a hardcoded python script anyway that sets the env the way you want which makes it not too useful.
As we were proceeding on option 2. we received some more pushback on the "env" idea from Arnaud Morin which I don't disagree with, and still more pushback and arguments leading in favor of 3, here from Jay Faulkner:
Using environment variables to set backends means that operators can change what backend is used. This is an easy thing for them to screw up and will end up creating a large amount of operator pain and bug churn. I am -1 in the strongest terms to the use of operator-adjustable values to enable/disable backends.
The above was surprising considering that this was supposed to be a decision that had already been made at PTG, yet there still seems to be disagreement as though the decision were not actually made in any final way.
So as I am tasked along with Daniel Bengtsson and Herve Beraud with getting [1] merged and moving onto building out the threaded backend, I would like to ask the group here to give me some background on the concerns raised at PTG and if we can all just here revisit the whole issue and hopefully decide that at least to start, let's get this merged without any CONF/env variable process (option 3); an application that's gone through the effort to transition to threads with the help of this backend selector should be assumed to be moving forward with that implementation, and if it has problems, that's just an ordinary bug like any other.
thanks for reading!
[1] https://review.opendev.org/c/openstack/oslo.service/+/935783 [2] https://review.opendev.org/c/openstack/oslo-specs/+/927503 [3] https://gist.github.com/GrahamDumpleton/b380652b768e81a7f60c
On 22/01/2025 12:32, Takashi Kajinami wrote:
On 1/22/25 6:05 AM, Mike Bayer wrote:
Hi -
We've come to an impasse regarding our efforts to create a new backend selection system for oslo.service which will allow applications based on eventlet to have an incremental migration path where they can select for threads instead of eventlet [1] [2]. Briefly, the change organizes oslo.service into two different "backends", known as "eventlet" and "threaded". A calling application may choose to use either backend exclusively. Applications that use oslo.service right now are essentially using the "eventlet" backend. As these applications seek to migrate away from eventlet, they'd select the "threaded" backend and tune their applications to work under this new mode, where eventually the "eventlet" backend would be removed from oslo.service and ultimately from all of openstack entirely.
All of that falls under things that have been agreed upon. The current impasse is among three options of how this "backend" system in oslo.service may be informed of which backend should be used:
1. being selected at runtime via CONF from the .cfg file , which is what was originally agreed upon
2. at start up time via environment variable
3. as a fixed setting in source code.
Personally, as someone who has worked for decades on multiple switchable backend-oriented systems, I'm fully in the camp of choice 3, which is that switching the backend is just the beginning of a whole series of changes and tuning that a codebase would need in order to accommodate this backend change, and having the ability to "switch the backend back to eventlet" in CONF or runtime is simply not something anyone would want to do, unless a considerable amount of additional effort were made to support the codebase being reliable under both idioms. Which I would argue is a waste of effort since eventlet is going away.
More concretely, if you took nova, which were using oslo.service with eventlet and decided to try the "threaded" backend, that's naturally going to have a lot of implications for other parts of the code. For example, if your code is setting eventlet monkey patching, that has to be turned off when this backend selection is made. If your code has other explicit use of eventlet, this also needs to be changed. Finally, threaded concurrency is naturally going to "feel" different than coroutine-based concurrency - context switches happen at different places and with different frequencies. Typically in a large codebase, the places that are prone to race conditions can shift; areas that had no races under coroutines now have threaded concurrency problems, areas that were sensitive to coroutine issues like being CPU bound are suddenly smooth with threading. The overall number of "working routines" can change, whereas eventlet can spin up 1000 coroutines like popcorn, under threads we now have to work with pools of threads typically a few dozen or so threads large. Overall, if I had my whole application passing under tempest under "threading", I would have had to make changes to get there, which may very well be mutually exclusive vs. test resiliency under eventlet. More generally, getting my application to run under threads represents a change in implementation that I expect to be permanent. If there are continued issues running under threaded mode, those are bugs that have to be fixed in any case.
ya so the first step i took was trying to remove explicit import of eventlet form the code outside of a a few functions in a single module. if we had to spinkel if to see if we were monkey patches or not though out the code base that would be a non starter from a maintainability poit of view. This will force use to refactor some code patterns and interface so use future or other changes to that the code can function with or without eventlet. an example of this is we cant use eventlets timeout context manager anymore and need to either pass the timeout down to the lib we are callign if it suprpot it or refactor the code to function diffently. i.e. dispatch the call to an executor and use the timeout on the returned future to limit the time we wait. those types of changes are complex to maintain overtime and while we cant avoid those refactoings entirely, there will be less of them if we only support one model per executable/binary in any given release.
In the discussion at [1], Takashi Kajinami writes that a PTG discussion (for which I dont have any link or background) explicitly decided against the above idea:
We discussed this during the cross project session on Wednesday during the discussion about its upgrade impact and I remember John Garbutt raised his concern mainly about performance impact of migration from eventlet and the possible switching back to eventlet was proposed to mitigate the performance impact until all performance tests/improvements are done. Unfortunately the details of the discussion is not captured in the etherpad but we can see the related discussion which happened later in nova PTG[1].
[1] https://etherpad.opendev.org/p/nova-2025.1-ptg#L625
However, at that time we've not yet clearly evaluated the complexity of keeping backends selectable and as mentioned in the above etherpad, we agreed that we may re-evaluate it once we complete investigation.
I think we now clearly understand it is too much complicated to support both eventlet and threading (some work has been done already in neutron and re-implementing the possibility to switch back to eventlet there might not be feasible, from my view) and am in favor of not providing options, which we planned initially.
so for nova we where considering using a nova specific environment valiabel for internal testing reasons but our intent was to migrate one binary at a time and only have one implementation of each binary in any given release. the initial work i started on too this approach and did not allow any configurablity. we didn't intend to make this a runtime configuration that operator would use to switch between backends really but we were evaulating if it wold be useful for use for ci. im not actively working on this this cycle, i dont think anyone has had tiem to do it either so we could discuss this again in the next ptg but this is likely not something that need to be implemented at the oslo level and would be a project by project discussions.
In the initial design we considered that developer may make the decision to migrate from eventlet to the new mechanism, and didn't intend to provide any options to switch back to the original eventlet model.
However there was some strong push back during the previous PTG and people were concerned with having no mechanism to switch back to "previously worked well" eventlet model and to address the concern we agreed to introduce the option for operators to switch the backend.
If we don't allow users to configure the option then IMO we should not expose the option and we should use an internal flag instead (though that's basically what was disagreed with during the PTG, AFAIR)
So first off, we have already realized that going with option 1. , use CONF, is not tenable, because the CONF collection is not typically in its final populated state when imports occur, and since the "backends" are an import time selection, there's a chicken/egg problem that cannot be resolved unless applications either highly modify the way in which their applications consume CONF vs. do their basic imports, or the way we are writing our backends in [1] needs to be modified so that imports can proceed using proxy objects which then change their loaded implementation when CONF is set up. Both of these options seem deeply complex and overarchitected in order to support the feature of "I want to change the backend in CONF" which I maintain is not going to be a real world use case.
As a compromise, I suggested option 2. using an environment variable. This allows the backend selection to remain something that can be theoretically changed from the outside without modifying source code. However, current deployment practices suggest this is not really useful, as again per Takashi the way in which we deploy mod_wsgi does not allow environment variables to be local per virtual host; per [3] it looks like you need a hardcoded python script anyway that sets the env the way you want which makes it not too useful.
As we were proceeding on option 2. we received some more pushback on the "env" idea from Arnaud Morin which I don't disagree with, and still more pushback and arguments leading in favor of 3, here from Jay Faulkner:
Using environment variables to set backends means that operators can change what backend is used. This is an easy thing for them to screw up and will end up creating a large amount of operator pain and bug churn. I am -1 in the strongest terms to the use of operator-adjustable values to enable/disable backends.
The above was surprising considering that this was supposed to be a decision that had already been made at PTG, yet there still seems to be disagreement as though the decision were not actually made in any final way.
So as I am tasked along with Daniel Bengtsson and Herve Beraud with getting [1] merged and moving onto building out the threaded backend, I would like to ask the group here to give me some background on the concerns raised at PTG and if we can all just here revisit the whole issue and hopefully decide that at least to start, let's get this merged without any CONF/env variable process (option 3); an application that's gone through the effort to transition to threads with the help of this backend selector should be assumed to be moving forward with that implementation, and if it has problems, that's just an ordinary bug like any other.
there are two competing concerns. i as the person in nova that was workign on it was concerned by the complexity makign this configurable would introduce and the impact that would have on completing the work. others rightly raised concerns that orur ci does not necessarily give a good picture of the scalability/perfomce of any non eventlet based solution and we might need to support installing with or without eventlet for one slurp release, specifically 2026.1 before fully removing the eventlet support in 2026.2 or 2027.1 once all services are updated. i don't think nova will be at a point where it can run without eventlet by 2026.1, it might but my hope was that each release less and less of nova would use eventlet on a per binary (nova-comptue, nova-conductor, ...) basis.
thanks for reading!
[1] https://review.opendev.org/c/openstack/oslo.service/+/935783 [2] https://review.opendev.org/c/openstack/oslo-specs/+/927503 [3] https://gist.github.com/GrahamDumpleton/b380652b768e81a7f60c
participants (8)
-
Herve Beraud
-
Jay Faulkner
-
Michael Johnson
-
Mike Bayer
-
Sean Mooney
-
Sławek Kapłoński
-
Takashi Kajinami
-
Tobias Urdin - Binero IT