[all][oslo] revisiting the proposal to deprecate eventlet from oslo.service

22 Jan 2025

      Hi -

We've come to an impasse regarding our efforts to create a new backend selection system for oslo.service which will allow applications based on eventlet to have an incremental migration path where they can select for threads instead of eventlet [1] [2].  Briefly, the change organizes oslo.service into two different "backends", known as "eventlet" and "threaded".  A calling application may choose to use either backend exclusively.   Applications that use oslo.service right now are essentially using the "eventlet" backend.  As these applications seek to migrate away from eventlet, they'd select the "threaded" backend and tune their applications to work under this new mode, where eventually the "eventlet" backend would be removed from oslo.service and ultimately from all of openstack entirely.

All of that falls under things that have been agreed upon.  The current impasse is among three options of how this "backend" system in oslo.service may be informed of which backend should be used:  

1. being selected at runtime via CONF from the .cfg file , which is what was originally agreed upon

2. at start up time via environment variable

3. as a fixed setting in source code.

Personally, as someone who has worked for decades on multiple switchable backend-oriented systems, I'm fully in the camp of choice 3, which is that switching the backend is just the beginning of a whole series of changes and tuning that a codebase would need in order to accommodate this backend change, and having the ability to "switch the backend back to eventlet" in CONF or runtime is simply not something anyone would want to do, unless a considerable amount of additional effort were made to support the codebase being reliable under both idioms.  Which I would argue is a waste of effort since eventlet is going away.

More concretely, if you took nova, which were using oslo.service with eventlet and decided to try the "threaded" backend, that's naturally going to have a lot of implications for other parts of the code.   For example, if your code is setting eventlet monkey patching, that has to be turned off when this backend selection is made.   If your code has other explicit use of eventlet, this also needs to be changed.  Finally, threaded concurrency is naturally going to "feel" different than coroutine-based concurrency - context switches happen at different places and with different frequencies.  Typically in a large codebase, the places that are prone to race conditions can shift; areas that had no races under coroutines now have threaded concurrency problems, areas that were sensitive to coroutine issues like being CPU bound are suddenly smooth with threading.   The overall number of "working routines" can change, whereas eventlet can spin up 1000 coroutines like popcorn, under threads we now have to work with pools of threads typically a few dozen or so threads large.   Overall, if I had my whole application passing under tempest under "threading", I would have had to make changes to get there, which may very well be mutually exclusive vs. test resiliency under eventlet.   More generally, getting my application to run under threads represents a change in implementation that I expect to be permanent.   If there are continued issues running under threaded mode, those are bugs that have to be fixed in any case.

In the discussion at [1], Takashi Kajinami writes that a PTG discussion (for which I dont have any link or background) explicitly decided against the above idea:
...
In the initial design we considered that developer may make the decision to migrate from eventlet to the new mechanism, and didn't intend to provide any options to switch back to the original eventlet model.
...
However there was some strong push back during the previous PTG and people were concerned with having no mechanism to switch back to "previously worked well" eventlet model and to address the concern we agreed to introduce the option for operators to switch the backend.
...
If we don't allow users to configure the option then IMO we should not expose the option and we should use an internal flag instead (though that's basically what was disagreed with during the PTG, AFAIR)
So first off, we have already realized that going with option 1. , use CONF, is not tenable, because the CONF collection is not typically in its final populated state when imports occur, and since the "backends" are an import time selection, there's a chicken/egg problem that cannot be resolved unless applications either highly modify the way in which their applications consume CONF vs. do their basic imports, or the way we are writing our backends in [1] needs to be modified so that imports can proceed using proxy objects which then change their loaded implementation when CONF is set up.   Both of these options seem deeply complex and overarchitected in order to support the feature of "I want to change the backend in CONF" which I maintain is not going to be a real world use case.

As a compromise, I suggested option 2. using an environment variable.   This allows the backend selection to remain something that can be theoretically changed from the outside without modifying source code.  However, current deployment practices suggest this is not really useful, as again per Takashi the way in which we deploy mod_wsgi does not allow environment variables to be local per virtual host; per [3] it looks like you need a hardcoded python script anyway that sets the env the way you want which makes it not too useful.

As we were proceeding on option 2. we received some more pushback on the "env" idea from Arnaud Morin which I don't disagree with, and still more pushback and arguments leading in favor of 3, here from Jay Faulkner:
...
Using environment variables to set backends means that operators can change what backend is used. This is an easy thing for them to screw up and will end up creating a large amount of operator pain and bug churn. I am -1 in the strongest terms to the use of operator-adjustable values to enable/disable backends.
The above was surprising considering that this was supposed to be a decision that had already been made at PTG, yet there still seems to be disagreement as though the decision were not actually made in any final way.  

So as I am tasked along with Daniel Bengtsson and Herve Beraud with getting [1] merged and moving onto building out the threaded backend, I would like to ask the group here to give me some background on the concerns raised at PTG and if we can all just here revisit the whole issue and hopefully decide that at least to start, let's get this merged without any CONF/env variable process (option 3); an application that's gone through the effort to transition to threads with the help of this backend selector should be assumed to be moving forward with that implementation, and if it has problems, that's just an ordinary bug like any other.

thanks for reading!

[1] https://review.opendev.org/c/openstack/oslo.service/+/935783
[2] https://review.opendev.org/c/openstack/oslo-specs/+/927503
[3] https://gist.github.com/GrahamDumpleton/b380652b768e81a7f60c

Mike Bayer

Jay Faulkner

Sławek Kapłoński

Herve Beraud

Tobias Urdin - Binero IT

Michael Johnson

Sean Mooney

Takashi Kajinami

Sean Mooney

Daniel Bengtsson

Jay Faulkner

Mohammed Naser

tags

participants (10)