Hello TC,

Could we please have some feedback from the TC on the situation described below, especially regarding the masakari issue with oslo.db:
- https://lists.openstack.org/pipermail/openstack-discuss/2023-September/03518...
- https://review.opendev.org/q/project:openstack/masakari+topic:sqlalchemy-20

It's too late to abandon masakari for the Bobcat series. However, we think that the TC has the authority to request core reviewer permissions on any OpenStack project and to approve changes, and hence to unblock this situation.

On Thu, 21 Sept 2023 at 16:44, Herve Beraud <hberaud@redhat.com> wrote:
We are currently facing a big issue, one which could have a huge impact on the whole of OpenStack and on our customers. By customers, I mean all users outside the upstream community: operators, distro maintainers, IT vendors, etc.
Our problem is that, for a given series (Bobcat in our case), there is a divergence between the versions we announce to our customers as supported and the versions actually supported in our runtime.
Let me describe the problem.
The oslo.db versions supported within Bobcat's runtime [1] don't reflect the versions actually released during Bobcat [2]. In Bobcat's upper-constraints, oslo.db 12.3.2 [1] is the supported version. This version is in fact the last version released during 2023.1/Antelope [3]. All the oslo.db versions released during Bobcat are, for now, ignored by our runtime, yet all of them are listed in our technical documentation as supported by Bobcat.
The underlying problem is that the upper-constraints updates for these oslo.db versions are all stuck: some cross-jobs fail, so the upper-constraints cannot be bumped. These cross-jobs are owned by different services (heat, manila, masakari, etc.). We update our technical documentation each time we release a new version of a deliverable, that is, before the upper-constraints are updated. This is why the listed versions diverge from the versions actually supported at runtime.
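To make this divergence concrete, here is a minimal Python sketch of a check that compares the version pinned in upper-constraints with the latest released version of each deliverable. It assumes the `name===version` line format of `upper-constraints.txt` (optionally followed by an environment marker) and a mapping of released versions per deliverable, in chronological order, that would have to be extracted from the release deliverable files. The function names `parse_upper_constraints` and `find_divergences` are hypothetical and not part of any existing OpenStack tool.

```python
from __future__ import annotations


def parse_upper_constraints(text: str) -> dict[str, str]:
    """Map each package name (lowercased) to its pinned version.

    Assumes ``name===version`` lines, optionally followed by an
    environment marker such as ``;python_version=='3.8'``. Comments and
    blank lines are skipped; if a name appears several times (one line
    per marker), the last line wins in this simplified sketch.
    """
    pins: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "===" not in line:
            continue
        name, rest = line.split("===", 1)
        # Drop any environment marker that follows the version.
        version = rest.split(";", 1)[0].strip()
        pins[name.strip().lower()] = version
    return pins


def find_divergences(
    constraints_text: str, released: dict[str, list[str]]
) -> dict[str, tuple[str | None, str]]:
    """Return deliverables whose latest release is not the pinned version.

    ``released`` maps a deliverable to its released versions, oldest
    first (an assumption of this sketch). The result maps each divergent
    deliverable to (pinned version or None, latest released version).
    """
    pins = parse_upper_constraints(constraints_text)
    divergent: dict[str, tuple[str | None, str]] = {}
    for name, versions in released.items():
        if not versions:
            continue
        latest = versions[-1]
        pinned = pins.get(name.lower())
        if pinned != latest:
            divergent[name] = (pinned, latest)
    return divergent
```

A non-empty result is exactly the situation described above: the documentation lists `latest` as supported while the runtime still installs `pinned`.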
We also face a similar issue with Castellan, but for the sake of clarity I'll focus on oslo.db's case for the rest of this thread.
From a quantitative point of view, we have faced this kind of problem for two consecutive series. It now seems to be becoming routine with each new OpenStack series. At this rate, it is very likely that we will face the same problem again during the next series.
Indeed, during Antelope the same issue arose, but within only one deliverable [4][5][6]. With Bobcat, this scenario reappears, now within two deliverables. The more significant the changes made in our libraries, the more often we will face this kind of issue, and, as everybody knows, our libraries all depend on external libraries that may move to new major releases with breaking changes. That was the case with oslo.db, where our goal was to migrate to SQLAlchemy 2.x, which led to stuck upper-constraints.
This problem could also impact all downstream distros. Some distros have already started facing issues with oslo.db [7].
We can't rule out similar issues soon appearing in any of the OpenStack deliverables listed in upper-constraints. Oslo's case is just the first instance.
From a quality point of view, we also face a real issue. Since customers base their choices and decisions on our technical documentation, a divergence between the officially supported versions and the versions supported at runtime can have a huge impact on them. Imagine a customer choosing a specific series because of requirements imposed by a government: this can be really problematic. By reading our technical documentation and release notes, they may believe that we fulfill those prerequisites. This kind of government requirement is common. It can be imposed on a vendor who wants to sell to a government, or who needs to comply with specific IT laws in a given country.
This last point could completely undermine the quality of the work carried out upstream within the OpenStack community.
Now we have to find the root cause of this problem.
In the current case, one might think that the root cause lies in the complexity of the oslo.db migration, but this is not the case. Even though this migration represents a major change in OpenStack, it was announced two years ago [8] (the equivalent of four series), leaving plenty of time for every team to adopt the latest versions of oslo.db and SQLAlchemy 2.x.
Stephen Finucane and Mike Bayer have spent a lot of time on this topic. Stephen even contributed well beyond the oslo world by proposing several patches to migrate services [9]. Unfortunately, many of these patches remain unmerged and unreviewed [10], which has led us to this situation.
This migration is therefore by no means the root cause of this problem.
The root cause of this problem lies in the level of maintenance of services. Some services are not able to keep up with the release cadence, and therefore they slow down the evolution and maintenance of our libraries. Their failing requirements cross-jobs reflect this [11]. This lack of activity is often due to a lack of maintainers.
Fortunately, Bobcat has been rescued by Stephen's recent fixes [12][13]. His elegant solution allowed us to fix the failing cross-jobs [14] and hence to resync our technical documentation with our runtime.
However, we can't ignore that the lack of maintainers is a growing trend within OpenStack, as evidenced by the steady decrease in the number of contributors from series to series [15][16][17][18]. This phenomenon therefore risks becoming more and more pronounced.
So we must provide a lasting response, one based on team processes rather than on isolated individuals.
A first solution could be to slightly modify our workflow: we could trigger the technical documentation update from the upper-constraints update rather than from the new release patch. The documentation and the runtime would then stay aligned. However, note that not all deliverables are listed in upper-constraints, so this is only a partial solution that won't work for our services.
A second solution would be to monitor team activity by tracking upper-constraints updates with failing cross-jobs. This would be a new task for the requirements team. The goal of this monitoring would be to inform the TC that some deliverables are not active enough.
This monitoring would consist of analyzing, at defined milestones, which upper-constraints updates have remained blocked for a while, and then looking at the failing cross-jobs to see whether the failure is due to a lack of activity on the service side, for example by checking whether patches like those Stephen proposed to services remain unmerged. The TC would then be informed.
It would act as a signal to the TC, which would then be free to decide what to do (abandon the deliverable, remove the cross-job, put-your-idea-here).
The requirements team already does a great job and provides valuable expertise. Without them, we wouldn't have solved the oslo.db and Castellan cases in time. However, I think we lack TC involvement a little earlier in the series to avoid firefighting moments. The monitoring would formally surface problems with deliverables sooner in the life cycle and would trigger TC involvement.
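The milestone check described above could look roughly like the following Python sketch. The record shape (`BlockedUpdate`), the `report_for_tc` function and the 30-day default threshold are all hypothetical; in practice the data would have to be gathered from the pending upper-constraints reviews and their failing cross-jobs, and the team owning each failing job would have to be resolved separately. It only illustrates the grouping logic: keep the bumps blocked longer than a threshold at the checkpoint date, and list them per team so the TC sees which services are holding libraries back.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date


@dataclass
class BlockedUpdate:
    """A pending upper-constraints bump (hypothetical record shape)."""

    deliverable: str  # library being bumped, e.g. "oslo.db"
    version: str  # version proposed by the bump
    opened: date  # date the bump was proposed
    # Teams owning the cross-jobs that currently fail on this bump.
    failing_teams: list[str] = field(default_factory=list)


def report_for_tc(
    updates: list[BlockedUpdate], checkpoint: date, max_age_days: int = 30
) -> dict[str, list[str]]:
    """Group bumps blocked for at least ``max_age_days`` by failing team."""
    report: dict[str, list[str]] = defaultdict(list)
    for upd in updates:
        if (checkpoint - upd.opened).days < max_age_days:
            continue  # too recent to be a meaningful signal yet
        for team in upd.failing_teams:
            report[team].append(f"{upd.deliverable} {upd.version}")
    # Sort teams and entries so the report is stable between runs.
    return {team: sorted(items) for team, items in sorted(report.items())}
```

Running it at each chosen milestone with a fixed `checkpoint` date keeps the signal reproducible; the threshold is a policy choice for the requirements team and the TC, not something this sketch can decide.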
This is an opportunity for us to better anticipate the growing lack of maintainers, to better plan around our available human resources, and to better handle this kind of incident in the future.
We could thus integrate concrete human-resources management actions into the OpenStack life cycle.
It is time to address this pain point, because if nothing is done now, this problem will keep repeating itself.
Regarding the feasibility of this solution, the release team already performs similar monitoring during each series, at specific milestones.
The requirements team could run its monitoring at specific milestones, not too close to the series deadline, so that decisions can be made ahead of time.
The requirements team could take inspiration from the release management process [19] to create its own monitoring. We already have almost everything we need to create a new process dedicated to this monitoring.
Hence, this solution is feasible.
The benefit of this solution is clear: the TC would gain better governance monitoring, monitoring that is based not on the individuals elected as TC members but on a process, and is therefore transferable from one elected TC to the next.
Three teams (release, requirements, and the TC) would then work together on the issue of declining activity within teams.
More broadly, this would allow OpenStack to adapt more efficiently to the resources available from series to series.
I would now like to give special thanks to Stephen for his dedication throughout these two years of work on the oslo.db migration. I would especially like to congratulate him on the quality of his work. Stephen helped us solve the problem in an elegant manner; without his expertise, delivering Bobcat would have been really painful. However, we should not forget that Stephen is only one contributor, and that his expertise could leave OpenStack some day. Solving this type of problem cannot rest on the shoulders of one person alone. Let's take collective initiatives now and put safeguards in place.
Thanks for reading, and thanks to everyone who helped with this topic whom I have not mentioned here.
There are surely other possible solutions, and I'll be happy to discuss this topic with you.
[1] https://opendev.org/openstack/requirements/src/branch/master/upper-constrain...
[2] https://releases.openstack.org/bobcat/index.html#bobcat-oslo-db
[3] https://opendev.org/openstack/releases/src/branch/master/deliverables/antelo...
[4] https://review.opendev.org/c/openstack/requirements/+/873390
[5] https://review.opendev.org/c/openstack/requirements/+/878130
[6] https://opendev.org/openstack/oslo.log/compare/5.1.0...5.2.0
[7] https://lists.openstack.org/pipermail/openstack-discuss/2023-September/03510...
[8] https://lists.openstack.org/pipermail/openstack-discuss/2021-August/024122.h...
[9] https://review.opendev.org/q/topic:sqlalchemy-20
[10] https://review.opendev.org/q/topic:sqlalchemy-20+status:open
[11] https://review.opendev.org/c/openstack/requirements/+/887261
[12] https://opendev.org/openstack/oslo.db/commit/115c3247b486c713176139422647144...
[13] https://opendev.org/openstack/oslo.db/commit/4ee79141e601482fcde02f0cecfb561...
[14] https://review.opendev.org/c/openstack/requirements/+/896053
[15] https://www.openstack.org/software/ussuri
[16] https://www.openstack.org/software/victoria
[17] https://www.openstack.org/software/xena
[18] https://www.openstack.org/software/antelope/
[19] https://releases.openstack.org/reference/process.html#between-milestone-2-an...
-- Hervé Beraud Senior Software Engineer at Red Hat irc: hberaud https://github.com/4383/