[cinder][all] EOL EM branches
Hi,

We had a discussion in the last cinder meeting regarding making EM branches EOL for the cinder project[1]. The discussion started because of the CVE fixes where we backported to active stable branches i.e. Yoga, Zed and 2023.1 but there were no backports to further EM stable branches like Xena, Wallaby ... all the way to Train. The Cinder team doesn't see much merit in keeping these EM stable branches alive, since there is rarely any activity that requires collaboration and, if there is, it usually comes from the core cinder team.

Following are some of our reasons to EOL these branches:

1) We have less review bandwidth even for active stable branches (Yoga, Zed and 2023.1)
2) No one, apart from the project team, does backport of critical fixes implying that those branches aren't used much for collaboration
3) Will save gate resources for some periodic jobs and the patches proposed
4) Save project team's time to fix gate issues

We, as the cinder team, have decided to EOL all the existing EM branches, which go from Train to Xena. This was agreed upon by the cinder team and no objections were raised during the upstream cinder meeting.

We would like to gather more feedback from outside the cinder team on whether this affects other projects, deployers, operators, vendors etc. Please reply to this email with your concerns, if you have any, so we can discuss again and reconsider our decision. Otherwise, the week after the summit we will move forward with our current decision.

If you will be at the Vancouver Summit, you can also give us feedback at the cinder Forum session at 11:40 am on Wednesday June 14.

[1] https://meetings.opendev.org/irclogs/%23openstack-meeting-alt/%23openstack-m...

Thanks
Rajat Dhasmana
On 2023-06-06 20:01:45 +0530 (+0530), Rajat Dhasmana wrote: [...]
The discussion started because of the CVE fixes where we backported to active stable branches i.e. Yoga, Zed and 2023.1 but there were no backports to further EM stable branches like Xena, Wallaby ... all the way to Train. [...]
The idea behind EM branches was that downstream distributions who need to backport these changes for their own customers would push the patches to the upstream EM branches so that they didn't all have to redo the same work and could benefit from each other's knowledge and experience around backports. If that's not happening for critical security patches, then I agree that the goal of the EM model has failed.

Taking OSSA-2023-003 (CVE-2023-2088) as the most recent example, the advisory and patches for maintained branches were published four weeks ago. Fixes for stable/xena were developed along with the other backports because the branch had only just transitioned to EM a week or two prior. Of the four deliverables which were patched as part of that advisory, only Nova provided a patch for anything older, and that was just to stable/wallaby (and possibly added as a mistake or due to miscommunication between the various groups involved).

In the four weeks since, I haven't seen anyone outside the core review teams for Cinder, Glance or Nova supply changes for additional backports, even though I expect the downstream distributions patched versions contemporary with some branches still in EM. It could be that this vulnerability is a poor example, because a lot of deployments use RBD by default and it wasn't affected, but the situation with the two other advisories earlier in the year wasn't much different where backports were concerned.
1) We have less review bandwidth even for active stable branches (Yoga, Zed and 2023.1)
The intent for EM was that reviewing branches no longer under normal maintenance could be delegated to other members of the community. Of course, that was the idea with stable maintenance as well. Perhaps it would be more accurate to restate this as there are no volunteers to review the changes (it shouldn't be the core review team's obligation either way)?
2) No one, apart from the project team, does backport of critical fixes implying that those branches aren't used much for collaboration
This is the best rationale for scrapping the whole EM idea, in my opinion. There's a possibility that if the core reviewers weren't pushing backports someone else would have done so eventually, but I think we have plenty of evidence now to indicate that doesn't really happen in practice.
3) Will save gate resources for some periodic jobs and the patches proposed
EM branches don't have to run the same jobs as maintained stable branches (or really any jobs at all), but that does still at a minimum need the attention of someone interested in removing the unwanted jobs.
4) Save project team's time to fix gate issues [...]
Similar to the earlier points, the project team shouldn't feel obligated to fix testing issues on EM branches. If testing breaks, it should be up to the volunteers proposing and reviewing changes to do that. If they don't, nothing will merge. If they don't care enough to keep it possible to merge things, then sure that's basically back to item #1 again. -- Jeremy Stanley
---- On Tue, 06 Jun 2023 10:32:32 -0700 Jeremy Stanley wrote ---
On 2023-06-06 20:01:45 +0530 (+0530), Rajat Dhasmana wrote: [...]
The discussion started because of the CVE fixes where we backported to active stable branches i.e. Yoga, Zed and 2023.1 but there were no backports to further EM stable branches like Xena, Wallaby ... all the way to Train. [...]
The idea behind EM branches was that downstream distributions who need to backport these changes for their own customers would push the patches to the upstream EM branches so that they didn't all have to redo the same work and could benefit from each other's knowledge and experience around backports. If that's not happening for critical security patches, then I agree that the goal of the EM model has failed.
Taking OSSA-2023-003 (CVE-2023-2088) as the most recent example, the advisory and patches for maintained branches were published four weeks ago. Fixes for stable/xena were developed along with the other backports because the branch had only just transitioned to EM a week or two prior. Of the four deliverables which were patched as part of that advisory, only Nova provided a patch for anything older, and that was just to stable/wallaby (and possibly added as a mistake or due to miscommunication between the various groups involved).
In the four weeks since, I haven't seen anyone outside the core review teams for Cinder, Glance or Nova supply changes for additional backports, even though I expect the downstream distributions patched versions contemporary with some branches still in EM. It could be that this vulnerability is a poor example, because a lot of deployments use RBD by default and it wasn't affected, but the situation with the two other advisories earlier in the year wasn't much different where backports were concerned.
1) We have less review bandwidth even for active stable branches (Yoga, Zed and 2023.1)
The intent for EM was that reviewing branches no longer under normal maintenance could be delegated to other members of the community. Of course, that was the idea with stable maintenance as well. Perhaps it would be more accurate to restate this as there are no volunteers to review the changes (it shouldn't be the core review team's obligation either way)?
2) No one, apart from the project team, does backport of critical fixes implying that those branches aren't used much for collaboration
This is the best rationale for scrapping the whole EM idea, in my opinion. There's a possibility that if the core reviewers weren't pushing backports someone else would have done so eventually, but I think we have plenty of evidence now to indicate that doesn't really happen in practice.
3) Will save gate resources for some periodic jobs and the patches proposed
EM branches don't have to run the same jobs as maintained stable branches (or really any jobs at all), but that does still at a minimum need the attention of someone interested in removing the unwanted jobs.
4) Save project team's time to fix gate issues [...]
Similar to the earlier points, the project team shouldn't feel obligated to fix testing issues on EM branches. If testing breaks, it should be up to the volunteers proposing and reviewing changes to do that. If they don't, nothing will merge. If they don't care enough to keep it possible to merge things, then sure that's basically back to item #1 again.
This is true, but I think this is the point where things become difficult. Even though we do not need to, we as community developers keep fixing the EM gate; at least, I can say so from my QA experience. We should stop at some point, but in reality we end up doing it.

IMO, we should make some policy and testing changes that help everyone understand EM clearly and spend only the required time on it. A few ideas:

1. Reduce the number of EM branches. Currently we have 7; we should cap it at, say, 4 or 5 at any time. Once the cap (say 4) is exceeded, the oldest EM branch becomes EOL. Having 4 EM branches means 2 years of extended support, which is good enough.

2. Completely remove all the integration test jobs when a branch moves to EM, even if they are passing at that time. This way the project team will be less frustrated by gate failures on backports. They would simply be asked to backport the fix and make it available to downstream consumers, who will test it properly before use. Keeping integration testing just because it passes today only makes more work for future maintenance.

-gmann
-- Jeremy Stanley
On 2023-06-06 12:17:23 -0700 (-0700), Ghanshyam Mann wrote: [...]
This is true, but I think this is the point where things become difficult. Even though we do not need to, we as community developers keep fixing the EM gate; at least, I can say so from my QA experience. We should stop at some point, but in reality we end up doing it. [...]
Maybe my verbosity made it unclear, so just in case, what I was trying to say is that I consider Extended Maintenance to be a failed experiment and agree we should be talking about either reverting to the prior process from before EM was a thing or finding an alternative process that doesn't have so many of the obvious shortcomings of EM. People said if we just stopped EOL'ing branches so soon they would show up and help make use of those branches. They didn't, and so the expected benefits never materialized. -- Jeremy Stanley
On Tue, 6 Jun 2023 at 14:52, Jeremy Stanley <fungi@yuggoth.org> wrote:
Maybe my verbosity made it unclear, so just in case, what I was trying to say is that I consider Extended Maintenance to be a failed experiment and agree we should be talking about either reverting to the prior process from before EM was a thing or finding an alternative process that doesn't have so many of the obvious shortcomings of EM.
People said if we just stopped EOL'ing branches so soon they would show up and help make use of those branches. They didn't, and so the expected benefits never materialized.
I agree, my main concern is that we do this well across the whole set of projects, not have a variety of projects doing different things. Yours Tony.
On Tue, 6 Jun 2023 at 15:02, Tony Breeds <tony@bakeyournoodle.com> wrote:
On Tue, 6 Jun 2023 at 14:52, Jeremy Stanley <fungi@yuggoth.org> wrote:
Maybe my verbosity made it unclear, so just in case, what I was trying to say is that I consider Extended Maintenance to be a failed experiment and agree we should be talking about either reverting to the prior process from before EM was a thing or finding an alternative process that doesn't have so many of the obvious shortcomings of EM.
People said if we just stopped EOL'ing branches so soon they would show up and help make use of those branches. They didn't, and so the expected benefits never materialized.
I agree, my main concern is that we do this well across the whole set of projects, not have a variety of projects doing different things.
There is space on the summit schedule to discuss this as a community. https://vancouver2023.openinfra.dev/a/schedule#title=How%20do%20we%20end%20the%20Extended%20Maintenance&view=calendar I'll create an etherpad ASAP. Yours Tony.
---- On Tue, 06 Jun 2023 12:48:43 -0700 Jeremy Stanley wrote ---
On 2023-06-06 12:17:23 -0700 (-0700), Ghanshyam Mann wrote: [...]
This is true, but I think this is the point where things become difficult. Even though we do not need to, we as community developers keep fixing the EM gate; at least, I can say so from my QA experience. We should stop at some point, but in reality we end up doing it. [...]
Maybe my verbosity made it unclear, so just in case, what I was trying to say is that I consider Extended Maintenance to be a failed experiment and agree we should be talking about either reverting to the prior process from before EM was a thing or finding an alternative process that doesn't have so many of the obvious shortcomings of EM.
People said if we just stopped EOL'ing branches so soon they would show up and help make use of those branches. They didn't, and so the expected benefits never materialized.
I agree. As I see it, the main overhead in EM maintenance is keeping testing green. It is not easy to keep testing up to date on 11 branches (including EM, supported stable and master). My point is that removing all the integration testing (we can keep pep8 and unit tests) at the time a branch moves to EM will solve the problem the upstream community faces in maintaining EM branches. -gmann
-- Jeremy Stanley
On 2023-06-07 10:16:05 -0700 (-0700), Ghanshyam Mann wrote: [...]
I agree. As I see it, the main overhead in EM maintenance is keeping testing green. It is not easy to keep testing up to date on 11 branches (including EM, supported stable and master). My point is that removing all the integration testing (we can keep pep8 and unit tests) at the time a branch moves to EM will solve the problem the upstream community faces in maintaining EM branches.
The main counterargument I've heard repeated against this approach is that the project teams feel responsible for the quality of changes merging to those branches, and don't believe that lightly tested or nearly untested backports (from an integration perspective) adequately represent their quality standards. They'd rather close out the branches completely than have to explain that they contain basically untested backports (communication that they further fear will fall on deaf ears, leaving users angry or hurt when they discover it for themselves the hard way). -- Jeremy Stanley
On Wed, Jun 7, 2023 at 10:23 AM Ghanshyam Mann <gmann@ghanshyammann.com> wrote:
---- On Tue, 06 Jun 2023 12:48:43 -0700 Jeremy Stanley wrote ---
On 2023-06-06 12:17:23 -0700 (-0700), Ghanshyam Mann wrote: [...]
This is true, but I think this is the point where things become difficult. Even though we do not need to, we as community developers keep fixing the EM gate; at least, I can say so from my QA experience. We should stop at some point, but in reality we end up doing it. [...]
Maybe my verbosity made it unclear, so just in case, what I was trying to say is that I consider Extended Maintenance to be a failed experiment and agree we should be talking about either reverting to the prior process from before EM was a thing or finding an alternative process that doesn't have so many of the obvious shortcomings of EM.
People said if we just stopped EOL'ing branches so soon they would show up and help make use of those branches. They didn't, and so the expected benefits never materialized.
I agree. As I see it, the main overhead in EM maintenance is keeping testing green. It is not easy to keep testing up to date on 11 branches (including EM, supported stable and master). My point is that removing all the integration testing (we can keep pep8 and unit tests) at the time a branch moves to EM will solve the problem the upstream community faces in maintaining EM branches.
This, IMO, is akin to retiring the branches. How could I, as a developer, patch an older version of a branch against a vulnerability of the style of the recent Cinder one, where the impact is felt cross-project, and you clearly need a working dev environment (such as devstack). If, as you propose, we stopped doing any integration testing on branches older than 18 months, we would be de-facto retiring the integration testing infrastructure, which shares a huge amount of DNA with our dev tooling infrastructure. I don't know what the answer is; but this as a middle ground seems like the worst of all worlds: the branches still exist, and we will not have the tools to (manually, not just CI) test meaningful changes on them. Just a thought! - Jay Faulkner Ironic PTL
---- On Wed, 07 Jun 2023 10:38:03 -0700 Jay Faulkner wrote ---
On Wed, Jun 7, 2023 at 10:23 AM Ghanshyam Mann <gmann@ghanshyammann.com> wrote:
---- On Tue, 06 Jun 2023 12:48:43 -0700 Jeremy Stanley wrote ---
On 2023-06-06 12:17:23 -0700 (-0700), Ghanshyam Mann wrote: [...]
This is true, but I think this is the point where things become difficult. Even though we do not need to, we as community developers keep fixing the EM gate; at least, I can say so from my QA experience. We should stop at some point, but in reality we end up doing it. [...]
Maybe my verbosity made it unclear, so just in case, what I was trying to say is that I consider Extended Maintenance to be a failed experiment and agree we should be talking about either reverting to the prior process from before EM was a thing or finding an alternative process that doesn't have so many of the obvious shortcomings of EM.
People said if we just stopped EOL'ing branches so soon they would show up and help make use of those branches. They didn't, and so the expected benefits never materialized.
I agree. As I see it, the main overhead in EM maintenance is keeping testing green. It is not easy to keep testing up to date on 11 branches (including EM, supported stable and master). My point is that removing all the integration testing (we can keep pep8 and unit tests) at the time a branch moves to EM will solve the problem the upstream community faces in maintaining EM branches.
This, IMO, is akin to retiring the branches. How could I, as a developer, patch an older version of a branch against a vulnerability of the style of the recent Cinder one, where the impact is felt cross-project, and you clearly need a working dev environment (such as devstack). If, as you propose, we stopped doing any integration testing on branches older than 18 months, we would be de-facto retiring the integration testing infrastructure, which shares a huge amount of DNA with our dev tooling infrastructure.
It is not the same as retiring: the branch can still run unit/functional tests, and the changes have already been tested on master and the supported stable branches, so those fixes have been tested at some level. And there cannot be a case where I apply a fix directly to the EM branch. Our current doc also states a minimum testing expectation, and I am just suggesting that we reduce the testing at the time a branch moves to EM, instead of waiting for the gate to break and getting frustrated while backporting. EM, as we have meant it from the start, is not upstream maintained or guaranteed, so leaving the testing expectation to downstream is no big change from the current policy. -gmann
I don't know what the answer is; but this as a middle ground seems like the worst of all worlds: the branches still exist, and we will not have the tools to (manually, not just CI) test meaningful changes on them. Just a thought! - Jay Faulkner Ironic PTL
On 6/7/23 1:46 PM, Ghanshyam Mann wrote:
---- On Wed, 07 Jun 2023 10:38:03 -0700 Jay Faulkner wrote ---
On Wed, Jun 7, 2023 at 10:23 AM Ghanshyam Mann <gmann@ghanshyammann.com> wrote:
---- On Tue, 06 Jun 2023 12:48:43 -0700 Jeremy Stanley wrote ---
On 2023-06-06 12:17:23 -0700 (-0700), Ghanshyam Mann wrote:
[snip]
I agree. As I see it, the main overhead in EM maintenance is keeping testing green. It is not easy to keep testing up to date on 11 branches (including EM, supported stable and master). My point is that removing all the integration testing (we can keep pep8 and unit tests) at the time a branch moves to EM will solve the problem the upstream community faces in maintaining EM branches.
This, IMO, is akin to retiring the branches. How could I, as a developer, patch an older version of a branch against a vulnerability of the style of the recent Cinder one, where the impact is felt cross-project, and you clearly need a working dev environment (such as devstack). If, as you propose, we stopped doing any integration testing on branches older than 18 months, we would be de-facto retiring the integration testing infrastructure, which shares a huge amount of DNA with our dev tooling infrastructure.
It is not the same as retiring: the branch can still run unit/functional tests, and the changes have already been tested on master and the supported stable branches, so those fixes have been tested at some level. And there cannot be a case where I apply a fix directly to the EM branch.
I agree with Jay on this. IMO, what keeps devstack functional in the EM branches is that it's needed to run tempest tests. If we rely on unit/functional tests only, that motivation goes away. Further, as Jay points out, a working OpenStack deployment requires a harmonization of multiple components beyond the individual projects' unit/functional tests. For example, this (invalid) bug: https://bugs.launchpad.net/cinder/+bug/2020382 was reported after backporting a patch that had gone through the normal backport process upstream from master through stable/xena without skipping any branches; the xena patch applied cleanly and I'm pretty sure it passed unit and functional tests (I didn't run them myself). The issue did not occur until the code was actually used by cinder interacting with a real nova. So relying on unit and functional tests only is not adequate. When I approve a backport, I'm supposed to be fairly confident that the change is low-risk and will not cause regressions. Clean tempest jobs give me some useful evidence when making that assessment. A patch that passes CI is not guaranteed to be OK, but if it causes a CI failure, we know it's not OK.
Our current doc also states a minimum testing expectation, and I am just suggesting that we reduce the testing at the time a branch moves to EM, instead of waiting for the gate to break and getting frustrated while backporting.
EM, as we have meant it from the start, is not upstream maintained or guaranteed, so leaving the testing expectation to downstream is no big change from the current policy.
That's correct, but as I think has been mentioned elsewhere in this thread, this has not proved to be workable. The stable cores on the project teams take their work seriously, and even though the docs say that EM branches should be treated caveat emptor, we still feel that our approval should mean something. So even though the docs say there's no guarantee on EM branches, nobody wants to have their name show up as approving a patch that caused a regression, even in an EM branch. Further (and I don't think I'm speaking only for myself here), I don't like the idea of other people merging unvetted stuff into our codebase. But that hasn't become an issue, because as Jeremy pointed out earlier, no one outside the project teams has showed up to take ownership of EM branches. Though (to get back to my real point here) if such people did show up, I would expect them to also maintain the full CI, including tempest integration tests, for the reasons I mentioned earlier. So I'm against the idea unit/functional testing is adequate for EM branches.
-gmann
(By the way, I am not implying that gmann is in favor of poor QA. He's articulated clearly what the current docs say about EM branches. But he's also been heroically responsible for keeping a lot of the EM integration gates functional!)
I don't know what the answer is; but this as a middle ground seems like the worst of all worlds: the branches still exist, and we will not have the tools to (manually, not just CI) test meaningful changes on them. Just a thought! - Jay Faulkner Ironic PTL
---- On Thu, 08 Jun 2023 06:39:28 -0700 Brian Rosmaita wrote ---
On 6/7/23 1:46 PM, Ghanshyam Mann wrote:
---- On Wed, 07 Jun 2023 10:38:03 -0700 Jay Faulkner wrote ---
On Wed, Jun 7, 2023 at 10:23 AM Ghanshyam Mann <gmann@ghanshyammann.com> wrote:
---- On Tue, 06 Jun 2023 12:48:43 -0700 Jeremy Stanley wrote ---
On 2023-06-06 12:17:23 -0700 (-0700), Ghanshyam Mann wrote:
[snip]
I agree. As I see it, the main overhead in EM maintenance is keeping testing green. It is not easy to keep testing up to date on 11 branches (including EM, supported stable and master). My point is that removing all the integration testing (we can keep pep8 and unit tests) at the time a branch moves to EM will solve the problem the upstream community faces in maintaining EM branches.
This, IMO, is akin to retiring the branches. How could I, as a developer, patch an older version of a branch against a vulnerability of the style of the recent Cinder one, where the impact is felt cross-project, and you clearly need a working dev environment (such as devstack). If, as you propose, we stopped doing any integration testing on branches older than 18 months, we would be de-facto retiring the integration testing infrastructure, which shares a huge amount of DNA with our dev tooling infrastructure.
It is not the same as retiring: the branch can still run unit/functional tests, and the changes have already been tested on master and the supported stable branches, so those fixes have been tested at some level. And there cannot be a case where I apply a fix directly to the EM branch.
I agree with Jay on this. IMO, what keeps devstack functional in the EM branches is that it's needed to run tempest tests. If we rely on unit/functional tests only, that motivation goes away.
Further, as Jay points out, a working OpenStack deployment requires a harmonization of multiple components beyond the individual projects' unit/functional tests. For example, this (invalid) bug: https://bugs.launchpad.net/cinder/+bug/2020382 was reported after backporting a patch that had gone through the normal backport process upstream from master through stable/xena without skipping any branches; the xena patch applied cleanly and I'm pretty sure it passed unit and functional tests (I didn't run them myself). The issue did not occur until the code was actually used by cinder interacting with a real nova.
So relying on unit and functional tests only is not adequate. When I approve a backport, I'm supposed to be fairly confident that the change is low-risk and will not cause regressions. Clean tempest jobs give me some useful evidence when making that assessment. A patch that passes CI is not guaranteed to be OK, but if it causes a CI failure, we know it's not OK.
Our current doc also states a minimum testing expectation, and I am just suggesting that we reduce the testing at the time a branch moves to EM, instead of waiting for the gate to break and getting frustrated while backporting.
EM, as we have meant it from the start, is not upstream maintained or guaranteed, so leaving the testing expectation to downstream is no big change from the current policy.
That's correct, but as I think has been mentioned elsewhere in this thread, this has not proved to be workable. The stable cores on the project teams take their work seriously, and even though the docs say that EM branches should be treated caveat emptor, we still feel that our approval should mean something. So even though the docs say there's no guarantee on EM branches, nobody wants to have their name show up as approving a patch that caused a regression, even in an EM branch.
Further (and I don't think I'm speaking only for myself here), I don't like the idea of other people merging unvetted stuff into our codebase. But that hasn't become an issue, because as Jeremy pointed out earlier, no one outside the project teams has showed up to take ownership of EM branches. Though (to get back to my real point here) if such people did show up, I would expect them to also maintain the full CI, including tempest integration tests, for the reasons I mentioned earlier. So I'm against the idea unit/functional testing is adequate for EM branches.
I do not disagree with you and Jay about wanting more testing, but I am saying that reducing testing (which was the original idea) is one of the tradeoffs between keeping extended branches available for fixes for a long time and upstream maintenance costs. We are clearly at the stage where the upstream community cannot maintain them with proper testing. Either we have to drop the idea of EM or try some new idea, which can add more cost to upstream maintenance. I still do not find it very odd that we do not guarantee the testing of EM backport fixes while at the same time making sure they are tested all the way from master through the supported stable branches during backporting. Leave the complete testing to the downstream consumers, who can test properly before applying the fixes.
-gmann
(By the way, I am not implying that gmann is in favor of poor QA. He's articulated clearly what the current docs say about EM branches. But he's also been heroically responsible for keeping a lot of the EM integration gates functional!)
Apart from maintenance, pinning the tempest/plugins versions also takes a lot of time. I am now starting to pin tempest/plugins for the recent EM branch stable/xena, and it takes a large amount of time to test and pin the compatible versions of tempest and plugins on stable/xena. -gmann
I don't know what the answer is; but this as a middle ground seems like the worst of all worlds: the branches still exist, and we will not have the tools to (manually, not just CI) test meaningful changes on them. Just a thought! - Jay Faulkner Ironic PTL
Hi, Thanks for starting this thread. As a stable maintainer let me also share my thoughts: - It is really sad to see, that important CVE fixes haven't arrived to old stable branches, that is clearly a sign that stable maintenance is not in a good shape on EM branches - I also see that stable maintenance is not in a good shape on 'maintained' branches either (2023.1, zed and yoga) for most of the projects, so that is also a visible problem - so, I understand that teams want to focus more on their maintained branches Still, I don't feel that eliminating the 'failed experiment' with Extended Maintenance process would solve our issue. Though I agree that, if we now EOL all our EM branches, that would really call some attention to some vendors, operators, etc. (Would it help? Would companies step up to spend more resources on upstream maintenance? Good question.) Now let me also share some general thoughts about the topics people brought up in this thread (please don't read those which does not interest you o:)): 1) who should maintain EM branches? Well, it was stressed out several times that maintainers can be different then the project core team. Though how I understood this is not like that there is a 'stable maintainer' group for 'maintained' branches and a completely different 'extended maintainer' group for branches in 'extended maintenance', rather that it's not necessary the *same* core team that develops the master branch. As a trivial way of working is that A maintainer from X company is maybe interested back to let's say Wallaby branch, so that is what branches they maintain, where they primarily backport bug fixes, review backports, fixes CI; B maintainer from Y company then does the same but till Xena branch; etc. (Of course, the best is if time to time they can help out each other). I know this is quite idealistic... 
I believe that maintainers have an *employer* with an interest to keep XY branch as maintained as possible and they act accordingly as much as possible (idealistic, too). 2) what tests to keep? I understand that keeping old CI jobs functional is cumbersome and resource needy. On the other hand, *every* vendor and companies benefit from this, as downstream CI usually not that good as the upstream one OR very expensive (resource, maintenance, fixing the uncaught bugs, etc). I also tend to agree the opinion that only keeping unit tests and functional tests doesn't give enough confidence in quality. Devstack based tempest tests need to be kept, though I agree that teams can somewhat reduce their job count, maybe with rationalising and dropping somewhat redundant, expensive, time and resource consuming test jobs (like different kind of grenade jobs, non-voting jobs, special case jobs, etc), at least when they start to fail and there's no one to fix them. 3) are EM branches 'fully maintained'? The original idea was that it isn't. EM just means that maintainers can propose backports and can review them. It doesn't mean that every bugfix will be backported and reviewed (unfortunately the same can be seen in 'maintained' branches as well). Though I also share the views of those, who say that the fact that important CVE fixes don't arrive to even "younger" EM branches is a signal that those branches are doomed and probably should be EOL'd. (Though some of you noted that the recent CVEs have cross project dependencies, thus not that trivial to backport them.) 4) real count of EM branches According to releases.openstack.org, it's 7, yes. In reality it's rather 5 or less. For some projects it's even just 2 or 3. Because it can be different for every project. 
Yes, we could have EOL'd the rocky (for every project) and stein branches waaay earlier (rocky is like 98% EOL'd, waiting for only some of the PTLs to approve their project's EOL transition patch; Stein is really rotten, but will be next as soon as rocky is done; but for example the stable/train gate for Nova, Neutron, etc, is not blocked, tests can pass, even though there are not many patches that get merged).

5) is EM a 'failed experiment'?

Somewhat yes, somewhat no. There have been many tested bug fixes that landed on those branches over the years, so companies could have benefited from them. But of course, that is still not the same as considering those branches 'maintained', so yes, that could give rise to some misconceptions. And as you also said, not getting CVE fixes landed shows that those branches are far from being 'maintained'.

Anyway, personally, I would not end this 'experiment' (as I'm probably too optimistic :)), but I see that stable maintenance is a problem. I'm curious about where this thread will lead and happy to see that this got a forum topic on the Vancouver PTG schedule (thanks Tony! :)). Unfortunately, I cannot participate as I could not travel there, but I'm eagerly waiting for the thoughts (and maybe resolution) of the in-person discussion!
Thanks for your time reading this long monologue o:)

Cheers,
Előd Illés
irc: elodilles @ #openstack-stable / #openstack-release

From: Ghanshyam Mann <gmann@ghanshyammann.com>
Sent: Thursday, June 8, 2023 5:37 PM
To: Brian Rosmaita <rosmaita.fossdev@gmail.com>
Cc: openstack-discuss <openstack-discuss@lists.openstack.org>
Subject: Re: [cinder][all][tc][ops][stable] EOL EM branches

---- On Thu, 08 Jun 2023 06:39:28 -0700 Brian Rosmaita wrote ---
> On 6/7/23 1:46 PM, Ghanshyam Mann wrote:
> > ---- On Wed, 07 Jun 2023 10:38:03 -0700 Jay Faulkner wrote ---
> > >
> > > On Wed, Jun 7, 2023 at 10:23 AM Ghanshyam Mann <gmann@ghanshyammann.com> wrote:
> > >
> > > ---- On Tue, 06 Jun 2023 12:48:43 -0700 Jeremy Stanley wrote ---
> > > > On 2023-06-06 12:17:23 -0700 (-0700), Ghanshyam Mann wrote:
> [snip]
> > > I agree. If I see the main overhead in EM maintenance is keeping testing green.
> > > it is not easy to keep 11 branches (including Em, supported stable and master) testing up to date. My point is if we remove all the integration testing (can keep pep8 and unit tests) at the time the branch move to EM will solve the problem that the upstream community faces to maintain EM branches.
> > >
> > > This, IMO, is akin to retiring the branches. How could I, as a developer, patch an older version of a branch against a vulnerability of the style of the recent Cinder one, where the impact is felt cross-project, and you clearly need a working dev environment (such as devstack).
> > > If, as you propose, we stopped doing any integration testing on branches older than 18 months, we would be de-facto retiring the integration testing infrastructure, which shares a huge amount of DNA with our dev tooling infrastructure.
> >
> > It is not the same as retiring but if we see it can still run unit/functional tests and changes have been tested till supported and stable so we did testing of those fixes at some level.
> > And there cannot be the case where I apply the fix directly to the EM branch.
>
> I agree with Jay on this. IMO, what keeps devstack functional in the EM branches is that it's needed to run tempest tests. If we rely on unit/functional tests only, that motivation goes away.
>
> Further, as Jay points out, a working OpenStack deployment requires a harmonization of multiple components beyond the individual projects' unit/functional tests. For example, this (invalid) bug:
> https://bugs.launchpad.net/cinder/+bug/2020382
> was reported after backporting a patch that had gone through the normal backport process upstream from master through stable/xena without skipping any branches; the xena patch applied cleanly and I'm pretty sure it passed unit and functional tests (I didn't run them myself). The issue did not occur until the code was actually used by cinder interacting with a real nova.
>
> So relying on unit and functional tests only is not adequate. When I approve a backport, I'm supposed to be fairly confident that the change is low-risk and will not cause regressions. Clean tempest jobs give me some useful evidence when making that assessment. A patch that passes CI is not guaranteed to be OK, but if it causes a CI failure, we know it's not OK.
>
> > In our current doc also, we have the minimum testing expectation and I am just saying to reduce the testing at the time branch moved to EM instead of waiting for the gate to break and getting frustrated while backporting.
> >
> > EM as we meant since starting it not upstream maintained/guaranteed things so leaving testing expectation at downstream is no big change from what current policy is.
>
> That's correct, but as I think has been mentioned elsewhere in this thread, this has not proved to be workable.
> The stable cores on the project teams take their work seriously, and even though the docs say that EM branches should be treated caveat emptor, we still feel that our approval should mean something. So even though the docs say there's no guarantee on EM branches, nobody wants to have their name show up as approving a patch that caused a regression, even in an EM branch.
>
> Further (and I don't think I'm speaking only for myself here), I don't like the idea of other people merging unvetted stuff into our codebase. But that hasn't become an issue, because as Jeremy pointed out earlier, no one outside the project teams has showed up to take ownership of EM branches. Though (to get back to my real point here) if such people did show up, I would expect them to also maintain the full CI, including tempest integration tests, for the reasons I mentioned earlier. So I'm against the idea unit/functional testing is adequate for EM branches.

I do not disagree with Jay and you on more and more testing, but I am saying reducing testing (which is what the original idea was) is one of the tradeoffs between keeping extended branches available for fixes for a long time and upstream maintenance costs. We are clearly at the stage where the upstream community cannot maintain them with proper testing. Either we have to remove the idea of EM or try any new idea that can add more cost in upstream maintenance.

I still do not find it very odd that we do not guarantee the EM backport fixes testing but at the same time make sure they are tested all the way from master to supported stable branches backporting. Leave the complete testing to the downstream consumers to test properly before applying the fixes.

> >
> > -gmann
>
> (By the way, I am not implying that gmann is in favor of poor QA. He's articulated clearly what the current docs say about EM branches. But he's also been heroically responsible for keeping a lot of the EM integration gates functional!)
Apart from maintaining, pinning tempest/plugins versions also takes a lot of time. I am now starting to pin tempest/plugins for the recent EM stable/xena, and it requires a large amount of time to test/pin the compatible version of tempest and plugins on stable/xena.

-gmann

> > >
> > > I don't know what the answer is; but this as a middle ground seems like the worst of all worlds: the branches still exist, and we will not have the tools to (manually, not just CI) test meaningful changes on them.
> > > Just a thought!
> > > -Jay Faulkner, Ironic PTL
On Tue, 6 Jun 2023 at 09:36, Rajat Dhasmana <rdhasman@redhat.com> wrote:
Hi,
We had a discussion in the last cinder meeting regarding making EM branches EOL for the cinder project[1]. The discussion started because of the CVE fixes where we backported to active stable branches i.e. Yoga, Zed and 2023.1 but there were no backports to further EM stable branches like Xena, Wallaby ... all the way to Train.
That is expected/totally fine with EM branches
Cinder team doesn't see much merit in keeping these EM stable branches alive since there is rarely any activity that requires collaboration and if there is, it is usually by the core cinder team. Following are some of our reasons to EOL these branches:
This is less related to EM and more a result of make-up of many project teams.
1) We have less review bandwidth even for active stable branches (Yoga, Zed and 2023.1)
These are the only branches the cinder team is expected to be reviewing.
2) No one, apart from the project team, does backport of critical fixes implying that those branches aren't used much for collaboration
I understand your point but by removing them there is zero scope for collaboration. When the existing stable policy was discussed/created in Sydney it was observed that there is basically 0 overlap with branches that vendors pick to support for a longer time. It seems that in the current climate this looks even worse. :(
3) Will save gate resources for some periodic jobs and the patches proposed
4) Save project team's time to fix gate issues
This is a valid point. Do you have a feel for how much time the Cinder team spent fixing issues on EM branches?
We, as the cinder team, have decided to EOL all the existing EM branches that go from Train to Xena. It was agreed upon by the cinder team and no objections were raised during the upstream cinder meeting.
This will be somewhat impactful on projects that wish to keep CI running for EM branches: devstack will need to know where/how to get the cinder project's code, and it would make it nearly impossible for projects that overlap with the cinder team (nova via os-brick) to maintain any critical updates on their non EM branches.
We would like to gather more feedback outside of the cinder team regarding if this affects other projects, deployers, operators, vendors etc. Please reply to this email with your concerns, if you have any, so we can discuss again and reconsider our decision. Else the week after the summit we will be moving forward with our current decision. If you will be at the Vancouver Summit, you can also give us feedback at the cinder Forum session at 11:40 am on Wednesday June 14.
For sure.
[1] https://meetings.opendev.org/irclogs/%23openstack-meeting-alt/%23openstack-m...
Thanks Rajat Dhasmana
-- Yours Tony.
1) We have less review bandwidth even for active stable branches (Yoga, Zed and 2023.1) 2) No one, apart from the project team, does backport of critical fixes implying that those branches aren't used much for collaboration 3) Will save gate resources for some periodic jobs and the patches proposed 4) Save project team's time to fix gate issues
These are all good reasons, and I think that they highlight how the original plan for EM has turned out to have failed in practice. I think in most cases, it's the project teams that continue maintaining these, backporting patches here, and fixing zuul config issues or other gate fails when they happen. I myself tried to make the point recently that we should be dropping (not fixing) the ceph job on the wallaby gate when it broke (per the plan), but the well-meaning people involved ended up fixing it anyway. I think we're probably due to revisit the current EM strategy soon. The recent CVE is the most important thing to me though. Anyone that looks at the recent activity in say, wallaby will see a lot of familiar faces and recent backports from the project teams. It would not be a stretch at all to assume that since we've backported minor fixes that we've also already backported the most substantial CVE in the last decade as well -- but we haven't (and won't). Nova is in a similar boat, with the last *two* CVEs unfixed in the earlier branches because of the complexity of the multiple projects, libraries, releases, and tests that need to be coordinated in order to have the desired effect. IMHO, it is the prudent and responsible thing to do to drop these branches which look maintained but in reality have known severe vulnerabilities in them.
We, as the cinder team, have decided to EOL all the existing EM branches that go from Train to Xena. It was agreed upon by the cinder team and no objections were raised during the upstream cinder meeting.
Given the severity of the impact to older unpatched Cinder branches, I think this makes sense. Nova isn't in quite the same boat, but I made the same arguments in this patch proposing to EOL train, where the second-to-last VMDK-related vulnerability remains unfixed: https://review.opendev.org/c/openstack/releases/+/885365
--Dan
participants (8)
- Brian Rosmaita
- Dan Smith
- Előd Illés
- Ghanshyam Mann
- Jay Faulkner
- Jeremy Stanley
- Rajat Dhasmana
- Tony Breeds