On Tue, Jun 04, 2019 at 05:32:41PM +0000, Jeremy Stanley wrote:
On 2019-06-04 17:23:46 +0100 (+0100), Graham Hayes wrote: [...]
I have been trying to limit this behaviour for nearly 4 years [3] (it can actually add 10-15 mins sometimes depending on what source trees I have mounted via NFS into a devstack VM when doing dev)
Similar I suppose, though the problem mentioned in this subthread is actually not about the mass permission change itself, rather about the resulting permissions. In particular the fetch-zuul-cloner role makes the entire set of provided repositories world-writeable because the zuul-cloner v2 compatibility shim performs clones from those file paths and Git wants to hardlink them if they're being cloned within the same filesystem. This is necessary to support occasions where the original copies aren't owned by the same user running the zuul-cloner shim, since you can't hardlink files for which your account lacks write access.
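For anyone who wants to see what that looks like on a node, here is a minimal sketch (not taken from the role or the shim) that lists world-writable entries under a directory tree and checks whether two paths share a filesystem, which is the precondition for Git to hardlink during a local clone. The function names find_world_writable and same_filesystem are hypothetical and made up for illustration, and the sketch assumes a POSIX system.

```python
import os
import stat


def same_filesystem(path_a, path_b):
    """Return True if both paths live on the same device.

    Hardlinks can only be created within a single filesystem, so this is
    the precondition for a local clone to hardlink objects.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def find_world_writable(root):
    """Return a sorted list of paths under root with the other-write bit set.

    Symlinks are skipped because their own mode bits are not meaningful.
    """
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            mode = os.lstat(full).st_mode
            if stat.S_ISLNK(mode):
                continue
            if mode & stat.S_IWOTH:
                hits.append(full)
    return hits if not hits else sorted(hits)
```

Running find_world_writable() against the directory holding the prepared repositories (whatever path that is in a given job) should return an empty list once the chmod task is no longer applied.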
I've done a bit of digging into the history of this now, so the following is probably boring to the majority of you. If you want to help figure out why it's still there at the moment and what's left to do, read on...
Change https://review.openstack.org/512285, which added the chmod task, includes a rather prescient comment from Paul about not adding it to the mirror-workspace-git-repos role because "we might not want to chmod 777 on no-legacy jobs." Unfortunately I think we failed to realize that it already would, because we had added fetch-zuul-cloner to our base job a month earlier in https://review.openstack.org/501843 for reasons which are not recorded in the change (presumably a pragmatic compromise related to the scramble to convert our v2 jobs at the time; I have not resorted to digging in IRC history just yet). Soon after, we added fetch-zuul-cloner to the main "legacy" pre playbook with https://review.opendev.org/513067 and prepared to test its removal from the base job with https://review.opendev.org/513079, but that was never completed and I can't seem to find the results of the testing (or even any indication it was ever actually performed).
Testing was done; you can see that in https://review.opendev.org/513506/. However, the issue at the time was that projects using tools/tox_install.sh would break (I have no idea if that is still the case). For anyone interested, https://etherpad.openstack.org/p/zuulv3-remove-zuul-cloner was the etherpad to capture this work. Eventually I ended up abandoning the patch, because I wasn't able to keep pushing on it.
At this point, I feel like we probably just need to re-propose an equivalent of 513079 in our base-jobs repository, exercise it with some DNM changes running a mix of legacy imported v2 and modern v3 native jobs, announce a flag day for the cut over, and try to help address whatever fallout we're unable to predict ahead of time. This is somewhat complicated by the need to also do something similar in https://review.opendev.org/656195 with the bindep "fallback" packages list, so we're going to need to decide how those two efforts will be sequenced, or whether we want to combine them into a single (and likely doubly-painful) event.
-- 
Jeremy Stanley