On Tue, Dec 22, 2020, at 9:14 AM, Jeremy Stanley wrote:
On 2020-12-22 11:33:28 -0300 (-0300), Sorin Sbarnea wrote:
Could we find a way to extend the ability to alter queues to a select group of people who are not zuul admins?
I personally find the need to ping zuul admins about queue alteration problematic for at least two reasons:
- it increases the load on zuul admins, who likely have other more pressing (or interesting) issues to deal with
- it depends directly on the availability of zuul admins, which is not really a 24x7 service, not even a 24x5 one.
If we had a token that allowed some of us to use the new zuul client to put some patches at the top of the queue, we could likely avoid having to depend on other humans to unblock some pipelines.
As these operations would be very easy to track, I doubt this would be abused.
Currently that is achievable only by admins with something like:
zuul promote --tenant openstack --pipeline gate --changes 123,1
We quite often also precede it with `zuul enqueue ...` to put the change into the gate pipeline, either because the changes in question have preexisting Verified -1/-2 due to unrelated failures, or no Verified vote because it hasn't completed check pipeline jobs.
What if we could also have some power users, aka queue owners/stewards? How hard would it be?
Currently we access the scheduler's RPC socket via sudo locally on the server (the CLI utility writes to a named pipe owned by the zuuld user). An alternative would be to set up authentication for the REST API, which the client supports but we haven't used in OpenDev yet.
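For illustration only, here is a rough Python sketch of what a promote call over an authenticated REST API might look like. It assumes the Zuul web API exposes a tenant-scoped promote endpoint at /api/tenant/<tenant>/promote that takes a bearer token; the base URL and the token value are placeholders, and build_promote_request is a hypothetical helper, not part of any Zuul client. The request is only constructed, never sent.

```python
import json
import urllib.request


def build_promote_request(base_url, tenant, pipeline, changes, token):
    """Build (but do not send) a promote request for one tenant.

    Assumption: the endpoint path and JSON body shape below match the
    tenant-scoped REST API; check the Zuul docs for your version.
    """
    url = f"{base_url.rstrip('/')}/api/tenant/{tenant}/promote"
    body = json.dumps({"pipeline": pipeline, "changes": changes}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder host and token; the change id mirrors the CLI example above.
req = build_promote_request(
    "https://zuul.example.org", "openstack", "gate", ["123,1"], "<token>"
)
print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing req to urllib.request.urlopen, which is left out here on purpose.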
Worth noting that the only scoping available to us in the current authentication setup for Zuul is tenant scoping. This means any token issued to allow promote for tripleo would allow it for all projects in the openstack tenant. The token would also be able to perform autohold, enqueue/enqueue-ref, and dequeue/dequeue-ref in addition to promote. I don't think these permissions are currently fine-grained enough to work in our current tenant setup. See https://zuul-ci.org/docs/zuul/discussion/tenant-scoped-rest-api.html
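To make the scoping problem concrete, here is a minimal Python sketch of a tenant-scoped token. It assumes, per my reading of the doc linked above, that the token is a JWT whose "zuul" claim carries a list of tenants under "admin". The secret, issuer, audience, and user name are made up, and make_token and decode_claims are hypothetical helpers written with the standard library only.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_token(secret: bytes, user: str, tenants: list, ttl: int = 600) -> str:
    """Build an HS256-signed JWT with a tenant-scoped admin claim."""
    header = {"alg": "HS256", "typ": "JWT"}
    now = int(time.time())
    claims = {
        "iss": "zuul_operator",       # hypothetical issuer
        "aud": "zuul.example.org",    # hypothetical audience
        "sub": user,
        "iat": now,
        "exp": now + ttl,
        # Assumed claim shape: a list of tenants, nothing narrower.
        "zuul": {"admin": tenants},
    }
    signing_input = (
        b64url(json.dumps(header).encode("utf-8"))
        + "."
        + b64url(json.dumps(claims).encode("utf-8"))
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


def decode_claims(token: str) -> dict:
    """Decode the payload segment without verifying the signature."""
    payload = token.split(".")[1]
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


token = make_token(b"not-a-real-secret", "tripleo-steward", ["openstack"])
claims = decode_claims(token)
print(claims["zuul"])
```

Nothing in that claim names a project, a queue, or an action, so a token minted for a tripleo steward is indistinguishable from one that can autohold, enqueue, or dequeue anywhere in the openstack tenant.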
However, I question whether it's worthwhile spending time engineering a two-tiered administrative solution for our scheduler. This comes up once or maybe twice a month, takes only a minute, and never really has immediate urgency (except for security fixes which are generally scheduled and coordinated with someone well in advance), so it's not a particular burden on the current sysadmin team and there's generally someone around within 24 hours or less to process a request of that nature. As previously discussed, if it were particularly urgent, there are other remedies available to core review teams already.
It is worth remembering that Zuul's original (and arguably primary) method of receiving instruction is via the code review systems it listens to. As a result, Zuul supports the reorganization of queues via code review system state changes. Those mechanisms are clunky and expensive, but that reflects the cost on both sides of a promotion. Dumping all existing job states and test nodes on the zuul and nodepool side in order to create a new queue state is an expensive operation for Zuul too. I don't think it is necessarily a bug to bubble that pain up to the users. Promotions should be done infrequently, and only when necessary, to avoid this resource thrashing. Ideally, projects would instead prioritize work on the review side and ensure that things are only approved when they are expected to pass gating and merge.
-- Jeremy Stanley