[infra][tripleo] is it possible to control (tripleo) gate queue priority?
Hello illustrious members of openstack infra,
tripleo-ci squad is wondering if it is possible for (some subset of) us to be able to set the priority of a particular patch/es in the tripleo queue.
We've done this "manually" in the past, by abandoning all patches in the gate & then restoring in order and putting the priority patch at the top of the dependency queue. However abandoning all the things is completely disruptive for everyone else (sometimes that might be necessary if your queue is way too long but still...).
So the question is, is there a better way to put a particular patch at the top of our queue when we need to do that?
Thanks for your thoughts, and sorry if this has come up before; I couldn't quickly find something in the list archives.
regards, marios
On 2020-12-17 16:43:08 +0200 (+0200), Marios Andreou wrote:
tripleo-ci squad is wondering if it is possible for (some subset of) us to be able to set the priority of a particular patch/es in the tripleo queue.
Not directly, no, it's an administrative function of the Zuul scheduler which can't be delegated by queue.
We've done this "manually" in the past, by abandoning all patches in the gate & then restoring in order and putting the priority patch at the top of the dependency queue. However abandoning all the things is completely disruptive for everyone else (sometimes that might be necessary if your queue is way too long but still...).
It's actually not as terrible a solution as it sounds: you're basically signalling to your contributors that your jobs are unhealthy and that your immediate priority is to focus on merging identified fixes for that problem rather than other patches. It also frees up our CI resources, which you would otherwise be monopolizing due to churn from repeated gate resets of massively long change queues, ultimately helping those fixes merge more quickly. Of course it also depends on your core review teams getting on the same page and not continuing to approve unrelated changes which are unlikely to merge at that point, but this is more of a social issue than a technical one.
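To make the churn argument concrete, here is a toy model in Python. It assumes (our simplification, not Zuul's scheduler logic) that every queued change runs the same number of jobs and that a failure at one position restarts every build for the changes queued behind it. Under those assumptions, the cost of a single failure near the head grows linearly with queue length.

```python
"""Toy model of gate resets in a Zuul-style dependent pipeline.

Assumptions (simplifications, not Zuul's actual scheduler code):
- every queued change runs the same number of jobs,
- when the change at `failed_index` fails, every change queued behind it
  is reset and all of its builds start over.
"""


def builds_restarted(queue_length: int, jobs_per_change: int, failed_index: int) -> int:
    """Return how many builds one failure at `failed_index` forces to restart."""
    if not 0 <= failed_index < queue_length:
        raise ValueError("failed_index must point into the queue")
    changes_behind = queue_length - failed_index - 1
    return changes_behind * jobs_per_change


def total_restarts(queue_length: int, jobs_per_change: int, failures: list[int]) -> int:
    """Sum restarts over a sequence of failures (positions at failure time).

    Each failing change is removed from the queue, so the queue shrinks by
    one after every failure.
    """
    total = 0
    length = queue_length
    for index in failures:
        total += builds_restarted(length, jobs_per_change, index)
        length -= 1
    return total


# A short queue vs. a "massively long" one, both hit by a failure at the head.
print(builds_restarted(queue_length=5, jobs_per_change=10, failed_index=0))   # -> 40
print(builds_restarted(queue_length=40, jobs_per_change=10, failed_index=0))  # -> 390
```

Repeated failures compound this, which is why shrinking the queue to just the identified fixes can get them merged sooner.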
So the question is, is there a better way to put a particular patch at the top of our queue when we need to do that? [...]
OpenDev's Zuul administrators have access to reorder queues in dependent pipelines. Reach out to us through the OpenStack TaCT SIG's #openstack-infra IRC channel on Freenode or here on openstack-discuss with the [infra] subject tag, explaining which approved changes you need moved to the front and why. Ideally coordinate this with the rest of your team, since we don't want to wind up in the middle of a team squabble where different contributors are asking to have their changes prioritized at odds with one another. To avoid confusion, we typically want to at least see some acknowledgement of the request from your PTL or designated Infra Liaison[*].
[*] https://wiki.openstack.org/wiki/CrossProjectLiaisons#Infra
-- Jeremy Stanley
On Thu, Dec 17, 2020 at 6:46 PM Jeremy Stanley <fungi@yuggoth.org> wrote:
On 2020-12-17 16:43:08 +0200 (+0200), Marios Andreou wrote:
tripleo-ci squad is wondering if it is possible for (some subset of) us to be able to set the priority of a particular patch/es in the tripleo queue.
Not directly, no, it's an administrative function of the Zuul scheduler which can't be delegated by queue.
ack, we suspected that permissions might be an issue (i.e. that we cannot be given administrative access for 'just the tripleo queue' is at least one of the obstacles here ;) ).
We've done this "manually" in the past, by abandoning all patches in the gate & then restoring in order and putting the priority patch at the top of the dependency queue. However abandoning all the things is completely disruptive for everyone else (sometimes that might be necessary if your queue is way too long but still...).
It's actually not as terrible a solution as it sounds: you're basically signalling to your contributors that your jobs are unhealthy and that your immediate priority is to focus on merging identified fixes for that problem rather than other patches. It also frees up our CI resources, which you would otherwise be monopolizing due to churn from repeated gate resets of massively long change queues, ultimately helping those fixes merge more quickly. Of course it also depends on your core review teams getting on the same page and not continuing to approve unrelated changes which are unlikely to merge at that point, but this is more of a social issue than a technical one.
Indeed this has been done in the past and obviously signalled on the mailing list so folks can stop approving patches (and it typically works out fine).
So the question is, is there a better way to put a particular patch at the top of our queue when we need to do that? [...]
OpenDev's Zuul administrators have access to reorder queues in dependent pipelines. Reach out to us through the OpenStack TaCT SIG's #openstack-infra IRC channel on Freenode or here on openstack-discuss with the [infra] subject tag, explaining which approved changes you need moved to the front and why. Ideally coordinate this with the rest of your team, since we don't want to wind up in the middle of a team squabble where different contributors are asking to have their changes prioritized at odds with one another. To avoid confusion, we typically want to at least see some acknowledgement of the request from your PTL or designated Infra Liaison[*].
[*] https://wiki.openstack.org/wiki/CrossProjectLiaisons#Infra
ACK thanks this is good to know.
This topic came up in our team discussions recently and we felt it was at least worth asking if there was another way to manipulate the queue ourselves that didn't involve abandoning all the things.
Thank you very much for taking the time to reply
marios
Could we find a way to extend the ability to alter queues to a select group of non-zuul admins?
I personally find the need to ping zuul admins about queue alteration problematic for at least two reasons:
- it increases the load on zuul admins, who likely have other more pressing (or interesting) issues to deal with
- it depends directly on the availability of zuul admins, which is not really a 24x7 service, not even a 24x5.
If we had a token that allowed some of us to use the new zuul client to put some patches on top of the queue, we could likely avoid having to depend on other humans for unblocking some pipelines.
As these operations would be very easy to track, I doubt this would be abused.
Currently that is achievable only by admins with something like:
zuul promote --tenant openstack --pipeline gate --changes 123,1
What if we could also have some power users, aka queue owners/stewards? How hard would it be?
Thanks
Sorin Sbarnea
On 18 Dec 2020 at 07:06:19, Marios Andreou <marios@redhat.com> wrote:
On Thu, Dec 17, 2020 at 6:46 PM Jeremy Stanley <fungi@yuggoth.org> wrote:
On 2020-12-17 16:43:08 +0200 (+0200), Marios Andreou wrote:
tripleo-ci squad is wondering if it is possible for (some subset of) us to be able to set the priority of a particular patch/es in the tripleo queue.
Not directly, no, it's an administrative function of the Zuul scheduler which can't be delegated by queue.
ack, we suspected that permissions might be an issue (i.e. that we cannot be given administrative access for 'just the tripleo queue' is at least one of the obstacles here ;) ).
We've done this "manually" in the past, by abandoning all patches in the gate & then restoring in order and putting the priority patch at the top of the dependency queue. However abandoning all the things is completely disruptive for everyone else (sometimes that might be necessary if your queue is way too long but still...).
It's actually not as terrible a solution as it sounds: you're basically signalling to your contributors that your jobs are unhealthy and that your immediate priority is to focus on merging identified fixes for that problem rather than other patches. It also frees up our CI resources, which you would otherwise be monopolizing due to churn from repeated gate resets of massively long change queues, ultimately helping those fixes merge more quickly. Of course it also depends on your core review teams getting on the same page and not continuing to approve unrelated changes which are unlikely to merge at that point, but this is more of a social issue than a technical one.
Indeed this has been done in the past and obviously signalled on the mailing list so folks can stop approving patches (and it typically works out fine).
So the question is, is there a better way to put a particular patch at the top of our queue when we need to do that? [...]
OpenDev's Zuul administrators have access to reorder queues in dependent pipelines. Reach out to us through the OpenStack TaCT SIG's #openstack-infra IRC channel on Freenode or here on openstack-discuss with the [infra] subject tag, explaining which approved changes you need moved to the front and why. Ideally coordinate this with the rest of your team, since we don't want to wind up in the middle of a team squabble where different contributors are asking to have their changes prioritized at odds with one another. To avoid confusion, we typically want to at least see some acknowledgement of the request from your PTL or designated Infra Liaison[*].
[*] https://wiki.openstack.org/wiki/CrossProjectLiaisons#Infra
ACK thanks this is good to know.
This topic came up in our team discussions recently and we felt it was at least worth asking if there was another way to manipulate the queue ourselves that didn't involve abandoning all the things.
Thank you very much for taking the time to reply
marios
On 2020-12-22 11:33:28 -0300 (-0300), Sorin Sbarnea wrote:
Could we find a way to extend the ability to alter queues to a select group of non-zuul admins?
I personally find the need to ping zuul admins about queue alteration problematic for at least two reasons:
- it increases the load on zuul admins, who likely have other more pressing (or interesting) issues to deal with
- it depends directly on the availability of zuul admins, which is not really a 24x7 service, not even a 24x5.
If we had a token that allowed some of us to use the new zuul client to put some patches on top of the queue, we could likely avoid having to depend on other humans for unblocking some pipelines.
As these operations would be very easy to track, I doubt this would be abused.
Currently that is achievable only by admins with something like:
zuul promote --tenant openstack --pipeline gate --changes 123,1
We quite often also precede it with `zuul enqueue ...` to put the change into the gate pipeline, either because the changes in question have preexisting Verified -1/-2 due to unrelated failures, or no Verified vote because it hasn't completed check pipeline jobs.
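A toy model of the enqueue-then-promote sequence described above, tracking only queue order. The functions `enqueue` and `promote` mirror the CLI subcommand names but are hypothetical Python helpers, not Zuul's API. The sketch assumes promote places the listed changes at the head in the order given and keeps every other change in its existing relative order.

```python
"""Toy model of `zuul enqueue` followed by `zuul promote` on one gate queue.

Only queue *ordering* is modeled; this is not Zuul's implementation.
Change references use the "<change>,<patchset>" form from the thread.
"""


def enqueue(queue: list[str], change: str) -> list[str]:
    """Append `change` to the end of the queue unless it is already queued."""
    return queue if change in queue else queue + [change]


def promote(queue: list[str], changes: list[str]) -> list[str]:
    """Move `changes` to the head of the queue, in the order given.

    Every other change keeps its relative order behind the promoted ones.
    Promoting something that is not queued is an error in this model, which
    is why a change without a usable Verified vote gets enqueued first.
    """
    missing = [c for c in changes if c not in queue]
    if missing:
        raise ValueError(f"not in queue: {missing}")
    rest = [c for c in queue if c not in changes]
    return list(changes) + rest


gate = ["101,2", "102,1", "103,4"]
gate = enqueue(gate, "123,1")  # e.g. it was held back by an unrelated Verified -1
gate = promote(gate, ["123,1"])
print(gate)  # ['123,1', '101,2', '102,1', '103,4']
```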
What if we could also have some power users, aka queue owners/stewards? How hard would it be?
Currently we access the scheduler's RPC socket via sudo locally on the server (the CLI utility writes to a named pipe owned by the zuuld user). An alternative would be to set up authentication for the REST API, which the client supports but we haven't used in OpenDev yet.
However, I question whether it's worthwhile spending time engineering a two-tiered administrative solution for our scheduler. This comes up once or maybe twice a month, takes only a minute, and never really has immediate urgency (except for security fixes which are generally scheduled and coordinated with someone well in advance), so it's not a particular burden on the current sysadmin team and there's generally someone around within 24 hours or less to process a request of that nature. As previously discussed, if it were particularly urgent, there are other remedies available to core review teams already.
-- Jeremy Stanley
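For comparison with the local sudo/RPC path, here is a sketch of what an authenticated REST API promote request might look like, built with the Python standard library but not sent. The endpoint path and JSON body reflect our reading of Zuul's tenant-scoped REST API documentation and should be checked against the deployed Zuul version; `ZUUL_URL`, the token, and `build_promote_request` are hypothetical placeholders.

```python
"""Sketch: build (but do not send) an authenticated Zuul REST API promote call.

Assumptions: the endpoint path and JSON body follow our reading of Zuul's
tenant-scoped REST API; ZUUL_URL and TOKEN are placeholders, not OpenDev
values (OpenDev had not enabled REST API authentication at the time).
"""
import json
import urllib.request

ZUUL_URL = "https://zuul.example.org"  # hypothetical deployment
TOKEN = "<JWT issued by the deployment's configured authenticator>"  # placeholder


def build_promote_request(tenant: str, pipeline: str, changes: list[str]) -> urllib.request.Request:
    """Return a POST request asking Zuul to promote `changes` in `pipeline`."""
    body = json.dumps({"pipeline": pipeline, "changes": changes}).encode("utf-8")
    return urllib.request.Request(
        url=f"{ZUUL_URL}/api/tenant/{tenant}/promote",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
    )


req = build_promote_request("openstack", "gate", ["123,1"])
print(req.get_method(), req.full_url)
# Sending it for real would be urllib.request.urlopen(req), given a valid token.
```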
On Tue, Dec 22, 2020, at 9:14 AM, Jeremy Stanley wrote:
On 2020-12-22 11:33:28 -0300 (-0300), Sorin Sbarnea wrote:
Could we find a way to extend the ability to alter queues to a select group of non-zuul admins?
I personally find the need to ping zuul admins about queue alteration problematic for at least two reasons:
- it increases the load on zuul admins, who likely have other more pressing (or interesting) issues to deal with
- it depends directly on the availability of zuul admins, which is not really a 24x7 service, not even a 24x5.
If we had a token that allowed some of us to use the new zuul client to put some patches on top of the queue, we could likely avoid having to depend on other humans for unblocking some pipelines.
As these operations would be very easy to track, I doubt this would be abused.
Currently that is achievable only by admins with something like:
zuul promote --tenant openstack --pipeline gate --changes 123,1
We quite often also precede it with `zuul enqueue ...` to put the change into the gate pipeline, either because the changes in question have preexisting Verified -1/-2 due to unrelated failures, or no Verified vote because it hasn't completed check pipeline jobs.
What if we could also have some power users, aka queue owners/stewards? How hard would it be?
Currently we access the scheduler's RPC socket via sudo locally on the server (the CLI utility writes to a named pipe owned by the zuuld user). An alternative would be to set up authentication for the REST API, which the client supports but we haven't used in OpenDev yet.
Worth noting that the only scoping available to us in the current authentication setup for Zuul is tenant scoping. This means any tokens issued to allow promote for tripleo would allow it for all projects in the openstack tenant. The token would also be able to perform autohold, enqueue/enqueue-ref, and dequeue/dequeue-ref in addition to promote. I don't think these permissions are currently fine-grained enough to work in our current tenant setup.
https://zuul-ci.org/docs/zuul/discussion/tenant-scoped-rest-api.html
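Clark's scoping point, modeled as a hypothetical authorization check. The claim layout (a `zuul` claim holding an `admin` list of tenant names) is our reading of the tenant-scoped REST API docs linked above; `is_allowed` and `ADMIN_ACTIONS` are simplifications for illustration, not Zuul's code. Because only the tenant is consulted, a token meant for tripleo is equally valid for any project and any admin action in the `openstack` tenant.

```python
"""Toy model: Zuul admin tokens are scoped to a tenant, not to a queue.

Claim layout ("zuul" -> "admin" -> list of tenant names) follows our reading
of Zuul's tenant-scoped REST API docs; the check is a simplification.
"""

# Admin actions mentioned in the thread.
ADMIN_ACTIONS = {"promote", "enqueue", "enqueue-ref", "dequeue", "dequeue-ref", "autohold"}


def is_allowed(claims: dict, tenant: str, action: str, project: str) -> bool:
    """Return True if the token's claims permit `action` on `project` in `tenant`.

    `project` is intentionally never consulted: nothing narrower than the
    tenant is checked, which is the limitation described above.
    """
    if action not in ADMIN_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    admin_tenants = claims.get("zuul", {}).get("admin", [])
    return tenant in admin_tenants


# A token someone might hope would cover "just tripleo"...
tripleo_token = {"sub": "tripleo-ci", "zuul": {"admin": ["openstack"]}}

# ...also covers every other project in the tenant, for every admin action.
print(is_allowed(tripleo_token, "openstack", "promote", "openstack/tripleo-common"))  # True
print(is_allowed(tripleo_token, "openstack", "dequeue", "openstack/nova"))            # True
```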
However, I question whether it's worthwhile spending time engineering a two-tiered administrative solution for our scheduler. This comes up once or maybe twice a month, takes only a minute, and never really has immediate urgency (except for security fixes which are generally scheduled and coordinated with someone well in advance), so it's not a particular burden on the current sysadmin team and there's generally someone around within 24 hours or less to process a request of that nature. As previously discussed, if it were particularly urgent, there are other remedies available to core review teams already.
It is worth remembering that Zuul's original (and arguably primary) method of receiving instruction is via the code review systems it listens to. Zuul supports the reorganization of queues via code review system state changes as a result. They are clunky and expensive, but that reflects the cost on both sides of a promotion. Dumping all existing job states and test nodes on the zuul and nodepool side in order to create a new queue state is an expensive operation for Zuul too. I don't think it is necessarily a bug to bubble that pain up to the users. Promotions should be done infrequently when necessary to avoid this resource thrashing. Ideally, projects would instead prioritize work on the review side and ensure that things are only approved when they are expected to pass gating and merge.
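A back-of-the-envelope sketch of why promotions are expensive, assuming (as described above) that a promote discards all existing build state for the queue so that work already done is lost. `Build` and `wasted_node_minutes` are hypothetical names, and the node counts and elapsed times are made-up illustration values, not OpenDev measurements.

```python
"""Rough cost of a promote under a simplifying assumption.

Assumption (ours): a promote throws away every in-progress or completed
build for items in the affected queue, so the node time already spent on
them is lost and the builds start over. Numbers are illustrative only.
"""
from dataclasses import dataclass


@dataclass
class Build:
    nodes: int              # test nodes held by the build
    minutes_elapsed: float  # how long it has run so far


def wasted_node_minutes(queue: list[list[Build]]) -> float:
    """Sum the node-minutes already spent on builds that a promote discards."""
    return sum(b.nodes * b.minutes_elapsed for item in queue for b in item)


# Three queued changes, each with a single-node and a three-node job part-way through.
queue = [
    [Build(nodes=1, minutes_elapsed=50), Build(nodes=3, minutes_elapsed=40)],
    [Build(nodes=1, minutes_elapsed=30), Build(nodes=3, minutes_elapsed=30)],
    [Build(nodes=1, minutes_elapsed=10), Build(nodes=3, minutes_elapsed=10)],
]
print(wasted_node_minutes(queue))  # 330
```

The same work is then repeated in the new queue order, which is the resource thrashing the message warns about.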
participants (4)
- Clark Boylan
- Jeremy Stanley
- Marios Andreou
- Sorin Sbarnea