On Mon, Jul 27, 2020 at 5:27 PM Wesley Hayutin <whayutin@redhat.com> wrote:
FYI...

If you find your jobs are failing with an error similar to [1], you have been rate limited by docker.io via the upstream mirror system and have hit [2].  I've been discussing the issue with upstream infra, rdo-infra and a few CI engineers.

There are a few ways to mitigate the issue; however, I don't see any of the options being completed very quickly, so I'm asking for your patience while this issue is socialized and resolved.

For full transparency, we're considering the following options.

1. move off of docker.io to quay.io

quay.io also has an API rate limit:
https://docs.quay.io/issues/429.html

I'm not sure how many requests per second one allows compared to the other, so this would need to be checked with the quay team before changing anything.
Also, quay.io has had major downtimes of its own, so its SLA needs to be considered.
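
Whichever registry we end up on, a client that gets a 429 can at least back off before retrying instead of failing the job right away. Below is a minimal sketch of that idea, not what our tooling does today. The names fetch_with_backoff and retry_delay are hypothetical, the fetch callable stands in for whatever actually pulls the blob, and I'm assuming (not confirming) that the registry or mirror may send a Retry-After header with a number of seconds under exactly that key.

```python
import time
from typing import Callable, Mapping, Tuple

# (HTTP status code, response headers); a stand-in for a real HTTP response.
Response = Tuple[int, Mapping[str, str]]


def retry_delay(attempt: int, headers: Mapping[str, str],
                base: float = 1.0, cap: float = 60.0) -> float:
    """Return how many seconds to wait before the next attempt.

    Honors a numeric Retry-After header if present (assumption: the
    server sends one), otherwise falls back to exponential backoff.
    """
    retry_after = headers.get("Retry-After", "")
    if retry_after.isdigit():
        return min(float(retry_after), cap)
    return min(base * (2 ** attempt), cap)


def fetch_with_backoff(fetch: Callable[[], Response],
                       max_attempts: int = 5,
                       sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call fetch() until it stops returning 429 or attempts run out."""
    status, headers = 0, {}
    for attempt in range(max_attempts):
        status, headers = fetch()
        if status != 429:
            return status, headers
        # Do not sleep after the final attempt; just report the 429.
        if attempt < max_attempts - 1:
            sleep(retry_delay(attempt, headers))
    return status, headers
```

This only smooths over short bursts. It does not reduce the total number of requests we send, so it would not replace any of the options in this list.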

2. local container builds for each job in master, possibly ussuri

Not convinced.
You can look at CI logs:
- pulling / updating / pushing container images from docker.io to local registry takes ~10 min on standalone (OVH)
- building containers from scratch with updated repos and pushing them to local registry takes ~29 min on standalone (OVH).
 
3. parent/child jobs upstream, where the parent job builds the rpms and containers and hosts the artifacts for the child jobs

Yes, we need to investigate that.
 
4. remove some portion of the upstream jobs to lower the impact we have on 3rd party infrastructure.

I'm not sure I understand this one. Could you give an example of what could be removed?


If you have thoughts, please don't hesitate to share them on this thread.  Very sorry we're hitting these failures, and I really appreciate your patience.  I would expect major delays in getting patches merged at this point until things are resolved.

Thank you!

[1] HTTPError: 429 Client Error: Too Many Requests for url: http://mirror.ca-ymq-1.vexxhost.opendev.org:8082/v2/tripleotrain/centos-binary-cron/blobs/sha256:76342b0db11c6b5acf33b9f1cbf10b3d2680fb20967ccd7daa9593a39e9e45c0
[2] https://bugs.launchpad.net/tripleo/+bug/1889122


--
Emilien Macchi