On 2021-05-11 09:47:45 +0200 (+0200), Sylvain Bauza wrote: [...]
Could we be discussing how we could try to find a workaround for this? [...]
That's absolutely worth discussing, but it needs people committed to building and maintaining something. The current implementation was never efficient, and we realized that when we started trying to operate it at scale. It relies on massive quantities of donated infrastructure for which we're trying to be responsible stewards (the Elasticsearch cluster alone consumes 6x the resources of our Gerrit deployment). We get that it's a useful service, but we need to weigh the relative utility against the cost, not just in server quota but also in ongoing maintenance.

For a while now we've not had as many people involved in running our infrastructure as we need to maintain the services we built over the years. We've been shouting it from the rooftops, but that doesn't seem to change anything, so all we can do at this point is aggressively sunset noncritical systems in order to hopefully have a small enough remainder that the people we do have can still keep it in good shape. Some of the systems we operate are tightly coupled, and taking them down would have massive ripple effects in other systems which would, counterintuitively, require more people to help untangle. The logstash service, on the other hand, is sufficiently decoupled from our more crucial systems that we can make a large dent in our upgrade and configuration management overhaul backlog by just turning it off.

The workaround to which you allude is actually fairly straightforward. Someone can look at what we had as a proof of concept and build an equivalent system using newer and possibly more appropriate technologies. Watch the Gerrit events, fetch logs from swift for anything which gets reported, postprocess and index those, and provide a query interface folks can use to find patterns. None of that requires privileged access to our systems; it's all built on public data. That "someone" needs to come from "somewhere" though.
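
To make the shape of that pipeline a bit more concrete, here is a minimal Python sketch of the four steps (watch events, fetch logs, index, query). It is an illustration only, not what we run today. It assumes events arrive as one JSON object per line, that log links appear as plain URLs in the text of "comment-added" events, and that the logs are readable over unauthenticated HTTP. Every function name is hypothetical, and a naive in-memory inverted index stands in for whatever search backend someone would actually choose.

```python
import json
import re
from collections import defaultdict
from urllib.request import urlopen

URL_RE = re.compile(r"https?://\S+")
WORD_RE = re.compile(r"[a-z_]\w+")


def log_urls_from_event(line):
    """Return the URLs mentioned in one event line (assumed to be JSON).

    Only "comment-added" events are considered; anything else is ignored.
    """
    event = json.loads(line)
    if event.get("type") != "comment-added":
        return []
    return URL_RE.findall(event.get("comment", ""))


def fetch_log(url):
    """Download a log over plain HTTP(S); no credentials are involved."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def index_log(index, url, text):
    """Add every word of every log line to the index: word -> {(url, lineno)}."""
    for lineno, log_line in enumerate(text.splitlines(), start=1):
        for word in WORD_RE.findall(log_line.lower()):
            index[word].add((url, lineno))


def query(index, word):
    """Return the sorted (url, lineno) pairs where the word was seen."""
    return sorted(index.get(word.lower(), ()))


def process(event_lines, fetch=fetch_log):
    """Run the pipeline over an iterable of event lines and return the index."""
    index = defaultdict(set)
    for line in event_lines:
        for url in log_urls_from_event(line):
            index_log(index, url, fetch(url))
    return index
```

The fetch step is passed in as a parameter so the rest can be exercised without network access; a real service would also need retries, log retention limits, and something far more capable than a Python dict for the index.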
Upgrading the existing systems at this point is probably at least the same amount of work, given all the moving parts, the need to completely redo the current configuration management for it, the recent license strangeness with Elasticsearch, and the fact that Logstash and Kibana are increasingly open-core, fighting to keep useful features exclusive to their paying users... the whole stack needs to be reevaluated, and new components and architecture considered.

--
Jeremy Stanley