On Tue, May 11, 2021 at 5:29 PM Clark Boylan <cboylan@sapwetik.org> wrote:
On Tue, May 11, 2021, at 6:56 AM, Jeremy Stanley wrote:
> On 2021-05-11 09:47:45 +0200 (+0200), Sylvain Bauza wrote:
> [...]
> > Could we be discussing how we could try to find a workaround for
> > this?
> [...]

snip. What Fungi said is great. I just wanted to add a bit of detail below.

> Upgrading the existing systems at this point is probably at least
> the same amount of work, given all the moving parts, the need to
> completely redo the current configuration management for it, the
> recent license strangeness with Elasticsearch, the fact that
> Logstash and Kibana are increasingly open-core fighting to keep
> useful features exclusively for their paying users... the whole
> stack needs to be reevaluated, and new components and architecture
> considered.

To add a bit more concrete info to this: the current config management for all of this is Puppet. We no longer have the ability to run Puppet in our infrastructure on systems beyond Ubuntu Xenial. What we have been doing for newer systems is using Ansible (often coupled with docker + docker-compose) to deploy services. This means that all of the config management needs to be redone.

The next problem you'll face is that Elasticsearch itself needs to be upgraded. Historically when we have done this, it has also required upgrading Kibana and Logstash due to compatibility problems. When you upgrade Kibana you have to sort out all of the data access and authorization problems that Elasticsearch presents, because it doesn't provide authentication and authorization itself (we cannot allow arbitrary writes into the ES cluster, but Kibana assumes it can do this). With Logstash you end up rewriting all of your rules.

Finally, I don't think we have enough room to do rolling replacements of Elasticsearch cluster members as they are so large. We have to delete servers to add servers. Typically we would add a server, rotate it in, then delete the old one. In this case the idea is probably to spin up an entirely new cluster alongside the old one, check that it is functional, then shift the data streaming over to point at it. Unfortunately, that won't be possible.
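
To make the capacity constraint concrete, here is a minimal sketch in Python. The quota figure, node sizes, and function names are all hypothetical, made up for illustration; they do not reflect our real cloud quota or any tooling we run.

```python
# Hypothetical numbers for illustration only (not our real quota or node sizes).
QUOTA_GB = 6000
OLD_NODES_GB = [1000] * 6   # six existing cluster members that fill the quota
NEW_NODE_GB = 1000          # size of one replacement member


def can_rolling_replace(quota_gb, old_nodes_gb, new_node_gb):
    """Add-then-delete: the new node must fit next to ALL existing nodes."""
    return sum(old_nodes_gb) + new_node_gb <= quota_gb


def can_delete_then_add(quota_gb, old_nodes_gb, new_node_gb):
    """Delete-then-add: one old node is removed before the new one is created."""
    return sum(old_nodes_gb[1:]) + new_node_gb <= quota_gb


def can_run_parallel_cluster(quota_gb, old_nodes_gb, new_nodes_gb):
    """Side-by-side cluster: old and new clusters must fit at the same time."""
    return sum(old_nodes_gb) + sum(new_nodes_gb) <= quota_gb


if __name__ == "__main__":
    print("rolling replace:", can_rolling_replace(QUOTA_GB, OLD_NODES_GB, NEW_NODE_GB))
    print("delete then add:", can_delete_then_add(QUOTA_GB, OLD_NODES_GB, NEW_NODE_GB))
    print("parallel cluster:", can_run_parallel_cluster(
        QUOTA_GB, OLD_NODES_GB, [NEW_NODE_GB] * len(OLD_NODES_GB)))
```

With a quota that the existing members already fill, only the delete-then-add path fits, which is why the usual add, rotate in, delete sequence is not available to us here.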

> --
> Jeremy Stanley

First, thanks to both Jeremy and Clark for explaining why we need to stop providing an ELK environment for our logs. I now understand it better, and honestly I can't really find a way to fix it by myself.
I'm just sad that, for the moment, we can't find a way to keep looking at this unless we find "someone" who would help us :-)

Just a note: I guess that http://status.openstack.org/elastic-recheck/ will then stop working as well, right?

Operators, if you are reading this and want to make sure that our upstream CI continues to work (as we could see gate issues), please help us! :-)