<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Le mar. 11 mai 2021 à 09:35, Balazs Gibizer <balazs.gibizer@est.tech> a écrit :<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
<br>
> On Mon, May 10, 2021 at 10:34, Clark Boylan <cboylan@sapwetik.org> wrote:
> > Hello everyone,
>
> Hi,
>
> >
> > Xenial has recently reached the end of its life. Our
> > logstash+kibana+elasticsearch and subunit2sql+health data crunching
> > services all run on Xenial. Even without the distro platform EOL
> > concerns, these services are growing old and haven't received the
> > care they need to keep running reliably.
> >
> > Additionally, these services represent a large portion of our
> > resource consumption:
> >
> > * 6 x 16 vcpu + 60GB RAM + 1TB disk Elasticsearch servers
> > * 20 x 4 vcpu + 4GB RAM logstash-worker servers
> > * 1 x 2 vcpu + 2GB RAM logstash/kibana central server
> > * 2 x 8 vcpu + 8GB RAM subunit-worker servers
> > * 64GB RAM + 500GB disk subunit2sql trove db server
> > * 1 x 4 vcpu + 4GB RAM health server
> >
> > To put things in perspective, they account for more than a quarter
> > of our control plane servers, over a third of our block storage,
> > and in excess of half of the total memory footprint.
> >
> > The OpenDev/OpenStack Infra team(s) don't currently seem to have
> > the time available to do the major lifting required to bring these
> > services up to date. I would like to propose that we simply turn
> > them off. All of these services operate off of public data that
> > will not be going away (specifically, job log content). If others
> > are interested in taking this on, they can hook into this data and
> > run their own processing pipelines.
> >
> > I am sure not everyone will be happy with this proposal. I get it.
> > I came up with the idea for the elasticsearch job log processing
> > way back at the San Diego summit. I have spent many, many hours
> > since then working to get it up and running and to keep it running.
> > But pragmatism means that my efforts and the team's efforts are
> > better spent elsewhere.
> >
> > I am happy to hear feedback on this. Thank you for your time.
>
> Thank you, and the whole infra team(s), for the effort of keeping
> the infrastructure alive. I'm an active user of the ELK stack in
> OpenStack. I use it to figure out whether a particular gate failure
> I see is just a one-time event or a real failure we need to fix. So
> I'm sad that this tooling will be shut down, as I think I will lose
> one of the tools that helped me keep our Gate healthy. But I
> understand how busy everybody is these days. I'm not an infra
> person, but if I can help somehow from the Nova perspective, let me
> know. (E.g. I can review elastic-recheck signatures if that helps.)

Well said, gibi.
I understand the reasoning behind the ELK sunset, but I'm a bit
afraid of not having a way to know the number of changes that were
failing with the same exception as the one I saw.

Could we discuss how to find a workaround for this? Maybe no longer
using ELK, but at least continuing to keep the logs for, say, two
weeks?
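
For what it's worth, here is a rough sketch of what a minimal
workaround could look like, assuming the public Zuul builds API at
zuul.opendev.org and that each build publishes a job-output.txt under
its log_url (the tenant name and the signature string below are just
illustrative assumptions, not an existing tool):

    import json
    import urllib.request

    ZUUL_API = "https://zuul.opendev.org/api/tenant/openstack/builds"
    SIGNATURE = "MessagingTimeout"  # example: the exception text you saw

    def fetch(url):
        with urllib.request.urlopen(url, timeout=60) as resp:
            return resp.read().decode("utf-8", errors="replace")

    # Count how many of the last 50 failed builds hit the same signature.
    builds = json.loads(fetch(ZUUL_API + "?result=FAILURE&limit=50"))
    hits = 0
    for build in builds:
        log_url = build.get("log_url")  # usually ends with a "/"
        if not log_url:
            continue
        try:
            log = fetch(log_url + "job-output.txt")
        except Exception:
            continue  # logs may already have expired
        if SIGNATURE in log:
            hits += 1
            print(build["uuid"], build["job_name"])
    print("%d of %d failed builds match" % (hits, len(builds)))

It would be much slower than Elasticsearch, of course, but it only
relies on the job logs staying public.

-Sylvain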
> Cheers,
> gibi
>
> >
> > Clark