<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Le mar. 11 mai 2021 à 09:35, Balazs Gibizer <balazs.gibizer@est.tech> a écrit :<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
<br>
> On Mon, May 10, 2021 at 10:34, Clark Boylan <cboylan@sapwetik.org> wrote:
> > Hello everyone,
>
> Hi,
>
> >
> > Xenial has recently reached the end of its life. Our
> > logstash+kibana+elasticsearch and subunit2sql+health data crunching
> > services all run on Xenial. Even without the distro platform EOL
> > concerns, these services are growing old and haven't received the
> > care they need to keep running reliably.
> >
> > Additionally, these services represent a large portion of our
> > resource consumption:
> >
> > * 6 x 16 vcpu + 60GB RAM + 1TB disk Elasticsearch servers
> > * 20 x 4 vcpu + 4GB RAM logstash-worker servers
> > * 1 x 2 vcpu + 2GB RAM logstash/kibana central server
> > * 2 x 8 vcpu + 8GB RAM subunit-worker servers
> > * 64GB RAM + 500GB disk subunit2sql trove db server
> > * 1 x 4 vcpu + 4GB RAM health server
> >
> > To put things in perspective, they account for more than a quarter
> > of our control plane servers, over a third of our block storage,
> > and in excess of half of the total memory footprint.
> >
> > The OpenDev/OpenStack Infra team(s) don't currently seem to have
> > the time available to do the major lifting required to bring these
> > services up to date. I would like to propose that we simply turn
> > them off. All of these services operate off of public data that
> > will not be going away (specifically, job log content). If others
> > are interested in taking this on, they can hook into this data and
> > run their own processing pipelines.
> >
> > I am sure not everyone will be happy with this proposal. I get it.
> > I came up with the idea for the elasticsearch job log processing
> > way back at the San Diego summit. I have spent many, many hours
> > since then working to get it up and running and to keep it running.
> > But pragmatism means that my efforts and the team's efforts are
> > better spent elsewhere.
> >
> > I am happy to hear feedback on this. Thank you for your time.
>
> Thank you, and the whole infra team(s), for the effort of keeping
> the infrastructure alive. I'm an active user of the ELK stack in
> OpenStack. I use it to figure out whether a particular gate failure
> I see is just a one-time event or a real failure we need to fix. So
> I'm sad that this tooling will be shut down, as I think I will lose
> one of the tools that helped me keep our Gate healthy. But I
> understand how busy everybody is these days. I'm not an infra
> person, but if I can help somehow from the Nova perspective, let me
> know. (E.g. I can review elastic-recheck signatures if that helps.)

Well said, gibi.
I understand the reasoning behind the ELK sunset, but I'm a bit
afraid of not having a way to know the number of changes that were
failing with the same exception as the one I saw.

Could we discuss how to find a workaround for this? Maybe no longer
using ELK, but at least continuing to keep the logs for, say, two
weeks?
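
For what it's worth, here is a rough sketch of what a minimal
workaround could look like, assuming the public Zuul builds API at
zuul.opendev.org and that each build publishes a job-output.txt under
its log_url (the tenant name and the signature string below are just
illustrative assumptions, not an existing tool):

    import json
    import urllib.request

    ZUUL_API = "https://zuul.opendev.org/api/tenant/openstack/builds"
    SIGNATURE = "MessagingTimeout"  # example: the exception text you saw

    def fetch(url):
        with urllib.request.urlopen(url, timeout=60) as resp:
            return resp.read().decode("utf-8", errors="replace")

    # Count how many of the last 50 failed builds hit the same signature.
    builds = json.loads(fetch(ZUUL_API + "?result=FAILURE&limit=50"))
    hits = 0
    for build in builds:
        log_url = build.get("log_url")  # usually ends with a "/"
        if not log_url:
            continue
        try:
            log = fetch(log_url + "job-output.txt")
        except Exception:
            continue  # logs may already have expired
        if SIGNATURE in log:
            hits += 1
            print(build["uuid"], build["job_name"])
    print("%d of %d failed builds match" % (hits, len(builds)))

It would be much slower than Elasticsearch, of course, but it only
relies on the job logs staying public.

-Sylvain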
> Cheers,
> gibi
>
> >
> > Clark