[all][infra][qa] Retiring Logstash, Elasticsearch, subunit2sql, and Health

Slawek Kaplonski skaplons at redhat.com
Tue May 11 12:54:52 UTC 2021


Hi,

On Tuesday, 11 May 2021 09:29:13 CEST Balazs Gibizer wrote:
> On Mon, May 10, 2021 at 10:34, Clark Boylan <cboylan at sapwetik.org>
> wrote:
> > Hello everyone,
> 
> Hi,
> 
> > Xenial has recently reached the end of its life. Our
> > logstash+kibana+elasticsearch and subunit2sql+health data crunching
> > services all run on Xenial. Even without the distro platform EOL
> > concerns these services are growing old and haven't received the care
> > they need to keep running reliably.
> > 
> > Additionally these services represent a large portion of our resource
> > consumption:
> > 
> > * 6 x 16 vcpu + 60GB RAM + 1TB disk Elasticsearch servers
> > * 20 x 4 vcpu + 4GB RAM logstash-worker servers
> > * 1 x 2 vcpu + 2GB RAM logstash/kibana central server
> > * 2 x 8 vcpu + 8GB RAM subunit-worker servers
> > * 64GB RAM + 500GB disk subunit2sql trove db server
> > * 1 x 4 vcpu + 4GB RAM health server
> > 
> > To put things in perspective, they account for more than a quarter of
> > our control plane servers, occupying over a third of our block
> > storage and in excess of half the total memory footprint.
> > 
> > The OpenDev/OpenStack Infra team(s) don't seem to have the time
> > available currently to do the major lifting required to bring these
> > services up to date. I would like to propose that we simply turn them
> > off. All of these services operate off of public data that will not
> > be going away (specifically job log content). If others are
> > interested in taking this on they can hook into this data and run
> > their own processing pipelines.
> > 
> > I am sure not everyone will be happy with this proposal. I get it. I
> > came up with the idea for the elasticsearch job log processing way
> > back at the San Diego summit. I spent many many many hours since
> > working to get it up and running and to keep it running. But
> > pragmatism means that my efforts and the team's efforts are better
> > spent elsewhere.
> > 
> > I am happy to hear feedback on this. Thank you for your time.
> 
> Thank you and the whole infra team(s) for the effort to keep the
> infrastructure alive. I'm an active user of the ELK stack in OpenStack.
> I use it to figure out whether a particular gate failure I see is just a
> one-time event or a real failure we need to fix. So I'm sad that this
> tooling will be shut down, as I think I will lose one of the tools that
> helped me keep our gate healthy. But I understand how busy everybody is
> these days. I'm not an infra person, but if I can help somehow from the
> Nova perspective then let me know. (E.g. I can review elastic-recheck
> signatures if that helps.)

I somehow missed the original email from Clark, but it's similar for the
Neutron team. I use logstash pretty often to check how often some issues
happen in the CI.
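
To get a feel for the scale of the server list quoted above, here is a rough
tally as a minimal Python sketch. The figures are copied from that list; the
names SERVERS and totals are only illustrative. Assumptions: the trove db
line gives no count or vcpu figure, so it is counted as one instance with 0
vcpus; disk is 0 wherever the list does not state it; 1TB is taken as 1000GB.

```python
# Rough tally of the servers in the quoted list.
# Each tuple: (name, count, vcpus, ram_gb, disk_gb), with per-server values.
# Assumptions (not stated in the list): the trove db is one instance with an
# unknown vcpu count (entered as 0), unlisted disk is 0, and 1TB == 1000GB.
SERVERS = [
    ("elasticsearch", 6, 16, 60, 1000),
    ("logstash-worker", 20, 4, 4, 0),
    ("logstash/kibana central", 1, 2, 2, 0),
    ("subunit-worker", 2, 8, 8, 0),
    ("subunit2sql trove db", 1, 0, 64, 500),
    ("health", 1, 4, 4, 0),
]


def totals(servers):
    """Sum the server count, vcpus, RAM and disk across all entries."""
    result = {"servers": 0, "vcpus": 0, "ram_gb": 0, "disk_gb": 0}
    for _name, count, vcpus, ram_gb, disk_gb in servers:
        result["servers"] += count
        result["vcpus"] += count * vcpus
        result["ram_gb"] += count * ram_gb
        result["disk_gb"] += count * disk_gb
    return result


if __name__ == "__main__":
    for key, value in totals(SERVERS).items():
        print(f"{key}: {value}")
```

Under those assumptions this comes to 31 servers, 198 vcpus, 526GB of RAM
and 6500GB of disk.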

> 
> Cheers,
> gibi
> 
> > Clark


-- 
Slawek Kaplonski
Principal Software Engineer
Red Hat