Re: [all][infra][qa] Retiring Logstash, Elasticsearch, subunit2sql, and Health

Clark Boylan cboylan at sapwetik.org
Thu May 13 14:56:53 UTC 2021


On Thu, May 13, 2021, at 7:23 AM, Daniel Pawlik wrote:
> Hello Folks,
> 
> Thank you, Jeremy and Clark, for sharing the issues you have. I
> understand that the main issue is a lack of time.
> The ELK stack requires a lot of resources, but the values that you
> shared can probably be optimized. Could you share the architecture:
> how many servers there are, and which Elasticsearch role (master,
> data, etc.) each one has?

All of this information is public. We host high-level docs [0], and you can always check the configuration management code [1][2][3].

> 
> My team manages the RDO infra, which includes an ELK stack based on
> Opendistro for Elasticsearch.
> We have Ansible playbooks to set up Elasticsearch based on Opendistro
> on just one node. Almost all of the ELK stack services are located on
> one server that does not use a lot of resources (the retention time is
> set to 10 days, 90GB of disk is used, 2GB of RAM for Elasticsearch,
> 512MB for Logstash).
> Could you share what retention time is currently set in the cluster,
> given that it requires 1 TB of disk? Could you also share other
> statistics, such as how many queries are made in Kibana and how much
> disk space is used by the OpenStack project compared to the other
> projects available in OpenDev?

We currently have the retention time set to 7 days. At peak we were indexing over a billion documents per day (and that is after removing DEBUG logs), and we run with a single replica. Cacti records [4] Elasticsearch disk usage over time. Note that because we use a single replica, we always want to keep some free space to accommodate rebalancing if a cluster member is down.
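
The storage arithmetic behind this can be sketched as follows. Only the 7-day retention, the single replica (two copies of every shard), and the rough documents-per-day figure come from this thread; the average on-disk document size, the number of data nodes, and the fill ratio are hypothetical parameters, and `IndexSizing` is an illustrative name, not anything in system-config.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexSizing:
    """Rough disk estimate for time-based log indices (illustrative only)."""

    docs_per_day: int      # thread: peak of "over a billion" per day
    retention_days: int    # thread: 7 days
    replicas: int          # thread: a single replica
    avg_doc_bytes: float   # HYPOTHETICAL: on-disk bytes per document, not given

    def stored_bytes(self) -> float:
        """Total bytes held by the cluster: primaries plus replica copies."""
        copies = 1 + self.replicas
        return self.docs_per_day * self.retention_days * self.avg_doc_bytes * copies

    def per_node_disk_bytes(self, data_nodes: int, fill_ratio: float = 0.85) -> float:
        """Disk each data node needs so all data still fits with one node down.

        When a node fails, its shard copies are rebuilt on the survivors, so the
        whole stored volume must fit on data_nodes - 1 nodes while each stays
        below fill_ratio. The 0.85 default is a HYPOTHETICAL choice that mirrors
        Elasticsearch's default low disk watermark of 85%.
        """
        if data_nodes < 2:
            raise ValueError("need at least two data nodes to survive one failure")
        if not 0.0 < fill_ratio <= 1.0:
            raise ValueError("fill_ratio must be in (0, 1]")
        return self.stored_bytes() / (data_nodes - 1) / fill_ratio
```

Sizing each node for the "one member down" case is one way to express the free-space requirement above; plugging in a measured average document size (for example, index size divided by document count from the cluster itself) would make the estimate meaningful.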

We don't break this down as OpenStack vs. non-OpenStack at the Elasticsearch level, but typical numbers for Zuul test node CPU time show that we are about 95% OpenStack and 5% non-OpenStack.

I don't know the total number of queries made against Kibana, but the bulk of the querying is likely done by elastic-recheck, which also has a public set of queries [5]. These are run multiple times an hour to keep its dashboards up to date.
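
Each query in [5] is a Lucene-style query string that gets run against the cluster periodically. A minimal sketch of how such a string might be wrapped in an Elasticsearch search request, using the standard `query_string` and `range` queries inside a `bool` query; this is not elastic-recheck's implementation, and both the `@timestamp` field and the `build_search_body` helper are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_search_body(lucene_query: str, window: timedelta,
                      now: Optional[datetime] = None) -> dict:
    """Wrap a Lucene query string in a search body limited to a recent time window.

    ASSUMPTION: events carry an "@timestamp" field (Logstash's default event
    timestamp). HYPOTHETICAL helper, not elastic-recheck code.
    """
    now = now or datetime.now(timezone.utc)
    start = now - window
    return {
        "query": {
            "bool": {
                # Full-text match using Lucene query-string syntax.
                "must": [{"query_string": {"query": lucene_query}}],
                # Non-scoring filter restricting hits to the recent window.
                "filter": [
                    {"range": {"@timestamp": {"gte": start.isoformat(),
                                              "lte": now.isoformat()}}}
                ],
            }
        }
    }
```

Putting the time range in `filter` rather than `must` keeps it out of relevance scoring and lets Elasticsearch cache it, which can help when the same queries are repeated several times an hour. The resulting dict can be serialized with `json.dumps` and sent to an index's `_search` endpoint.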

> 
> Finally, could you share which Elasticsearch version is currently
> running on your servers, and the -Xmx and -Xms parameters that are
> set for Logstash, Elasticsearch and Kibana?

This info (at least for Elasticsearch) is available in [1].

> 
> Thank you for your time and effort in keeping things running smoothly 
> for OpenDev.  We find the OpenDev ELK stack 
> valuable enough to the OpenDev community to take a much larger role in 
> keeping it running.   
> If you can think of any additional links or information that may be 
> helpful to us taking a larger role here, please do not 
> hesitate to share it.
> 
> Dan
> 

[0] https://docs.opendev.org/opendev/system-config/latest/logstash.html
[1] https://opendev.org/opendev/system-config/src/branch/master/modules/openstack_project/manifests/elasticsearch_node.pp
[2] https://opendev.org/opendev/system-config/src/branch/master/modules/openstack_project/manifests/logstash_worker.pp
[3] https://opendev.org/opendev/system-config/src/branch/master/modules/openstack_project/manifests/logstash.pp
[4] http://cacti.openstack.org/cacti/graph.php?action=zoom&local_graph_id=66519&rra_id=3&view_type=&graph_start=1618239228&graph_end=1620917628
[5] https://opendev.org/opendev/elastic-recheck/src/branch/master/queries


