[openstack-dev] [kolla] Introduction of Heka in Kolla
elemoine at mirantis.com
Mon Jan 11 15:16:00 UTC 2016
As discussed on IRC the other day, we want to propose a distributed
log-processing architecture based on Heka, built on Alicja
Kwasniewska's ELK work
(<https://review.openstack.org/#/c/252968/>). Please take a look at the
design document I've started working on. The document is still
a work in progress, but the "Problem statement" and "Proposed change"
sections should give you a good overview of the architecture
we have in mind.
In the proposed architecture, each cluster node runs an instance of
Heka for collecting and processing logs. Instead of sending the
processed logs to a centralized Logstash instance, each Heka instance
sends them directly to Elasticsearch, which can itself be distributed
across multiple nodes for high availability and scaling. The proposed
architecture is based on Heka and does not use Logstash.
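To make the per-node flow concrete, here is a minimal Python sketch of what one node does with its logs before they reach Elasticsearch: split a byte stream into records, decode each record into fields, drop noise locally, and encode the rest as an Elasticsearch `_bulk` request body (an action line followed by a document line, each newline-terminated). This is not Heka's plugin API; the stage functions, the log layout and the index name `logs-hypothetical` are all hypothetical, and the HTTP POST to `_bulk` is left out. Depending on the Elasticsearch version, the action line may also need a `_type`.

```python
import json
import re
from typing import Iterable, Iterator, Optional

# Hypothetical log layout: "<date> <time> <pid> <LEVEL> <logger> <message>".
LOG_RE = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) (?P<pid>\d+) "
    r"(?P<level>[A-Z]+) (?P<logger>\S+) (?P<message>.*)$"
)


def split_lines(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Splitter: reassemble arbitrary byte chunks into newline-delimited records."""
    buf = b""
    for chunk in chunks:
        buf += chunk
        *records, buf = buf.split(b"\n")  # last piece may be an incomplete record
        yield from records
    if buf:
        yield buf  # flush a final record that had no trailing newline


def decode(record: bytes) -> Optional[dict]:
    """Decoder: turn a raw record into a structured message, or None if unparsable."""
    match = LOG_RE.match(record.decode("utf-8", errors="replace"))
    return match.groupdict() if match else None


def encode_bulk(msg: dict, index: str) -> str:
    """Encoder: one bulk 'index' action line plus the document line."""
    action = {"index": {"_index": index}}
    return json.dumps(action) + "\n" + json.dumps(msg) + "\n"


def run_pipeline(chunks: Iterable[bytes], index: str = "logs-hypothetical") -> str:
    """Run split -> decode -> filter -> encode and return the bulk request body."""
    parts = []
    for record in split_lines(chunks):
        msg = decode(record)
        if msg is None or msg["level"] == "DEBUG":  # Filter: drop noise locally
            continue
        parts.append(encode_bulk(msg, index))
    return "".join(parts)  # every part ends with "\n", as the bulk API expects


if __name__ == "__main__":
    chunks = [
        b"2016-01-11 15:16:00.123 1234 INFO nova.api Request done\n2016-01-11 15:16",
        b":01.456 1234 DEBUG nova.api noise\n"
        b"2016-01-11 15:16:02.789 1234 ERROR nova.compute Boot failed",
    ]
    print(run_pipeline(chunks), end="")
```

Note that the second log line is split across two chunks and is still reassembled correctly, and the DEBUG record never leaves the node, which is the kind of local filtering a per-node design allows.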
That being said, it is important to note that the intent of this
proposal is not strictly to replace Logstash with Heka. The
intent is to propose a distributed architecture with Heka running on
each cluster node, rather than having Logstash run as a centralized
log-processing component. For such a distributed architecture we
think that Heka is more appropriate, with a smaller memory footprint
and better performance in general. In addition, Heka is more
than a log-processing tool: it is designed to process streams of
any type of data, including events, logs and metrics.
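Because logs, events and metrics pass through the same process, a filter can derive a metric directly from log records instead of relying on a separate metrics agent. The Python sketch below counts ERROR records per minute; it assumes a hypothetical "<date> <time> <pid> <LEVEL> <rest>" log layout, and `errors_per_minute` is our own illustrative name. In Heka this logic would live in a filter plugin, not in Python.

```python
from collections import Counter
from typing import Iterable


def errors_per_minute(lines: Iterable[str]) -> Counter:
    """Derive an 'ERROR records per minute' metric from raw log lines.

    Assumes the hypothetical layout "<date> <time> <pid> <LEVEL> <rest>".
    """
    counts: Counter = Counter()
    for line in lines:
        fields = line.split(" ", 4)
        if len(fields) < 5 or fields[3] != "ERROR":
            continue  # not an error record (or not in the expected layout)
        minute = f"{fields[0]} {fields[1][:5]}"  # "15:16:00.123" -> "15:16"
        counts[minute] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2016-01-11 15:16:00.123 1234 ERROR nova.compute Boot failed",
        "2016-01-11 15:16:30.000 1234 INFO nova.api Request done",
        "2016-01-11 15:16:59.999 1234 ERROR nova.compute Boot failed",
        "2016-01-11 15:17:01.000 1234 ERROR nova.api Timeout",
    ]
    for minute, n in sorted(errors_per_minute(sample).items()):
        print(minute, n)
```

The resulting per-minute counts form a time series that could be emitted as metric messages alongside the original log records.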
Some elements of comparison between Heka and Logstash:
* Logstash was designed for log processing. Heka is "unified data
processing" software, designed to process streams of any type of data.
So with Heka you run one service on each box instead of many.
Using a single service for processing different types of data also
makes it possible to correlate them, and to derive metrics from logs and
events. See Rob Miller's presentation for more details.
* The virtual size of the Logstash Docker image is 447 MB, while the
virtual size of a Heka image built from the same base image
(debian:jessie) is 177 MB. For comparison, the virtual size of the
Elasticsearch image is 345 MB.
* Heka is written in Go and has no dependencies. Go programs are
compiled to native code. This is in contrast to Logstash, which uses
JRuby and as such requires running a Java Virtual Machine. Besides
this native versus interpreted code aspect, this also raises the
question of which JVM to use (Oracle, OpenJDK?) and which version.
* There are six types of Heka plugins: Inputs, Splitters, Decoders,
Filters, Encoders, and Outputs. Heka plugins are written in Go or
Lua. Lua plugins are executed in a sandbox, and misbehaving plugins
may be shut down by Heka. Lua plugins may also be added to Heka
dynamically, with no config changes and no Heka restart. This
is an important property in container environments such as Mesos,
where workloads change dynamically.
* To avoid losing logs under high load, it is often recommended to use
Logstash together with Redis. Redis plays the role of a buffer,
where logs are queued when Logstash or Elasticsearch cannot keep up
with the load. Heka, as "unified data processing" software,
includes its own resilient message queue, making it unnecessary to use
an external queue (Redis, for example).
* Heka is faster than Logstash at processing logs, and its memory
footprint is smaller. I ran tests where 3,400,000 log messages were
read from 500 input files and then written to a single output file.
Heka processed the 3,400,000 log messages in 12 s, consuming
500 MB of RAM. Logstash processed the 3,400,000 log messages in
1 min 35 s, consuming 1.1 GB of RAM. With a grok filter added to parse
and structure the logs, Logstash processed the 3,400,000 log messages
in 2 min 15 s, consuming 1.5 GB of RAM. Using an equivalent filtering
plugin, Heka processed the 3,400,000 log messages in 27 s, consuming
730 MB of RAM. See my GitHub repo for more information about the test.
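The point about a built-in resilient queue can be illustrated with a toy disk-backed FIFO in Python. This is a minimal sketch under our own assumptions, not Heka's actual buffer format or code: records are appended to a file with a length prefix and fsync'ed, and a read offset is persisted only after the output (for example an Elasticsearch client) confirms delivery. `DiskQueue`, `deliver_next` and the `queue.dat`/`queue.offset` file names are hypothetical.

```python
import os
import struct
import tempfile
from typing import Callable, Optional, Tuple

HEADER = struct.Struct(">I")  # 4-byte big-endian record length


class DiskQueue:
    """Toy append-only on-disk FIFO (hypothetical format, not Heka's)."""

    def __init__(self, directory: str) -> None:
        self.data_path = os.path.join(directory, "queue.dat")
        self.offset_path = os.path.join(directory, "queue.offset")
        open(self.data_path, "ab").close()  # create the data file if missing

    def push(self, record: bytes) -> None:
        """Append a length-prefixed record and fsync it before returning."""
        with open(self.data_path, "ab") as f:
            f.write(HEADER.pack(len(record)) + record)
            f.flush()
            os.fsync(f.fileno())

    def _offset(self) -> int:
        try:
            with open(self.offset_path) as f:
                return int(f.read() or 0)
        except FileNotFoundError:
            return 0  # nothing delivered yet

    def _peek(self) -> Optional[Tuple[bytes, int]]:
        """Return (next record, offset after it) without consuming it."""
        offset = self._offset()
        with open(self.data_path, "rb") as f:
            f.seek(offset)
            header = f.read(HEADER.size)
            if len(header) < HEADER.size:
                return None  # queue drained
            (length,) = HEADER.unpack(header)
            record = f.read(length)
        if len(record) < length:
            return None  # record only partially written so far
        return record, offset + HEADER.size + length

    def deliver_next(self, send: Callable[[bytes], bool]) -> bool:
        """Send the next record; advance the read offset only if send() succeeds."""
        item = self._peek()
        if item is None:
            return False
        record, next_offset = item
        if not send(record):
            return False  # e.g. Elasticsearch unreachable: record stays queued
        tmp = self.offset_path + ".tmp"
        with open(tmp, "w") as f:
            f.write(str(next_offset))
        os.replace(tmp, self.offset_path)  # swap in the new checkpoint in one step
        return True


if __name__ == "__main__":
    def send(r: bytes) -> bool:
        print(r.decode())
        return True

    with tempfile.TemporaryDirectory() as d:
        q = DiskQueue(d)
        for r in (b"log 1", b"log 2", b"log 3"):
            q.push(r)
        print(q.deliver_next(lambda r: False))  # output down: nothing consumed
        restarted = DiskQueue(d)  # simulate a process restart
        while restarted.deliver_next(send):
            pass
```

Because the offset is written after a successful send, a crash between the two steps re-sends one record (at-least-once delivery) instead of losing it. The toy never compacts its data file; a real buffer would also need a size cap and cleanup.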
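The benchmark figures above translate into throughput and ratios with simple arithmetic. The Python snippet below only recomputes derived values from the numbers reported in this message; the run names are our own labels, and 1.1G and 1.5G are treated as 1,100 MB and 1,500 MB.

```python
MESSAGES = 3_400_000

# name -> (wall-clock seconds, RAM in MB), from the test described above.
RUNS = {
    "heka": (12, 500),
    "logstash": (95, 1100),         # 1 min 35 s
    "logstash_grok": (135, 1500),   # 2 min 15 s
    "heka_filter": (27, 730),
}


def throughput(name: str) -> float:
    """Messages per second for a given run."""
    return MESSAGES / RUNS[name][0]


def ratios(slow: str, fast: str) -> tuple:
    """(time ratio, RAM ratio) of run `slow` relative to run `fast`."""
    return RUNS[slow][0] / RUNS[fast][0], RUNS[slow][1] / RUNS[fast][1]


if __name__ == "__main__":
    for name in RUNS:
        print(f"{name:14s} {throughput(name):>9,.0f} msg/s")
    for slow, fast in (("logstash", "heka"), ("logstash_grok", "heka_filter")):
        t, m = ratios(slow, fast)
        print(f"{slow} vs {fast}: {t:.1f}x the time, {m:.1f}x the RAM")
```

That is roughly 283,000 msg/s for Heka versus about 36,000 msg/s for Logstash without parsing (Logstash takes about 7.9 times as long and uses 2.2 times the RAM), and about 126,000 versus 25,000 msg/s with parsing (5.0 times as long, about 2.1 times the RAM).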
Also, I want to say that our team has been using Heka in production
for about a year, in clusters of up to 200 nodes. Heka has proven to
be very robust and efficient, and flexible enough to address our log
processing and monitoring use cases. We have also acquired solid
experience with it.
Any comments are welcome!