[openstack-dev] [kolla] Introduction of Heka in Kolla

Michał Jastrzębski inc007 at gmail.com
Mon Jan 11 16:01:09 UTC 2016


On 11 January 2016 at 09:16, Eric LEMOINE <elemoine at mirantis.com> wrote:
> Hi
>
> As discussed on IRC the other day [1] we want to propose a distributed
> logs processing architecture based on Heka [2], built on Alicja
> Kwasniewska's ELK work with
> <https://review.openstack.org/#/c/252968/>.  Please take a look at the
> design document I've started working on [3].  The document is still
> work-in-progress, but the "Problem statement" and "Proposed change"
> sections should provide you with a good overview of the architecture
> we have in mind.
>
> In the proposed architecture each cluster node runs an instance of
> Heka for collecting and processing logs.  And instead of sending the
> processed logs to a centralized Logstash instance, logs are directly
> sent to Elasticsearch, which itself can be distributed across multiple
> nodes for high-availability and scaling.  The proposed architecture is
> based on Heka, and it doesn't use Logstash.
>
> That being said, it is important to note that the intent of this
> proposal is not strictly directed at replacing Logstash by Heka.  The
> intent is to propose a distributed architecture with Heka running on
> each cluster node rather than having Logstash run as a centralized
> logs processing component.  For such a distributed architecture we
> think that Heka is more appropriate, with a smaller memory footprint
> and better performances in general.  In addition, Heka is also more
> than a logs processing tool, as it's designed to process streams of
> any type of data, including events, logs and metrics.
>
> Some elements of comparison between Heka and Logstash:
>
> * Logstash was designed for logs processing.  Heka is a "unified data
> processing" software, designed to process streams of any type of data.
> So Heka is about running one service on each box instead of many.
> Using a single service for processing different types of data also
> makes it possible to do correlations, and derive metrics from logs and
> events.  See Rob Miller's presentation [4] for more details.

Right now we use rsyslog for that. As I understand Heka right now
would be actually alternative to rsyslog, and that is already
implemented. Also with Heka case, we might run into same problem we've
seen with rsyslog - transporting logs from service to heka. Keep in
mind we're in dockers and heka will be in different container than
service it's supposed to listen to. We do that by sharing faked
/dev/log across containers and rsyslog can handle that.
Also Heka seems to be just processing mechanism while logstash is
well...stash, so it saves and persists logs, so it seems to me they're
different layers of log processing.

Seems to me we need additional comparason of heka vs rsyslog;) Also
this would have to be hands down better because rsyslog is already
implemented, working and most of operators knows how to use it.

> * The virtual size of the Logstash Docker image is 447 MB, while the
> virtual size of an Heka image built from the same base image
> (debian:jessie) is 177 MB.  For comparison the virtual size of the
> Elasticsearch image is 345 MB.
>
> * Heka is written in Go and has no dependencies.  Go programs are
> compiled to native code.  This in contrast to Logstash which uses
> JRuby and as such requires running a Java Virtual Machine.  Besides
> this native versus interpreted code aspect, this also can raise the
> question of which JVM to use (Oracle, OpenJDK?) and which version
> (6,7,8?).
>
> * There are six types of Heka plugins: Inputs, Splitters, Decoders,
> Filters, Encoders, and Outputs.  Heka plugins are written in Go or
> Lua.  When written in Lua their executions are sandbox'ed, where
> misbehaving plugins may be shut down by Heka.  Lua plugins may also be
> dynamically added to Heka with no config changes or Heka restart. This
> is an important property on container environments such as Mesos,
> where workloads are changed dynamically.
>
> * To avoid losing logs under high load it is often recommend to use
> Logstash together with Redis [5].  Redis plays the role of a buffer,
> where logs are queued when Logstash or Elasticsearch cannot keep up
> with the load.  Heka, as a "unified data processing" software,
> includes its own resilient message queue, making it unnecessary to use
> an external queue (Redis for example).
>
> * Heka is faster than Logstash for processing logs, and its memory
> footprint is smaller.  I ran tests, where 3,400,000 log messages were
> read from 500 input files and then written to a single output file.
> Heka processed the 3,400,000 log messages in 12 seconds, consuming
> 500M of RAM.  Logstash processed the 3,400,000 log messages in 1mn
> 35s, consuming 1.1G of RAM.  Adding a grok filter to parse and
> structure logs, Logstash processed the 3,400,000 log messages in 2mn
> 15s, consuming 1.5G of RAM. Using an equivalent filtering plugin, Heka
> processed the 3,400,000 log messages in 27s, consuming 730M of RAM.
> See my GitHub repo [6] for more information about the test
> environment.
>
> Also, I want to say that our team has been using Heka in production
> for about a year, in clusters of up to 200 nodes.  Heka has proven to
> be very robust, efficient and flexible enough to address our logs
> processing and monitoring use-cases.  We've also acquired a solid
> experience with it.
>
> Any comments are welcome!
>
> Thanks.
>
>
> [1] <http://eavesdrop.openstack.org/meetings/kolla/2016/kolla.2016-01-06-16.32.html>
> [2] <http://hekad.readthedocs.org>
> [3] <https://docs.google.com/document/d/1RdckXedts4THPb6giAZvoy3ESiJ5GXau3PYIgbGR-fA/edit?usp=sharing>
> [4] <http://www.slideshare.net/devopsdays/heka-rob-miller>
> [5] <http://blog.sematext.com/2015/09/28/recipe-rsyslog-redis-logstash/>
> [6] <https://github.com/elemoine/heka-logstash-comparison>
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



More information about the OpenStack-dev mailing list