[openstack-dev] [kolla] Introduction of Heka in Kolla

Sam Yaple samuel at yaple.net
Mon Jan 11 16:04:56 UTC 2016


I like the idea of using Heka. You and I have discussed this on IRC before.
So my vote for this is +1. I can't think of any downside. I would like to
hear Alicja Kwasniewska's view on this, as she has done the majority of the
work with Logstash up until this point.

Sam Yaple

On Mon, Jan 11, 2016 at 3:16 PM, Eric LEMOINE <elemoine at mirantis.com> wrote:

> Hi
>
> As discussed on IRC the other day [1], we want to propose a distributed
> log-processing architecture based on Heka [2], built on Alicja
> Kwasniewska's ELK work
> (<https://review.openstack.org/#/c/252968/>).  Please take a look at the
> design document I've started working on [3].  The document is still
> work-in-progress, but the "Problem statement" and "Proposed change"
> sections should provide you with a good overview of the architecture
> we have in mind.
>
> In the proposed architecture each cluster node runs an instance of
> Heka for collecting and processing logs.  And instead of sending the
> processed logs to a centralized Logstash instance, logs are directly
> sent to Elasticsearch, which itself can be distributed across multiple
> nodes for high-availability and scaling.  The proposed architecture is
> based on Heka, and it doesn't use Logstash.
>
> That being said, it is important to note that the intent of this
> proposal is not strictly directed at replacing Logstash with Heka.  The
> intent is to propose a distributed architecture with Heka running on
> each cluster node rather than having Logstash run as a centralized
> log-processing component.  For such a distributed architecture we
> think that Heka is more appropriate, with a smaller memory footprint
> and better performance in general.  In addition, Heka is also more
> than a log-processing tool, as it's designed to process streams of
> any type of data, including events, logs and metrics.
>
> Some elements of comparison between Heka and Logstash:
>
> * Logstash was designed for log processing.  Heka is "unified data
> processing" software, designed to process streams of any type of data.
> So Heka is about running one service on each box instead of many.
> Using a single service for processing different types of data also
> makes it possible to do correlations, and derive metrics from logs and
> events.  See Rob Miller's presentation [4] for more details.
>
> * The virtual size of the Logstash Docker image is 447 MB, while the
> virtual size of a Heka image built from the same base image
> (debian:jessie) is 177 MB.  For comparison the virtual size of the
> Elasticsearch image is 345 MB.
>
> * Heka is written in Go and has no dependencies.  Go programs are
> compiled to native code.  This is in contrast to Logstash, which uses
> JRuby and as such requires running a Java Virtual Machine.  Besides
> the native versus interpreted code aspect, this also raises the
> question of which JVM to use (Oracle, OpenJDK?) and which version
> (6, 7, 8?).
>
> * There are six types of Heka plugins: Inputs, Splitters, Decoders,
> Filters, Encoders, and Outputs.  Heka plugins are written in Go or
> Lua.  Lua plugins run in a sandbox, so misbehaving plugins can be
> shut down by Heka.  Lua plugins may also be dynamically added to
> Heka with no config changes or Heka restart.  This is an important
> property in container environments such as Mesos, where workloads
> change dynamically.
>
> * To avoid losing logs under high load it is often recommended to use
> Logstash together with Redis [5].  Redis plays the role of a buffer,
> where logs are queued when Logstash or Elasticsearch cannot keep up
> with the load.  Heka, as "unified data processing" software,
> includes its own resilient message queue, making it unnecessary to use
> an external queue (Redis for example).
>
> * Heka is faster than Logstash for processing logs, and its memory
> footprint is smaller.  I ran tests in which 3,400,000 log messages
> were read from 500 input files and then written to a single output
> file.  Heka processed the 3,400,000 log messages in 12 s, consuming
> 500 MB of RAM.  Logstash processed the 3,400,000 log messages in
> 1 min 35 s, consuming 1.1 GB of RAM.  Adding a grok filter to parse
> and structure logs, Logstash processed the 3,400,000 log messages in
> 2 min 15 s, consuming 1.5 GB of RAM.  Using an equivalent filtering
> plugin, Heka processed the 3,400,000 log messages in 27 s, consuming
> 730 MB of RAM.  See my GitHub repo [6] for more information about the
> test environment.
>
> Also, our team has been using Heka in production for about a year,
> in clusters of up to 200 nodes.  Heka has proven to be very robust,
> efficient and flexible enough to address our log-processing and
> monitoring use cases.  We've also gained solid experience with it.
>
> Any comments are welcome!
>
> Thanks.
>
>
> [1] <http://eavesdrop.openstack.org/meetings/kolla/2016/kolla.2016-01-06-16.32.html>
> [2] <http://hekad.readthedocs.org>
> [3] <https://docs.google.com/document/d/1RdckXedts4THPb6giAZvoy3ESiJ5GXau3PYIgbGR-fA/edit?usp=sharing>
> [4] <http://www.slideshare.net/devopsdays/heka-rob-miller>
> [5] <http://blog.sematext.com/2015/09/28/recipe-rsyslog-redis-logstash/>
> [6] <https://github.com/elemoine/heka-logstash-comparison>
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
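
For reference, the benchmark figures quoted above can be turned into
throughput numbers and ratios.  This is only a minimal Python sketch of
that arithmetic, using nothing but the numbers in the message.  The run
labels and the helper names (throughput, speedup, ram_ratio) are made up
for illustration, and "M"/"G" are assumed to mean MB/GB with
1 GB = 1000 MB.

```python
# Figures copied from the benchmark described in the quoted message.
MESSAGES = 3_400_000

# label -> (elapsed seconds, RAM in MB).
# "1mn 35s" is 95 s and "2mn 15s" is 135 s.
# Assumption: 1.1G and 1.5G are taken as 1100 MB and 1500 MB.
RUNS = {
    "heka": (12, 500),
    "logstash": (95, 1100),
    "heka+filter": (27, 730),
    "logstash+grok": (135, 1500),
}


def throughput(name):
    """Messages processed per second for the named run."""
    seconds, _ram_mb = RUNS[name]
    return MESSAGES / seconds


def speedup(fast, slow):
    """How many times shorter the 'fast' run was than the 'slow' run."""
    return RUNS[slow][0] / RUNS[fast][0]


def ram_ratio(small, large):
    """How many times more RAM the 'large' run used than the 'small' run."""
    return RUNS[large][1] / RUNS[small][1]


if __name__ == "__main__":
    for name in RUNS:
        print(f"{name:15s} {throughput(name):10.0f} msg/s")
    print(f"no filter:   {speedup('heka', 'logstash'):.1f}x faster, "
          f"{ram_ratio('heka', 'logstash'):.1f}x less RAM")
    print(f"with filter: {speedup('heka+filter', 'logstash+grok'):.1f}x faster, "
          f"{ram_ratio('heka+filter', 'logstash+grok'):.1f}x less RAM")
```

On these numbers Heka comes out roughly 8 times faster than Logstash
without filtering and 5 times faster with it, using about half the
memory in both cases.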