[openstack-dev] [kolla] Introduction of Heka in Kolla
samuel at yaple.net
Mon Jan 11 16:04:56 UTC 2016
I like the idea of using Heka. You and I have discussed this on IRC before.
So my vote for this is +1. I can't think of any downside. I would like to
hear Alicja Kwasniewska's view on this as she has done the majority of work
with Logstash up until this point.
On Mon, Jan 11, 2016 at 3:16 PM, Eric LEMOINE <elemoine at mirantis.com> wrote:
> As discussed on IRC the other day, we want to propose a distributed
> log processing architecture based on Heka, building on Alicja
> Kwasniewska's ELK work in
> <https://review.openstack.org/#/c/252968/>. Please take a look at the
> design document I've started working on. The document is still
> work-in-progress, but the "Problem statement" and "Proposed change"
> sections should give you a good overview of the architecture
> we have in mind.
> In the proposed architecture each cluster node runs an instance of
> Heka for collecting and processing logs. Instead of sending the
> processed logs to a centralized Logstash instance, each node sends its
> logs directly to Elasticsearch, which itself can be distributed across
> multiple nodes for high availability and scaling. The proposed architecture is
> based on Heka, and it doesn't use Logstash.
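In this design each node ships its own logs to Elasticsearch. One common way to do that is the Elasticsearch `_bulk` API, whose body is newline-delimited JSON: an action line naming the target index, followed by the document. The Python sketch below only builds such a body; it assumes parsed records are dicts carrying a `datetime` under `timestamp`, and the daily `log-YYYY.MM.DD` index naming and the `build_bulk_body` helper are hypothetical choices, not part of Heka or Kolla.

```python
import json


def build_bulk_body(records, index_prefix="log"):
    """Serialize parsed log records into an Elasticsearch _bulk request body.

    Each record becomes two NDJSON lines: an action line naming the target
    index, then the document itself. The body must end with a newline.
    """
    if not records:
        return ""
    lines = []
    for rec in records:
        # Hypothetical daily index naming, e.g. "log-2016.01.11".
        day = rec["timestamp"].strftime("%Y.%m.%d")
        lines.append(json.dumps({"index": {"_index": f"{index_prefix}-{day}"}}))
        # datetime is not JSON-serializable, so store it as ISO 8601 text.
        doc = dict(rec, timestamp=rec["timestamp"].isoformat())
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"
```

A node would then POST this body to the `_bulk` endpoint of any Elasticsearch node; any node in the cluster can coordinate a bulk request, which is what lets the Elasticsearch side scale out independently of the per-node collectors.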
> That being said, it is important to note that the intent of this
> proposal is not strictly to replace Logstash with Heka. The
> intent is to propose a distributed architecture with Heka running on
> each cluster node rather than having Logstash run as a centralized
> log processing component. For such a distributed architecture we
> think that Heka is more appropriate, with a smaller memory footprint
> and better performance in general. In addition, Heka is more
> than a log processing tool, as it's designed to process streams of
> any type of data, including events, logs and metrics.
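To illustrate what one message shape for logs, events and metrics makes possible, here is a minimal Python sketch. It assumes a simplified message record loosely inspired by Heka's message fields (type, logger, severity, payload, hostname, fields); `Message` and `error_rate_metrics` are hypothetical names, not Heka APIs. It derives a per-service error-count metric from log messages, so the metric can travel through the same pipeline as the logs it came from.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical simplified message: one shape for logs, events and metrics."""
    type: str            # "log", "event" or "metric"
    logger: str          # emitting service, e.g. "nova-api"
    severity: str        # e.g. "INFO", "ERROR"
    payload: str = ""
    hostname: str = ""
    fields: dict = field(default_factory=dict)


def error_rate_metrics(messages):
    """Derive one metric message per logger: the number of ERROR log lines seen."""
    counts = Counter(
        m.logger for m in messages if m.type == "log" and m.severity == "ERROR"
    )
    return [
        Message(type="metric", logger=logger, severity="INFO",
                fields={"name": "log_errors", "value": n})
        for logger, n in sorted(counts.items())
    ]
```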
> Some elements of comparison between Heka and Logstash:
> * Logstash was designed for log processing. Heka is a "unified data
> processing" software, designed to process streams of any type of data.
> So with Heka you run one service on each box instead of many.
> Using a single service for processing different types of data also
> makes it possible to correlate data and derive metrics from logs and
> events. See Rob Miller's presentation for more details.
> * The virtual size of the Logstash Docker image is 447 MB, while the
> virtual size of a Heka image built from the same base image
> (debian:jessie) is 177 MB. For comparison the virtual size of the
> Elasticsearch image is 345 MB.
> * Heka is written in Go and has no dependencies. Go programs are
> compiled to native code. This is in contrast to Logstash, which uses
> JRuby and as such requires running a Java Virtual Machine. Besides
> this native versus interpreted code aspect, this also raises the
> question of which JVM to use (Oracle, OpenJDK?) and which version.
> * There are six types of Heka plugins: Inputs, Splitters, Decoders,
> Filters, Encoders, and Outputs. Heka plugins are written in Go or
> Lua. Lua plugins run in a sandbox, so misbehaving plugins can be shut
> down by Heka. Lua plugins can also be added to a running Heka
> dynamically, without configuration changes or a Heka restart. This
> is an important property in container environments such as Mesos,
> where workloads change dynamically.
> * To avoid losing logs under high load, it is often recommended to use
> Logstash together with Redis. Redis plays the role of a buffer,
> where logs are queued when Logstash or Elasticsearch cannot keep up
> with the load. Heka, as a "unified data processing" software,
> includes its own resilient message queue, making it unnecessary to use
> an external queue (Redis for example).
> * Heka is faster than Logstash for processing logs, and its memory
> footprint is smaller. I ran tests where 3,400,000 log messages were
> read from 500 input files and then written to a single output file.
> Heka processed the 3,400,000 log messages in 12 seconds, consuming
> 500 MB of RAM. Logstash processed the 3,400,000 log messages in 1 min
> 35 s, consuming 1.1 GB of RAM. With a grok filter added to parse and
> structure logs, Logstash processed the 3,400,000 log messages in 2 min
> 15 s, consuming 1.5 GB of RAM. Using an equivalent filtering plugin, Heka
> processed the 3,400,000 log messages in 27 s, consuming 730 MB of RAM.
> See my GitHub repo for more information about the tests.
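The six plugin types form a pipeline. The following Python sketch is illustrative only: it mimics the Input chunk, Splitter, Decoder, Filter, Encoder, Output flow and shows how a sandbox-like wrapper can shut down a failing filter while the rest of the pipeline keeps running. The oslo.log-style regex, the function names, and the simplification that filters inject new messages straight into the output are assumptions, not Heka's Go or Lua plugin API; Heka's real sandbox also enforces memory and instruction limits, which this sketch does not model.

```python
import json
import re

# Hypothetical pattern for an oslo.log-style line such as
# "2016-01-11 15:16:00.123 42 ERROR nova.compute.manager Instance failed"
LOG_LINE = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) (?P<pid>\d+) (?P<severity>[A-Z]+) "
    r"(?P<logger>\S+) (?P<payload>.*)$"
)


def split_records(chunk):
    """Splitter: cut a raw bytes chunk read by an Input into one record per line."""
    return [line for line in chunk.decode("utf-8").splitlines() if line]


def decode_record(record):
    """Decoder: parse a text record into a structured message, or None if no match."""
    match = LOG_LINE.match(record)
    return match.groupdict() if match else None


class SandboxedFilter:
    """Filter runner: a plugin that raises is shut down, not the whole pipeline."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn            # fn(msg) -> iterable of new messages to inject
        self.running = True

    def process(self, msg):
        if not self.running:
            return []
        try:
            return list(self.fn(msg))
        except Exception:
            self.running = False    # shut down the misbehaving plugin
            return []


def encode_message(msg):
    """Encoder: serialize a message for the Output (JSON in this sketch)."""
    return json.dumps(msg, sort_keys=True)


def run_pipeline(chunk, filters, output):
    """Input chunk -> Splitter -> Decoder -> Filters -> Encoder -> Output (a list)."""
    for record in split_records(chunk):
        msg = decode_record(record)
        if msg is None:
            continue                    # undecodable lines are dropped here
        output.append(encode_message(msg))
        for flt in filters:             # each filter sees a copy and may inject messages
            for injected in flt.process(dict(msg)):
                output.append(encode_message(injected))
```

A buggy filter placed first in the list is disabled on its first exception, while a second filter that turns ERROR lines into alert messages keeps working for every later record.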
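The point about a built-in resilient queue is that messages are persisted locally and only dropped once the output has confirmed delivery, so an Elasticsearch outage delays logs instead of losing them. Below is a minimal Python sketch of that idea, assuming a file-backed FIFO with a separate checkpoint file; `DiskQueue`, `drain` and the file layout are hypothetical and are not Heka's on-disk buffer format.

```python
import json
import os


class DiskQueue:
    """Minimal append-only, file-backed FIFO whose read position survives restarts."""

    def __init__(self, directory):
        os.makedirs(directory, exist_ok=True)
        self.data_path = os.path.join(directory, "queue.log")
        self.ckpt_path = os.path.join(directory, "checkpoint")
        open(self.data_path, "a").close()       # make sure the data file exists

    def _offset(self):
        try:
            with open(self.ckpt_path) as f:
                return int(f.read() or 0)
        except FileNotFoundError:
            return 0

    def put(self, msg):
        """Append one message as a JSON line and force it to disk."""
        with open(self.data_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(msg) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def peek(self):
        """Return (message, next_offset) for the oldest unacknowledged message, or None."""
        with open(self.data_path, "rb") as f:
            f.seek(self._offset())
            line = f.readline()
            if not line:
                return None
            return json.loads(line), f.tell()

    def ack(self, next_offset):
        """Advance the checkpoint once the output has confirmed delivery."""
        tmp = self.ckpt_path + ".tmp"
        with open(tmp, "w") as f:
            f.write(str(next_offset))
        os.replace(tmp, self.ckpt_path)         # atomic rename on POSIX and Windows


def drain(queue, send):
    """Deliver queued messages in order; stop, keeping them queued, if send fails."""
    delivered = 0
    while (item := queue.peek()) is not None:
        msg, next_offset = item
        if not send(msg):
            break       # e.g. Elasticsearch unavailable: retry later, nothing lost
        queue.ack(next_offset)
        delivered += 1
    return delivered
```

Acknowledging only after a successful send gives at-least-once delivery: a crash between `send` and `ack` can resend one message, but never silently drops one.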
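The benchmark timings translate directly into throughput figures. This small Python calculation uses only the message count and wall-clock times reported above; it ignores memory, whose "M"/"G" units are not specified precisely enough to compare exactly.

```python
MESSAGES = 3_400_000

# Wall-clock times in seconds, as reported in the benchmark above.
RUNS = {
    "heka": 12,
    "logstash": 95,          # 1 min 35 s
    "logstash+grok": 135,    # 2 min 15 s
    "heka+filter": 27,
}


def throughput(seconds, messages=MESSAGES):
    """Messages processed per second."""
    return messages / seconds


for name, secs in RUNS.items():
    print(f"{name:15s} {throughput(secs):>10,.0f} msg/s")

# Ratio of Logstash time to Heka time for the same workload.
print("speedup without parsing:", round(RUNS["logstash"] / RUNS["heka"], 1))
print("speedup with parsing:   ", round(RUNS["logstash+grok"] / RUNS["heka+filter"], 1))
```

This works out to about 283,333 msg/s for Heka versus 35,789 msg/s for Logstash without parsing (roughly 7.9x), and 125,926 versus 25,185 msg/s with parsing (5.0x).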
> Also, I want to say that our team has been using Heka in production
> for about a year, in clusters of up to 200 nodes. Heka has proven to
> be very robust, efficient and flexible enough to address our log
> processing and monitoring use cases. We have also acquired solid
> experience with it.
> Any comments are welcome!
> Heka documentation: <http://hekad.readthedocs.org>
> Rob Miller's Heka presentation: <http://www.slideshare.net/devopsdays/heka-rob-miller>
> rsyslog + Redis + Logstash recipe: <http://blog.sematext.com/2015/09/28/recipe-rsyslog-redis-logstash/>
> Benchmark repo: <https://github.com/elemoine/heka-logstash-comparison>