[openstack-dev] [kolla] Introduction of Heka in Kolla

Kwasniewska, Alicja alicja.kwasniewska at intel.com
Wed Jan 13 11:11:36 UTC 2016

Eric, Patrick, Simon, Clark, thanks for your comments.

I don't know Heka, which is why I ask a lot of questions. I hope you are fine with that :) I am not against Heka; I was just curious how reliable it is and how much experience you have with setting it up in a Docker environment, so that we know both the advantages and disadvantages of this solution.

@Eric, great that you are going to create a POC; it will explain a lot and show us possible problems.

Kind regards,
Alicja Kwaśniewska

-----Original Message-----
From: Eric LEMOINE [mailto:elemoine at mirantis.com] 
Sent: Wednesday, January 13, 2016 10:55 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [kolla] Introduction of Heka in Kolla

Hi Alicja

Thank you for your comments.  Answers and comments below.

On Tue, Jan 12, 2016 at 1:19 PM, Kwasniewska, Alicja <alicja.kwasniewska at intel.com> wrote:
> Unfortunately I do not have any experience in working with or testing Heka,
> so it’s hard for me to compare its performance with Logstash
> performance. However, I’ve read that Heka possesses a lot of advantages over Logstash in this scope.
> But which version of Logstash did you test? One guy from the Logstash 
> community said that: “The next release of logstash (1.2.0 is in beta) 
> has a 3.5x improvement in event throughput. For numbers: on my 
> workstation at home
> (6 vcpu on virtualbox, host OS windows, 8 GB ram, host cpu is FX-8150) 
> - with logstash 1.1.13, I can process roughly 31,000 events/sec 
> parsing apache logs. With logstash 1.2.0.beta1, I can process 102,000 events/sec.”

I've used the latest Docker image:
<https://hub.docker.com/r/library/logstash/>.  It uses Logstash 2.1.1, which is the most recent stable version.

> You also said that Heka is a unified data processing tool, but do we need this?

Heka, as a unified data processing tool, makes it possible to derive metrics from logs, HTTP response times for example.  Alerts can also be triggered on specific log patterns.
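
To make the log-to-metric idea concrete, here is a minimal sketch in Python rather than in Heka itself. It is not Heka's API: the log line format, the regular expression and the function name `response_time_metrics` are all assumptions made for illustration only.

```python
import re
from statistics import mean

# Assumed (hypothetical) access-log format, for illustration only:
#   "GET /v2/servers HTTP/1.1" status: 200 len: 1024 time: 0.1234
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" '
    r'status: (?P<status>\d{3}) .*?time: (?P<time>\d+\.\d+)'
)


def response_time_metrics(lines):
    """Derive simple response-time metrics (in seconds) from log lines.

    Lines that do not match the assumed format are ignored.
    """
    times = [float(m.group("time")) for m in map(LINE_RE.search, lines) if m]
    if not times:
        return {"count": 0, "mean": None, "max": None}
    return {"count": len(times), "mean": mean(times), "max": max(times)}
```

The same matching step could feed an alert instead of a metric, for example by counting lines whose status is 500 and comparing the count against a threshold.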

> Heka seems to address stream processing needs, while Logstash focuses 
> mainly on processing logs. We want to create a central logging 
> service, and Logstash was created especially for it and seems to work 
> well for this application.
> One thing that is obvious is that Logstash is better
> known, more popular and more tested. Maybe it has some performance
> disadvantages, but at least we know what we can expect from it. Also,
> it has more pre-built plugins and a lot of usage examples, while
> Heka doesn’t have many of them yet and is nowhere near the range of
> plugins and integrations provided by Logstash.

As Simon said, Heka already includes quite a lot of plugins.  See the Heka documentation [*] for an exhaustive list.  It may indeed be the case that Logstash includes even more plugins, but Heka has taken us pretty far already.

> In the case of adding plugins, I’ve read that in order to add Go
> plugins, the binary has to be recompiled, which is a little bit
> frustrating (static linking - to wire in new plugins, you have to
> recompile). On the other hand, the Lua plugins do not require it, but
> the question is whether Lua plugins are sufficient. Or maybe adding Go plugins is not so bad?

See Simon's answer.

> You also said that you didn’t test Heka with Docker, right?

I did test Heka with Docker.  In my performance tests both Heka and Logstash ran in Docker containers.  What I haven't tested yet is the Docker Log Input plugin.  We'll do more tests as part of the work on specifications.

> But do you
> have any experience in setting up Heka in a Docker container? I saw that
> with Heka 0.8.0 new Docker features were implemented (including
> Dockerfiles to generate Heka Docker containers for both development
> and deployment), but did you test them? If you didn’t, we cannot be
> sure whether there are any issues with them.
> Moreover, you will have to write your own Dockerfile for Heka that
> inherits from the Kolla base image (as we discussed during the last meeting,
> we would like to have our own images); you won’t be able to inherit
> from ianneub/heka:0.10 as specified in the link that you sent:
> http://www.ianneubert.com/wp/2015/03/03/how-to-use-heka-docker-and-tutum/.

As I said in my first email, Heka has no dependencies, so creating a Dockerfile for Heka is quite easy.  See <https://github.com/elemoine/heka-logstash-comparison/blob/master/Dockerfile>
for the super-simple Dockerfile I've used so far.

> There are also some issues with the DockerInput module which you want to use.
> For example, splitters are not available in DockerInput
> (https://github.com/mozilla-services/heka/issues/1643). I can’t say
> that it will affect us, but we also don’t know which new issues may
> arise during the first tests, as none of us has ever tried Heka in and with Docker.

Yes, we're aware of that limitation.  But we're not sure this is a problem, as the decoder can be the component that coalesces log lines.  We already have a Lua decoder that does that, accumulating the lines of Python tracebacks.  I am going to look at this in more detail when working on the blueprint.
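
As a rough illustration of what coalescing in the decoder means, here is a small Python sketch. It is not the Lua decoder mentioned above and does not use any Heka API. The function name `coalesce_tracebacks` is hypothetical, and the sketch assumes raw lines without timestamp prefixes, with the traceback ending at the first non-indented line after the header.

```python
def coalesce_tracebacks(lines):
    """Yield log messages, merging each Python traceback into one message."""
    buffer = []  # lines of the traceback currently being accumulated
    for line in lines:
        line = line.rstrip("\n")
        if buffer:
            buffer.append(line)
            # Assumption: the first non-indented line after the header is
            # the final "ExceptionType: message" line of the traceback.
            if not line.startswith(" "):
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("Traceback (most recent call last):"):
            buffer = [line]
        else:
            yield line
    # Flush a traceback that was cut off at the end of the input.
    if buffer:
        yield "\n".join(buffer)
```

Real OpenStack log lines carry a timestamp, PID and level prefix, so an actual decoder would have to strip or match that prefix before applying a rule like this one.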

> I am not attached to any specific solution; I am just not sure whether
> Heka will surprise us with something hard to solve, configure, etc.

We chose Heka because it's lightweight and fast, while providing us with the flexibility we need for processing different types of data streams.  The distributed architecture we think is necessary for large environments requires running the log processing component on each cluster node, and we did not want to run a JVM on each node, especially on compute nodes where user VMs run.

