[openstack-dev] [kolla] Introduction of Heka in Kolla

Patrick Petit ppetit at mirantis.com
Wed Jan 13 08:48:40 UTC 2016


On 12 Jan 2016 at 13:24:26, Kwasniewska, Alicja (alicja.kwasniewska at intel.com) wrote:

Unfortunately I do not have any experience in working with or testing Heka, so it’s hard for me to compare its performance vs Logstash’s. However, I’ve read that Heka possesses a lot of advantages over Logstash in this respect.



But which version of Logstash did you test? One guy from the Logstash community said that: “The next release of logstash (1.2.0 is in beta) has a 3.5x improvement in event throughput. For numbers: on my workstation at home (6 vcpu on virtualbox, host OS windows, 8 GB ram, host cpu is FX-8150) - with logstash 1.1.13, I can process roughly 31,000 events/sec parsing apache logs. With logstash 1.2.0.beta1, I can process 102,000 events/sec.”



You also said that Heka is a unified data processing tool, but do we need this? Heka seems to address stream-processing needs, while Logstash focuses mainly on processing logs. We want to create a central logging service; Logstash was created especially for that and seems to work well for this application.


I think you are touching a key point here. Our thinking is that Heka does at least as well as Logstash at collecting and parsing logs, with a smaller footprint and higher performance, but it can do more, as you noticed. This is exactly why we came to use the tool in the first place and like it, hence the motivation for proposing it here. It’s not a handicap but an asset, because you can choose to do more if you want to, and so avoid a sprawl of tools doing different things. Consider the prospect of transforming logs matching a particular pattern into metric messages (e.g. average HTTP response time, HTTP 5xx error count, error rate, ...) that you could send to a time-series database like InfluxDB… Wouldn’t that be cool? I am not saying that you couldn’t do it with Logstash, but with Heka the processing can be distributed across the hosts and is much easier to implement because of the stream-processing design. That’s a big plus.
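To make that concrete, here is a rough sketch of what such a metric-deriving filter could look like as a Heka Lua sandbox filter. It is illustrative only: the field name (Fields[status]), the payload name and the idea that parsed access-log messages are routed to it are assumptions, not taken from any real deployment.

```lua
-- Hypothetical Heka sandbox filter: count HTTP 5xx responses seen in
-- parsed access-log messages and periodically emit the count as a
-- payload that an output plugin could forward to e.g. InfluxDB.

local count_5xx = 0

function process_message()
    -- "Fields[status]" is an assumed field name set by an access-log decoder.
    local status = tonumber(read_message("Fields[status]"))
    if status and status >= 500 then
        count_5xx = count_5xx + 1
    end
    return 0  -- success
end

function timer_event(ns)
    -- Called on each ticker interval: inject the current count as a new
    -- message into the Heka pipeline, then reset the counter.
    inject_payload("txt", "http_5xx_count", tostring(count_5xx))
    count_5xx = 0
end
```

A SandboxFilter section in the Heka configuration would point at this script, with a message_matcher selecting the access-log messages to feed it.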

One thing that is obvious is that Logstash is better known, more popular and better tested. Maybe it has some performance disadvantages, but at least we know what we can expect from it. Also, it has more pre-built plugins and a lot of usage examples, while Heka doesn’t have many of them yet and is nowhere near the range of plugins and integrations provided by Logstash.

I tend to disagree with that. You may think that Heka has fewer plugins out of the box, but in practice it has all the plugins needed to cover a variety of use cases, I would say even beyond Logstash, thanks to Heka’s approach of decoupling protocol (input and output) plugins from deserialisation/serialisation (decoder/encoder) plugins. You can slice and dice combinations of those plugins, and if you need to support a new message format it suffices to implement a decoder or an encoder in Lua, usable with any combination of protocols including HTTP, TCP, UDP, AMQP, Kafka, statsd, … What more would you need?
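As an illustration of that decoupling, here is a minimal, hypothetical Heka configuration fragment pairing a TcpInput with a syslog decoder. The port, section names and template string are made up; the point is only that the transport and the parsing are configured independently, so the same decoder could just as well be attached to a UDP or AMQP input.

```toml
# Hypothetical Heka config fragment: transport (TcpInput) and parsing
# (a Lua SandboxDecoder) are separate plugins, combined by reference.
[syslog_tcp]
type = "TcpInput"
address = ":5514"
decoder = "syslog_decoder"

[syslog_decoder]
type = "SandboxDecoder"
filename = "lua_decoders/rsyslog.lua"

[syslog_decoder.config]
# rsyslog template describing the expected line format (illustrative).
template = '<%PRI%>%TIMESTAMP% %HOSTNAME% %syslogtag%%msg:::sp-if-no-1st-sp%%msg:::drop-last-lf%\n'
```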





Regarding adding plugins, I’ve read that in order to add Go plugins the binary has to be recompiled, which is a little frustrating (static linking: to wire in new plugins you have to recompile). On the other hand, Lua plugins do not require it, but the question is whether Lua plugins are sufficient? Or maybe adding Go plugins is not so bad?


We are using Heka to address a much broader spectrum of use cases and functionalities (some quite sophisticated), but as that is not the subject of this conversation I will not expand on it. In any case, we have never found the need to write a plugin in Go; Lua and the associated libraries have always been sufficient to address our needs.

You also said that you didn’t test Heka with Docker, right? But do you have any experience in setting up Heka in a Docker container? I saw that Heka 0.8.0 introduced new Docker features (including Dockerfiles to generate Heka Docker containers for both development and deployment), but did you test them? If you didn’t, we cannot be sure whether there are any issues with it.



Moreover, you will have to write your own Dockerfile for Heka that inherits from the Kolla base image (as we discussed during the last meeting, we would like to have our own images); you won’t be able to inherit from ianneub/heka:0.10 as shown in the link you sent: http://www.ianneubert.com/wp/2015/03/03/how-to-use-heka-docker-and-tutum/.
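For illustration, such a Dockerfile might look roughly like the sketch below. The base image name, the way hekad is installed and the file paths are all assumptions for the sake of the example, not actual Kolla content.

```dockerfile
# Hypothetical sketch of a Heka image built on a Kolla base image.
# Base image name and install method are assumptions.
FROM kollaglue/centos-binary-base:latest

# Install hekad -- here from a locally provided package; the exact
# package and source are placeholders.
ADD heka-linux-amd64.rpm /tmp/heka.rpm
RUN rpm -i /tmp/heka.rpm && rm -f /tmp/heka.rpm

# Ship the Heka pipeline configuration with the image.
COPY heka.toml /etc/heka/heka.toml

CMD ["hekad", "-config", "/etc/heka/heka.toml"]
```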



There are also some issues with the DockerInput module which you want to use. For example, splitters are not available in DockerInput (https://github.com/mozilla-services/heka/issues/1643). I can’t say that it will affect us, but we also don’t know which new issues may arise during first tests, as none of us has ever tried Heka with Docker.



I am not attached to any specific solution; I am just not sure whether Heka won’t surprise us with something hard to solve, configure, etc.

Well, I guess that’s a fact of life we (especially in the IT industry) have to live with no matter what.


 


Alicja Kwaśniewska

 

From: Sam Yaple [mailto:samuel at yaple.net]
Sent: Monday, January 11, 2016 11:37 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [kolla] Introduction of Heka in Kolla

 

Here is why I am on board with this. As we have discovered, logging with the syslog plugin leaves a lot to be desired. It (to my understanding) still can't save tracebacks/stacktraces to the log files for whatever reason. stdout/stderr, however, works perfectly fine. That said, the Docker log handling has been a source of pain in the past, but it has gotten better. It does have the limitation of only being able to log one output stream at a time. This means, as an example, that the neutron-dhcp-agent could send its logs to stdout/stderr, but the dnsmasq process that it launches (which also has logs) would have to mix its logs in with the neutron logs on stdout/stderr. Can Heka handle this and separate them efficiently? Otherwise I see no choice but to stick with something that can handle multiple logs from a single container.



Sam Yaple

 

On Mon, Jan 11, 2016 at 10:16 PM, Eric LEMOINE <elemoine at mirantis.com> wrote:


Le 11 janv. 2016 18:45, "Michał Jastrzębski" <inc007 at gmail.com> a écrit :
>
> On 11 January 2016 at 10:55, Eric LEMOINE <elemoine at mirantis.com> wrote:
> > Currently the services running in containers send their logs to
> > rsyslog. And rsyslog stores the logs in local files, located in the
> > host's /var/log directory.
>
> Yeah, however plan was to teach rsyslog to forward logs to central
> logging stack once this thing is implemented.

Yes. With the current ELK change request, Rsyslog sends logs to the central Logstash instance. If you read my design doc you'll understand that this is precisely what we're proposing to change.

> > I know. Our plan is to rely on Docker. Basically: containers write
> > their logs to stdout. The logs are collected by Docker Engine, which
> > makes them available through the unix:///var/run/docker.sock socket.
> > The socket is mounted into the Heka container, which uses the Docker
> > Log Input plugin [*] to read the logs from that socket.
> >
> > [*] <http://hekad.readthedocs.org/en/latest/config/inputs/docker_log.html>
>
> So docker logs isn't best thing there is, however I'd suspect that's
> mostly console output fault. If you can tap into stdout efficiently,
> I'd say that's pretty good option.

I'm not following you. Could you please be more specific?

> >> Seems to me we need additional comparason of heka vs rsyslog;) Also
> >> this would have to be hands down better because rsyslog is already
> >> implemented, working and most of operators knows how to use it.
> >
> >
> > We don't need to remove Rsyslog. Services running in containers can
> > write their logs to both Rsyslog and stdout, which even is what they
> > do today (at least for the OpenStack services).
> >
>
> There is no point for that imho. I don't want to have several systems
> doing the same thing. Let's make decision of one, but optimal toolset.
> Could you please describe bottoms up what would your logging stack
> look like? Heka listening on stdout, transfering stuff to
> elasticsearch and kibana on top of it?

My plan is to provide details in the blueprint document, that I'll continue working on if the core developers agree with the principles of the proposed architecture and change.

But here's our plan—as already described in my previous email: the Kolla services, which run in containers, write their logs to stdout. Logs are collected by the Docker engine. Heka's Docker Log Input plugin is used to read the container logs from the Docker endpoint (Unix socket). Since Heka will run in a container, a volume is necessary for accessing the Docker endpoint. The Docker Log Input plugin inserts the logs into the Heka pipeline, at the end of which an Elasticsearch Output plugin will send the log messages to Elasticsearch. Here's a blog post reporting on that approach: <http://www.ianneubert.com/wp/2015/03/03/how-to-use-heka-docker-and-tutum/>. We haven't tested that approach yet, but we plan to experiment with it as we work on the specs.
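Under those assumptions, a minimal Heka configuration for that pipeline could look something like the untested sketch below; the Elasticsearch server name and the message_matcher value are illustrative placeholders.

```toml
# Hypothetical Heka pipeline: read container logs from the Docker
# socket (mounted as a volume into the Heka container) and ship them
# to Elasticsearch.
[DockerLogInput]
endpoint = "unix:///var/run/docker.sock"

[ESJsonEncoder]
# Derive the Elasticsearch index name from the message timestamp.
es_index_from_timestamp = true

[ElasticSearchOutput]
server = "http://elasticsearch:9200"  # placeholder hostname
message_matcher = "Type == 'DockerLog'"  # assumed message type
encoder = "ESJsonEncoder"
```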


__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

 


