[openstack-dev] [kolla] Heka v ELK stack logistics
inc007 at gmail.com
Wed Jan 13 15:07:08 UTC 2016
So 2 cents from me:
As sdake said, Heka seems to be a replacement for rsyslog rather than
logstash. If it can replace both, even better: we end up with one less
dependency, and fewer dependencies are good. So I'd rather hold off on
making this decision until Eric presents the PoC he promised here.
The rsyslog implementation is a bit hacky in our case (fake /dev/log), and
rsyslog was never meant to be a dockerized service. We also had problems
with oslo's rsyslog support, mostly because rsyslog is not well suited to
handling multi-line logs, and Python tracebacks are exactly that.
If Heka can replace these with something less hacky and more elegant,
I'd be +1 for that.
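To illustrate the multi-line problem: a line-oriented forwarder sees every
line of a traceback as a separate message unless something stitches the
continuation lines back onto the record they belong to. Below is a minimal
Python sketch of that stitching. It assumes each new record starts with a
timestamp like "2016-01-13 15:07:08.123". The names RECORD_START and
group_records and the sample log lines are made up for illustration; this is
not how rsyslog, Heka or logstash actually implement it.

```python
import re

# Assumption: every new log record starts with a timestamp such as
# "2016-01-13 15:07:08.123 "; any other line is treated as a continuation.
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} ")


def group_records(lines):
    """Join continuation lines (e.g. traceback frames) onto the record they follow."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if RECORD_START.match(line) or not records:
            # A timestamped line (or the very first line) opens a new record.
            records.append(line)
        else:
            # Anything else is appended to the record currently being built.
            records[-1] += "\n" + line
    return records


# Hypothetical sample input: one error with a traceback, then one info line.
sample = [
    "2016-01-13 15:07:08.123 42 ERROR nova.compute [-] Unhandled error",
    "Traceback (most recent call last):",
    '  File "example.py", line 1, in <module>',
    "ValueError: boom",
    "2016-01-13 15:07:09.456 42 INFO nova.compute [-] Recovered",
]

for record in group_records(sample):
    print(repr(record))
```

With the sample above, the five input lines collapse into two records, and
the traceback stays attached to the ERROR line it belongs to.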
TL;DR: Let's wait for Eric's PoC and make the decision based on an experiment
rather than ML discussion and number throwing.
On 13 January 2016 at 07:29, Steven Dake (stdake) <stdake at cisco.com> wrote:
> From: David Moreau Simard <dms at redhat.com>
> Reply-To: "OpenStack Development Mailing List (not for usage questions)"
> <openstack-dev at lists.openstack.org>
> Date: Wednesday, January 13, 2016 at 5:55 AM
> To: "OpenStack Development Mailing List (not for usage questions)"
> <openstack-dev at lists.openstack.org>
> Subject: Re: [openstack-dev] [kolla] Heka v ELK stack logistics
> So is it decided that we want Heka instead of ELK in Kolla and that it is
> just a matter of time, then ?
> Before "deciding", a vote of the core reviewers is necessary. I just want to
> get the ball rolling because it seems like the core reviewers are leaning
> towards a Heka based solution and if we do use Heka, I'd like it done in
> Mitaka if at all possible so our operators don't have to learn a new
> diagnostics system.
> My beef against rsyslog is it has a bunch of gaps that we will never be able
> to fix. I don't recall the exact details, but inc0 or SamYaple could
> probably point them out in more detail. In my opinion this isn't about
> replacing logstash, this is about replacing rsyslog. This also isn't about
> performance or scalability – I think logstash can do the performance job
> needed for an OpenStack diagnostics system.
> It may be that Heka turns out to be a waste of time – the only way to know
> for certain is to see a proof of concept implementation – and if it's better
> than logstash, merge it.
> I also have a beef against Java in general, because of the JVM fork between
> open source and Oracle. To me this makes dependencies that use Java less
> viable – even though I recognize Elasticsearch is implemented in Java.
> I recognize all your points about the well known mature nature of logstash.
> To me this is a huge advantage. Choosing dependencies wisely is one of the
> most important decisions made in software design and maturity goes a long
> way in the decision making process about dependency acceptance. I don't
> know much about Heka's maturity or even suitability for our problems – I am
> just basing this discussion on the direction I see happening on the mailing
> list.
> It would be nice if we could have both and select one or the other, but we
> are not going to be implementing in that manner as we need to choose one or
> the other.
> A POC that fixes all the gaps in rsyslog will be necessary to obtain my +2
> on the patch stream reviews.
> Clark Boyle put forward some very good points which seem to have sadly gone
> mostly ignored.
> What are we trying to address by replacing ELK ? Performance ? Clark's
> numbers are far from being bad and ELK effectively scales in any direction
> you want.
> I'll put on my operator hat and would like to give my +1 to keep ELK instead
> of Heka.
> ELK at this point is all but a gold standard. People know it, people use
> it, people troubleshoot it. If something goes wrong, I can go on Google, on
> IRC or mailing lists and expect someone to be able to help.
> This is worth a lot to operators. OpenStack is already expensive enough,
> even if you don't take the vendor route.
> Python is slow but you don't see OpenStack being rewritten in Go (ok, Swift,
> you're an exception). Python just has that massive community of developers
> that OpenStack can tap into. This is worth a lot and in that respect, I am
> happy that OpenStack is in Python, even if it is slow.
> I'm not saying Heka is a bad decision or that it's an eccentric/exotic
> choice. But please let the decision be mindful of the people that will be
> deploying, configuring and supporting this. I don't believe a performance
> increase is worth it unless ELK was a real and painful bottleneck, which it
> is not.
> My 0.02$CAD (definitely not worth a lot right now)
> David Moreau Simard
> Senior Software Engineer | Openstack RDO
> dmsimard = [irc, github, twitter]
> On Jan 13, 2016 7:20 AM, "Steven Dake (stdake)" <stdake at cisco.com> wrote:
>> Hey folks,
>> I'd like to have a mailing list discussion about logistics of the ELKSTACK
>> solution that Alicja has sorted out vs the Heka implementation that Eric is
>> My take on that is Eric wants to replace rsyslog and logstash with Heka.
>> That seems fine, but I want to make certain this doesn't happen in a way
>> that leaves Kolla completely non-functional as we finish up Mitaka. Liberty
>> is the first version of Kolla people will deploy, and Mitaka is the first
>> version of Kolla people will upgrade to, so making sure that we don't
>> completely bust diagnostics (and I recognize diags as is are a little weak)
>> is critical.
>> It sounds like from my reading of the previous thread on this topic,
>> unless there is some intractable problem, our goal is to use Heka to replace
>> rsyslog and logstash. I'd ask inc0 (who did the rsyslog work) and Alicja
>> (who did the elkstack work) to understand that replacement often happens on
>> work that has already been done, and it's not a "waste of time", so to speak,
>> but an evolution of the system.
>> Here are the deadlines:
>> Let me help decode that for folks. March 4th is the final deadline to have
>> a completely working solution based upon Heka if it's to enter Mitaka.
>> Unlike previous releases of Kolla, I want to hand off release management
>> of Kolla to the release management team, and to do that, we need to show a
>> track record of hitting our deadlines and not adding features past feature
>> freeze (the m3 milestone on March 4th). In the past releases of Kolla we as
>> a team were super loose on this requirement – going forward I prefer us
>> being super strict. Handing off to release management is a sign of maturity
>> and would have an overall positive impact, assuming we can get the software
>> written in time :)
>> I'd like a plan and commitment to either hit Mitaka 3, or the N cycle. It
>> must work well first on Ansible, and second on Mesos. If it doesn't work at
>> all on Mesos, I could live with that - I think the Mesos implementation
>> will really not be ready for prime time until the middle or completion of
>> the N cycle. We lead with Ansible, and I don't see that changing any time
>> soon – as a result, I want our Ansible deployment to be rock solid and
>> usable out of the gate. I don't expect to "Market" Mitaka Mesos (with the
>> OpenStack foundation's help) as "production ready" but rather as "tech
>> preview" and something for folks to evaluate.
>> I think a parallel development effort with the ELKSTACK that you're working
>> on makes sense. In case the Heka development fails entirely, or misses
>> Mitaka 3, I don't want us left lacking a diagnostics solution for Mitaka.
>> Diagnostics is my priority #2 for Kolla (#1 is upgrades). Unfortunately
>> what this means is you may end up wasting your time doing development that
>> is replaced at the last minute in Mitaka 3, or later in the N cycle. This
>> is very common in software development (all the code I wrote for Magnum has
>> been sadly replaced). I know you can be a good team player here and take
>> one for the team so to speak, but I'm asking you if you would take offense
>> to this approach.
>> I'd like comments/questions/concerns on the above logistics approach
>> discussed, and a commitment from Eric as to when he thinks all the code
>> would land as one patch stream unit.
>> I'd also like to see the code come in as one super big patch stream (think
>> 30 patches in the stream) so the work can be evaluated and merged as one
>> unit. I could also live with 2-3 different patch streams with 10-15 patches
>> per stream, just so we can eval as a unit. This means lots of rebasing on
>> your part Eric ;-) It also means a commitment from the core reviewer team
>> to test and review this critical change. If there isn't a core reviewer on
>> board with this approach, please speak up now.
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe