Just following up since I got a few more minutes to poke at this
after discussing on IRC: I have confirmed that the stats we have in
graphite seem to match what's recorded by logstash, and dug up
three example failure logs from today.

However, there's (thankfully) a consistent explanation. Take a look
at the timestamp gaps between the penultimate and ultimate lines of
each log... timeouts! So I agree the issue seems to be the lack of
errexit in the npm-run builder. The old failures observed for
gate-horizon-npm-run-lint are probably similarly explained as
timeout issues we've just been lucky enough not to hit in the past
week or so. Unfortunately those failures fall just outside our
elasticsearch retention window so confirming that would be a very
time-intensive exercise at this point.
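
For anyone not familiar with errexit, here is a minimal sketch of the
behaviour in question. This is not the actual npm-run builder script:
the function names and the placeholder steps are made up purely for
illustration, with "false" standing in for any step that fails.

```shell
#!/bin/sh
# Illustrative only: hypothetical stand-ins, not the real builder.

# Without errexit the failed step is ignored and the script carries on,
# so the failure only shows up later (if at all).
run_without_errexit() {
    sh -c '
        false                          # placeholder for a failing step
        echo "next step ran anyway"    # still executes
    '
}

# With errexit (sh -e, equivalent to "set -e" at the top of the script)
# the shell exits at the first failing step.
run_with_errexit() {
    sh -e -c '
        false                          # placeholder for a failing step
        echo "next step ran anyway"    # never reached
    '
}

# Demonstration: report the exit status of each variant.
run_without_errexit && echo "without errexit: exit status 0"
run_with_errexit || echo "with errexit: exit status $?"
```

With errexit the job would stop at the step that actually failed,
rather than continuing on to later steps.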
