[OpenStack-Infra] Adding index and views/dashboards for Kata to ELK stack

Clark Boylan cboylan at sapwetik.org
Tue Nov 27 23:15:34 UTC 2018


On Tue, Nov 27, 2018, at 10:53 AM, Whaley, Graham wrote:
> (back to an old thread... this has rippled near the top of my pile again)
> 
> > -----Original Message-----
> > From: Clark Boylan [mailto:cboylan at sapwetik.org]
> > Sent: Tuesday, October 23, 2018 6:03 PM
> > To: Whaley, Graham <graham.whaley at intel.com>; openstack-
> > infra at lists.openstack.org; thierry at openstack.org
> > Cc: Ernst, Eric <eric.ernst at intel.com>; fungi at yuggoth.org
> > Subject: Re: Adding index and views/dashboards for Kata to ELK stack
> [snip]
> > > I don't think the Zuul Ansible role will be applicable - the metrics run
> > > on bare metal machines running Jenkins, and export their JSON results
> > > via a filebeat socket. My theory was we'd then add the socket input to
> > > the logstash server to receive from that filebeat - as in my gist at
> > >
> > https://gist.github.com/grahamwhaley/aa730e6bbd6a8ceab82129042b186467
> > 
> > I don't think we would want to expose write access to the unauthenticated
> > logstash and elasticsearch system to external systems. The thing that makes this
> > secure today is we (community infrastructure team) control the existing writers.
> > The existing writers are available for your use (see below) should you decide to
> > use them.
> 
> My theory was that we'd secure the connection, at least by using the
> logstash/beats SSL support, and that only we/the infra group would have
> access to the keys:
> https://www.elastic.co/guide/en/beats/filebeat/current/configuring-ssl-logstash.html
> 
> The machines themselves are only accessible by the CNCF CIL owners and 
> nominated Kata engineers with the keys.
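
(For reference, the beats-side of the SSL setup described above is just an SSL-enabled logstash output in the filebeat configuration, roughly along these lines; the hostname, port, and certificate/key paths are placeholders, not anything we actually run:)

    output.logstash:
      # Placeholder endpoint; not a real infra host.
      hosts: ["logstash.example.org:5044"]
      # CA used to verify the logstash server's certificate.
      ssl.certificate_authorities: ["/etc/pki/tls/certs/logstash-ca.crt"]
      # Client certificate and key, so logstash only accepts known senders.
      ssl.certificate: "/etc/pki/tls/certs/filebeat.crt"
      ssl.key: "/etc/pki/tls/private/filebeat.key"
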
> > 
> > >
> > > One crux here is that the metrics have to run on a machine with
> > > guaranteed performance (so not a shared/virtual cloud instance), and
> > > hence currently run under Jenkins and not on the OSF/Zuul CI infra.
> > 
> > Zuul (by way of Nodepool) can speak to arbitrary machines as long as they speak
> > an ansible connection protocol. In this case the default of ssh would probably
> > work when tied to nodepool's static instance driver. The community
> > infrastructure happens to only talk to cloud VMs today because that is what we
> > have been given access to, but should be able to talk to other resources if
> > people show up with them.
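
(To make that concrete, a static Nodepool provider entry for a bare metal machine is only a short stanza like the sketch below; the provider name, hostname, label, and username are purely illustrative:)

    providers:
      - name: kata-baremetal                  # illustrative provider name
        driver: static
        pools:
          - name: main
            nodes:
              - name: metrics01.example.org   # placeholder bare metal host
                labels:
                  - kata-metrics-baremetal    # label a Zuul nodeset can request
                username: zuul                # account Zuul would ssh in as
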
> 
> If we ignore the fact that all current Kata CI is running on Jenkins, 
> and we are not presently transitioning to Zuul afaik, then....
> Even if we did integrate the bare metal CNCF CIL packet.net machines via 
> ansible/SSH/nodepool/Zuul, then afaict you'd still be running the same 
> CI tasks on the same machines and injecting the Elastic data through the 
> same SSL socket/tunnel into Elastic.

No, we would inject the data through the existing test node -> Zuul -> Logstash -> Elasticsearch path.
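
To make that path concrete, the metrics work would be expressed as an ordinary Zuul job that runs on the bare metal label and whose output lands with the rest of the job logs. A rough sketch, with all names made up for illustration:

    - job:
        name: kata-metrics-ubuntu-1604
        description: Run the Kata metrics suite on a dedicated bare metal node.
        run: playbooks/kata-metrics/run.yaml
        post-run: playbooks/kata-metrics/post.yaml
        nodeset:
          nodes:
            - name: metrics-node
              label: kata-metrics-baremetal   # the static Nodepool label sketched earlier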

> I know you'd like to keep as much of the infra as possible under your 
> control, but the only bit I think would be different is the Jenkins 
> master. Given that the Jenkins job running on the slave only executes 
> master branch merges, which have undergone peer review (and would be the 
> same jobs that Zuul would run), I'm not sure there is any real security 
> difference between having the Kata Jenkins master or Zuul drive the 
> slaves.

There is more to it than that. This service is part of the CI system we operate, and the way you consume it is through Zuul jobs. If you want to inject data into our Logstash/Elasticsearch system, you do that by configuring your jobs in Zuul to do so. We are not in the business of operating one-off solutions to problems; we support a large variety of users and projects, and using generic, flexible systems like this one is how we make that viable.
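
As a sketch of what "configuring your jobs to do so" means in practice: the job's post-run playbook only needs to pull the JSON results back into the executor's log directory, and the existing log submission path described above handles the rest. The playbook path and file names here are illustrative:

    # playbooks/kata-metrics/post.yaml (hypothetical path)
    - hosts: all
      tasks:
        - name: Collect metrics results alongside the other job logs
          synchronize:
            mode: pull
            src: "{{ ansible_user_dir }}/metrics-results.json"
            dest: "{{ zuul.executor.log_root }}/"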

Additionally, these systems are community managed so that we can work together to solve these problems in a way that gives the infra team appropriate administrative access while still allowing you and others to get specific work done. Rather than avoid this tooling, can we please attempt to use it when it already has solutions to problems like this? We will happily do our best to make reuse of the existing systems a success, but building one-off solutions to problems that are already solved does not scale.

> 
> > 
> > >
> > > Let me know if you see any issues with that Jenkins/filebeat/socket/JSON flow.
> > >
> > > I need to deploy a new machine to process master branch merges to
> > > generate the data (currently we have a machine that is processing PRs at
> > > submission, not merge, which is not the data we want to track long
> > > term). I'll let you know when I have that up and running. If we wanted
> > > to move on this earlier, then I could inject data to a test index from
> > > my local test setup - all it would need I believe is the valid keys for
> > > the filebeat->logstash connection.
> 
> Oh, I've deployed a Jenkins slave and job to test out the first stage of 
> the flow btw:
> http://jenkins.katacontainers.io/job/kata-metrics-runtime-ubuntu-16-04-master/
> 
> > >
> > > > Clark
> > > Thanks!
> > >   Graham (now on copy ;-)
> > 
> > Ideally we'd make use of the existing community infrastructure as much as
> > possible to make this sustainable and secure. We are happy to modify our
> > existing tooling as necessary to do this. Update the logstash configuration, add
> > Nodepool resources, have grafana talk to elasticsearch, and so on.
> 
> I think the only key decision is whether we can use the packet.net 
> slaves as driven by the Kata Jenkins master, or whether we have to move 
> the management of those into Zuul.
> For expediency and consistency with the rest of the Kata CI, obviously I 
> lean heavily towards Jenkins.
> If we do have to go with Zuul, then I think we'll have to work out who 
> has access to the Zuul job configs for Kata and how they can modify them.

I wasn't directly involved with the decision making at the time, but back at the beginning of the year my understanding was that Jenkins was chosen over Zuul for expediency. That wasn't a bad choice, as the GitHub support in Zuul was still quite new (though having more users would likely have pushed it along more quickly). It would probably be worthwhile to decide separately whether Jenkins is the permanent solution to the Kata CI tooling problem, or whether we should continue to push for Zuul. If we want to push for Zuul, then I think we need to stop choosing Jenkins by default, start implementing new things in Zuul, and migrate the existing CI as Kata is able.

As for who has Zuul access, the infra team has administrative access to the service. Zuul configuration for the existing Kata jobs is done through a repo managed by the infra team, but anyone can propose changes to that repo. The reason for this is that Zuul gates its own config updates to prevent new configs from merging without being tested; bypassing that testing is what allows a broken Zuul configuration to land. Currently we aren't gating Kata with Zuul, so the configs live in the infra repo. If we started gating Kata changes with Zuul, we could move the configs into Kata repos and Kata could self-manage them.
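
For context, "moving the configs into Kata repos" would mean carrying a small in-repo Zuul file (for example a .zuul.yaml) that attaches jobs to pipelines, roughly like the sketch below; the job and pipeline names are illustrative:

    - project:
        check:
          jobs:
            - kata-metrics-ubuntu-1604
        gate:
          jobs:
            - kata-metrics-ubuntu-1604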

Looking ahead, Zuul is multitenant aware and we could deploy a Kata tenant. This would give Kata a bit more freedom to configure its Zuul pipeline behavior as desired, though gating is still strongly recommended as that is what prevents broken configs from merging.
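
A tenant is just another stanza in the scheduler's tenant configuration. A minimal sketch of what a Kata tenant could look like follows; the connection and repository names are illustrative, not a proposal:

    - tenant:
        name: kata-containers
        source:
          github:                             # assumes a "github" connection
            config-projects:
              - kata-containers/zuul-config   # trusted config repo (illustrative)
            untrusted-projects:
              - kata-containers/runtime
              - kata-containers/tests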

> 
> (adding Salvador to CC, as he is the Kata Jenkins owner mostly, and has 
> also worked on the Zuul PoC for Kata before).
> 
>  Graham (hoping we can come to some agreement :-) )


