[openstack-dev] [tripleo][ci] decreased coverage for telemetry

Wesley Hayutin whayutin at redhat.com
Wed Jul 12 14:46:19 UTC 2017


On Wed, Jul 12, 2017 at 10:33 AM, Pradeep Kilambi <prad at redhat.com> wrote:

> On Tue, Jul 11, 2017 at 10:06 PM, Wesley Hayutin <whayutin at redhat.com>
> wrote:
> >
> >
> > On Tue, Jul 11, 2017 at 9:04 PM, Emilien Macchi <emilien at redhat.com>
> > wrote:
> >>
> >> On Tue, Jul 11, 2017 at 12:41 PM, Pradeep Kilambi <prad at redhat.com>
> >> wrote:
> >> > On Tue, Jul 11, 2017 at 3:17 PM, Wesley Hayutin <whayutin at redhat.com>
> >> > wrote:
> >> >> Greetings,
> >> >>
> >> >> I was looking through the mailing list and I did not see any emails
> >> >> explicitly calling out the decreased coverage for telemetry in tripleo
> >> >> due to [1].  A series of changes went into the CI system to disable
> >> >> telemetry [2].
> >> >>
> >> >> There is work being done to restore more coverage for telemetry by
> >> >> limiting the resources it consumes [3].  We are also working on
> >> >> additional scenarios in t-h-t/ci/environments/ to better cover
> >> >> ceilometer.
> >> >>
> >> >> If the CI environment you are working in has the resources to cover
> >> >> ceilometer that is great, however if you find issues like [1] we highly
> >> >> suggest you follow the same pattern until coverage is restored
> >> >> upstream.
> >> >>
> >> >> Thank you!
> >> >>
> >> >> [1] https://bugs.launchpad.net/tripleo/+bug/1693174
> >> >> [2] https://review.openstack.org/#/q/topic:bug/1680195
> >> >> [3]
> >> >> https://review.openstack.org/#/c/475838/
> >> >> https://review.openstack.org/#/c/474969/
> >> >> https://review.openstack.org/#/c/476666/
> >> >>
> >> >>
> >> >
> >> > Thanks for starting this thread Wes. I concur with this. We got bitten
> >> > recently by many issues that we could have caught in ci had telemetry
> >> > been enabled. I spoke to trown and Emilien about this a few times
> >> > already. I do understand the resource footprint it causes.  But with
> >> > recent improvements and changes upstream, things should be back to
> >> > being more manageable. We do have telemetry tested in scenario001 job,
> >> > but that doesn't cover all scenarios. So there is a gap in coverage.
> >>
> >> What do you mean by gap in coverage?
> >> We have scenarios on purpose, so we can horizontally scale the
> >> coverage across multiple jobs and run the jobs only when we need (e.g.
> >> touching telemetry files for scenario001).
> >>
> >> Please elaborate on what isn't covered by scenario001, because we
> >> already cover Gnocchi, Panko, Aodh and Ceilometer (with RBD backend
> >> and soon with Swift backend in scenario002).
> >>
> >
> > Emilien,
> > Gap is the wrong word to use in this case.
> > Previously we had several jobs running with telemetry turned on including
> > ovb jobs in tripleo and other jobs outside of the upstream CI system.
> > The more jobs running, the more coverage.
> > I think that is what Pradeep was referring to, but maybe I am
> > misunderstanding this as well.
>
> Yeah, maybe gap is not the right word. Mostly I meant what Wes
> said, but I also feel we are not testing Telemetry with full HA
> currently in CI. The scenario jobs only test a deploy with 1 controller,
> not 3. We have seen some recent issues where things work on controller 0
> but controller 1 or 2 has statsd down, for example. The ovb ha job
> would have shown us that, had it been run with telemetry
> enabled. Is it possible to run the scenario001 job with full HA?
>

Full HA is limited to OVB jobs atm, and these jobs currently take longer to
run and are barely able to complete within the mandatory upstream timeout
period.
IMHO it's worth the time and effort to see if the performance improvements
currently being made to ceilometer will work properly with the OVB jobs,
but that is nothing I can guarantee atm.

Work is now starting on being able to deploy a full HA environment using
nodepool multinode jobs.  IMHO this is a better target.
I will keep you posted on the progress here.

Thank you Pradeep.


>
>
>
> >
> >
> >>
> >> >  I hope we can either re-enable these services by default in CI and
> >> > see how things work, or at least add a separate gate job to be able to
> >> > test the HA scenario properly with telemetry enabled.
> >> >
> >> > --
> >> > Cheers,
> >> > ~ Prad
> >> >
> >> >
> >> > __________________________________________________________________________
> >> > OpenStack Development Mailing List (not for usage questions)
> >> > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> >> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >>
> >>
> >>
> >> --
> >> Emilien Macchi
> >
> >
> >
> >
>
>
>
> --
> Cheers,
> ~ Prad
>
>