[openstack-dev] Which program for Rally
Joe Gordon
joe.gordon0 at gmail.com
Thu Aug 14 02:28:48 UTC 2014
On Wed, Aug 13, 2014 at 2:48 PM, Duncan Thomas <duncan.thomas at gmail.com>
wrote:
> On 13 August 2014 13:57, Matthew Treinish <mtreinish at kortar.org> wrote:
> > On Tue, Aug 12, 2014 at 01:45:17AM +0400, Boris Pavlovic wrote:
> >> Keystone, Glance, Cinder, Neutron and Heat are running rally performance
> >> jobs, which can be used for performance testing, benchmarking and
> >> regression testing (already now). These jobs support in-tree plugins for
> >> all components (scenarios, load generators, benchmark context), so the
> >> projects can use Rally fully without interacting with the Rally team at
> >> all. More about these jobs:
> >> https://docs.google.com/a/mirantis.com/document/d/1s93IBuyx24dM3SmPcboBp7N47RQedT8u4AJPgOHp9-A/
> >> So I really don't see anything like this in tempest (even in the
> >> foreseeable future).
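(For anyone who hasn't dug into these jobs: an in-tree Rally scenario plugin
is just a small Python class that lives in the project's own repo and that
Rally discovers at run time. The sketch below is from memory, so treat the
module paths and helper names as approximate rather than the exact current
plugin API:)

    # Rough sketch of an in-tree Rally benchmark scenario. Module paths and
    # helpers are approximate; they have moved around between Rally releases.
    from rally.benchmark.scenarios import base


    class CinderVolumes(base.Scenario):
        """Hypothetical Cinder scenario kept in the project's own tree."""

        @base.scenario(context={"cleanup": ["cinder"]})
        def create_and_delete_volume(self, size=1):
            # self.clients() returns the per-run OpenStack clients that
            # Rally sets up from the benchmark context.
            cinder = self.clients("cinder")
            volume = cinder.volumes.create(size)
            # A real scenario would wait for the volume to become available
            # before deleting it; omitted here to keep the sketch short.
            cinder.volumes.delete(volume)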
>
> > So this is actually the communication problem I mentioned before.
> > Singling out individual projects and getting them to add a rally job is
> > not "cross project" communication. (This is part of what I meant by "push
> > using Rally".) There was no larger discussion on the ML or a topic in the
> > project meeting about adding these jobs. There was no discussion about
> > the value vs. risk of adding new jobs to the gate. Also, this is why less
> > than half of the integrated projects have these jobs. Having asymmetry
> > like this between gating workloads on projects helps no one.
>
> So the advantage of the approach, rather than having a massive
> cross-project discussion, is that interested projects (I've been very
> interested from a cinder core PoV) act as a test bed for the other
> projects. 'Cross project' discussions don't really come to the other
> teams; they rely on people finding them, whereas Boris came to us and
> said "I've got this thing you might like, try it out, tell me what you
> want." He took feedback, iterated fast and investigated bugs. It has been
> a genuine pleasure to work with him, and I feel we made progress faster
> than we would have if we were trying to please everybody.
>
> > That being said, the reason I think osprofiler has been more accepted,
> > and its adoption into oslo is not nearly as contentious, is because it's
> > an independent library that has value on its own. You don't need to pull
> > in a monolithic stack to use it, which is a design point more in line
> > with the rest of OpenStack.
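(To make the "independent library" point concrete: osprofiler can be dropped
into a service with a couple of lines and nothing else pulled in. The snippet
below is roughly how it's used; I'm quoting the calls from memory, so treat
the exact signatures as approximate:)

    # Minimal osprofiler usage sketch (signatures approximate, from memory).
    from osprofiler import profiler

    # Initialise a trace for this request/run; the HMAC key is shared with
    # the services that will record the trace points.
    profiler.init(hmac_key="SECRET_KEY")


    @profiler.trace("list_volumes")
    def list_volumes():
        # ... the traced code path ...
        pass


    # Or trace an arbitrary block:
    with profiler.Trace("render-response"):
        pass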
>
> Sorry, are you suggesting tempest isn't a giant monolithic thing? I was
> able to comprehend the rally code very quickly; that isn't even slightly
> true of tempest. Having one simple tool that does one thing well is
> exactly what rally has tried to do - tempest seems to want to be five
> different things at once (CI, installation tests, trademark, performance,
> stress testing, ...)
>
> >> Matt, Sean - seriously, community is about convincing people, not about
> >> forcing people to do something against their will. You are making huge
> >> architectural decisions without deep knowledge of what Rally is, what
> >> its use cases, road map, goals and audience are.
> >>
> >> IMHO, community is about convincing people. So the QA program should
> >> convince the Rally team (at least me) to make such changes. The key to
> >> convincing me is to explain how this will help OpenStack perform
> >> better.
> >
> > If community, per your definition, is about convincing people, then there
> > needs to be a two-way discussion. This is an especially key point
> > considering the feedback on this thread is basically the same feedback
> > you've been getting since you first announced Rally on the ML [1] (and
> > from even before that, I think, but it's hard to remember all the details
> > from that far back). I'm afraid that without a shared willingness to
> > explore what we're suggesting, rather than dismissing it because of
> > preconceived notions, I fail to see the point in moving forward. The fact
> > that this feedback has been ignored is why this discussion has come up at
> > all.
> >
> >>
> >> Currently the Rally team sees a lot of issues with this decision:
> >>
> >> 1) It breaks the already existing performance jobs (Heat, Glance,
> >> Cinder, Neutron, Keystone)
> >
> > So firstly, I want to say I find these jobs troubling, not least because,
> > given the nature of the gate (second-level virt on public clouds), the
> > variability between runs can be staggering. I can't imagine what value
> > there is in running synthetic benchmarks in this environment. It would
> > only reliably catch the most egregious of regressions. Also, from what I
> > can tell, none of these jobs actually compare the timing data to previous
> > results; they just generate the data and make a pretty graph. The burden
> > appears to be on the user to figure out what it means, which really isn't
> > that useful. How have these jobs actually helped? IMO the real value of
> > performance testing in the gate is in capturing the longer-term trends in
> > the data, which is something these jobs are not doing.
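(To be concrete about what "compare to previous results" could look like: a
gate job could keep a rolling baseline of per-scenario means and flag a run
that is dramatically slower. The sketch below is purely illustrative - the
file names and result layout are made up, not what the current jobs emit:)

    # Hypothetical baseline comparison for a performance job. File names and
    # the JSON layout are assumptions for illustration only.
    import json

    THRESHOLD = 1.5  # flag anything more than 50% slower than the baseline

    with open("baseline.json") as f:       # {"scenario": mean_seconds}
        baseline = json.load(f)

    with open("current_run.json") as f:    # {"scenario": [durations, ...]}
        current = json.load(f)

    for scenario, durations in current.items():
        mean = sum(durations) / len(durations)
        old = baseline.get(scenario)
        if old and mean > old * THRESHOLD:
            print("Possible regression: %s went from %.2fs to %.2fs"
                  % (scenario, old, mean))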
>
> So I put in a change to dump out the raw data from each run into a
> zipped json file so that I can start looking at the value of collecting
> this data... As an experiment I think it is very worthwhile. The gate job
> is non-voting and, apparently, at least on the cinder front, highly
> reliable. The job runs fast enough that it isn't slowing the gate down -
> we aren't running out of nodes on the gate as far as I can tell, so I
> don't understand the hostility towards it. We'll run it for a bit, see if
> it proves useful, and if it doesn't then we can turn it off and try
> something else.
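(For what it's worth, pulling the timings back out of a dump like that and
aggregating them across runs is a few lines of stdlib Python. The file name
and record layout below are assumptions about what such a dump might contain,
not the actual job output:)

    # Hypothetical reader for a gzipped JSON dump of raw Rally timings.
    import gzip
    import json

    with gzip.open("rally-raw-results.json.gz", "rb") as f:
        results = json.loads(f.read().decode("utf-8"))

    # Assume each record looks like {"duration": 1.23, "error": null}.
    durations = [r["duration"] for r in results if not r.get("error")]
    print("runs: %d, mean: %.2fs, max: %.2fs"
          % (len(durations), sum(durations) / len(durations), max(durations)))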
>
We actually run out of nodes almost every day now (except weekends); we
have about 800 nodes and hit that quota most days [0][1].
While the output of the rally job [2] is very impressive, with our
constrained number of nodes I am still struggling to grok the value of
running this job on every patch.
[0]
http://graphite.openstack.org/render/?from=-24hours&height=180&until=now&width=334&bgcolor=ffffff&fgcolor=000000&areaMode=stacked&target=color(alias(sumSeries(stats.gauges.nodepool.target.building),%20%27Building%27),%20%27ffbf52%27)&target=color(alias(sumSeries(stats.gauges.nodepool.target.ready),%20%27Available%27),%20%2700c868%27)&target=color(alias(sumSeries(stats.gauges.nodepool.target.used),%20%27In%20Use%27),%20%276464ff%27)&target=color(alias(sumSeries(stats.gauges.nodepool.target.delete),%20%27Deleting%27),%20%27c864ff%27)&title=Test%20Nodes&_t=0.8509290898218751#1407982412165
[1]
http://graphite.openstack.org/render/?from=-12days&height=180&until=now&width=334&bgcolor=ffffff&fgcolor=000000&areaMode=stacked&target=color(alias(sumSeries(stats.gauges.nodepool.target.building),%20%27Building%27),%20%27ffbf52%27)&target=color(alias(sumSeries(stats.gauges.nodepool.target.ready),%20%27Available%27),%20%2700c868%27)&target=color(alias(sumSeries(stats.gauges.nodepool.target.used),%20%27In%20Use%27),%20%276464ff%27)&target=color(alias(sumSeries(stats.gauges.nodepool.target.delete),%20%27Deleting%27),%20%27c864ff%27)&title=Test%20Nodes&_t=0.8509290898218751#1407982412165
[2]
http://logs.openstack.org/02/109202/4/check/gate-rally-dsvm-cinder/bbc256b/rally-plot/results.html.gz
> I'm confused by the hostility towards this gate job - it is costing us
> nothing, and if it turns out to be a pain we'll turn it off.
>
> Rally as a general tool has enabled me to do things that I wouldn't even
> consider trying with tempest. There shouldn't be a problem with a small
> number of parallel efforts - that's a founding principle of open source
> in general.