[openstack-dev] Which program for Rally

Boris Pavlovic boris at pavlovic.me
Thu Aug 14 21:57:29 UTC 2014


> One thing did just occur to me while writing this, though: it's probably
> worth investigating splitting out the stress test framework as an external
> tool/project after we start work on the tempest library. [3]

I fully agree with the point that stress testing doesn't belong in Tempest.

This thread is largely about that aspect, and all my arguments about
splitting Rally apart and merging pieces of it into tempest relate to it.

Could you please elaborate on why, instead of using the ready solution
"Rally", which has a community and is aligned with OpenStack and its
processes, you are going to create a similar solution from scratch?

I really don't see any reason why we need to duplicate an existing, working
solution instead of just working together on Rally.

Best regards,
Boris Pavlovic

On Fri, Aug 15, 2014 at 1:15 AM, Matthew Treinish <mtreinish at kortar.org> wrote:

> > On Wed, Aug 13, 2014 at 03:48:59PM -0600, Duncan Thomas wrote:
> > > On 13 August 2014 13:57, Matthew Treinish <mtreinish at kortar.org> wrote:
> > > On Tue, Aug 12, 2014 at 01:45:17AM +0400, Boris Pavlovic wrote:
> > >> Keystone, Glance, Cinder, Neutron and Heat are running rally
> > >> performance jobs, which can be used for performance testing,
> > >> benchmarking, and regression testing (already now). These jobs support
> > >> in-tree plugins for all components (scenarios, load generators,
> > >> benchmark context) and they can use Rally fully without any interaction
> > >> with the Rally team at all. More about these jobs:
> > >> https://docs.google.com/a/mirantis.com/document/d/1s93IBuyx24dM3SmPcboBp7N47RQedT8u4AJPgOHp9-A/
> > >> So I really don't see anything like this in tempest (even in the
> > >> foreseeable future)
> >
> > > So this is actually the communication problem I mentioned before.
> > > Singling out individual projects and getting them to add a rally job is
> > > not "cross project" communication. (this is part of what I meant by
> > > "push using Rally") There was no larger discussion on the ML or a topic
> > > in the project meeting about adding these jobs. There was no discussion
> > > about the value vs risk of adding new jobs to the gate. Also, this is
> > > why less than half of the integrated projects have these jobs. Having
> > > asymmetry like this between gating workloads on projects helps no one.
> >
> > So the advantage of the approach, rather than having a massive
> > cross-project discussion, is that interested projects (I've been very
> > interested from a cinder core PoV) act as a test bed for other
> > projects. 'Cross project' discussions don't come to other teams; they
> > rely on people to find them, whereas Boris came to us, said I've got
> > this thing you might like, try it out, tell me what you want. He took
> > feedback, iterated fast and investigated bugs. It has been a genuine
> > pleasure to work with him, and I feel we made progress faster than we
> > would have done if we had been trying to please everybody.
> I'm not arguing whether Boris was great to work with or not. Or whether
> there isn't value in talking directly to the dev team when setting up a
> new job. That is definitely the fastest path to getting a new job up and
> running. But, for something like adding a new class of dsvm job which runs
> on every patch, that affects everyone, not just the project where the jobs
> are being added. A larger discussion is really necessary to weigh whether
> such a job should be added. It really only needs to happen once, just
> before the first one is added on an integrated project.
> >
> > > That being said, the reason I think osprofiler has been more accepted,
> > > and its adoption into oslo is not nearly as contentious, is because
> > > it's an independent library that has value outside of itself. You don't
> > > need to pull in a monolithic stack to use it. Which is a design point
> > > more conducive with the rest of OpenStack.
> >
> > Sorry, are you suggesting tempest isn't a giant monolithic thing?
> > Because I was able to comprehend the rally code very quickly, that
> > isn't even slightly true of tempest. Having one simple tool that does
> > one thing well is exactly what rally has tried to do - tempest seems
> > to want to be five different things at once (CI, installation tests,
> > trademark, performance, stress testing, ...)
> This is actually a common misconception about the purpose and role of
> Tempest. Tempest is strictly concerned with being the integration test
> suite for OpenStack, which just includes the actual tests and some methods
> of running the tests. This is attempted to be done in a manner which is
> independent of the environment in which tempest is run or run against.
> (for example, devstack vs a public cloud) Yes, tempest is a large project
> and has a lot of tests, which just adds to its complexity, but its scope
> is quite targeted. It's just that it grows at the same rate the OpenStack
> scope grows, because tempest has coverage for all the projects.
> Methods of running the tests do include the stress test framework, but
> that is mostly just a method of leveraging the large quantity of tests we
> currently have in-tree to generate load. [1] (Yeah, we need to write
> better user docs around this and a lot of other things) It just lets you
> define which tests to use and how to loop and distribute them over
> workers. [2]
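[The loop-and-distribute mechanism described above can be sketched generically. This is not tempest's stress framework code, just an illustrative sketch of the idea: run a chosen test callable in a loop inside each of several concurrent workers (the real framework distributes over separate worker processes):]

```python
# Illustrative sketch only -- NOT tempest's actual stress framework.
# Loop a test callable a fixed number of times in each concurrent worker.
from concurrent.futures import ThreadPoolExecutor


def run_worker(test_fn, iterations):
    """One worker: invoke the test callable in a loop to generate load."""
    for _ in range(iterations):
        test_fn()


def stress(test_fn, workers=4, iterations=10):
    """Distribute looping runs of test_fn over `workers` concurrent workers.

    Returns True when every worker finished without a test raising.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_worker, test_fn, iterations)
                   for _ in range(workers)]
    # The `with` block waits for completion; any raised exception in a
    # worker surfaces through its future and marks the run as failed.
    return all(f.exception() is None for f in futures)
```

[In tempest's case the "test callable" is one of the existing in-tree tests, selected and configured via the JSON files referenced in [2].]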
> The trademark, CI, upgrade testing, and installation testing are just
> examples
> of applications where tempest is being used. (some of which are the domain
> of
> other QA or Infra program projects, some are not) If you look in the
> tempest
> tree you'll see very little specifically about any of those applications.
> They're all mostly accomplished by building tooling around tempest. For
> example:
> refstack->trademark, devstack-gate->ci, grenade->upgrade, etc. Tempest is
> just a
> building block that can be used to make all of those things. As all of
> these
> different use cases are basically tempest's primary consumer we do have to
> take them into account when working on improving tempest to try and not
> upset
> our "users". But, they are not explicit goals of the tempest project by
> itself.
> One thing did just occur to me while writing this, though: it's probably
> worth investigating splitting out the stress test framework as an external
> tool/project after we start work on the tempest library. [3]
> <snip>
> >>
> >> So firstly, I want to say I find these jobs troubling. Not just from
> >> the fact that, because of the nature of the gate (2nd level virt on
> >> public clouds), the variability between jobs can be staggering. I can't
> >> imagine what value there is in running synthetic benchmarks in this
> >> environment. It would only reliably catch the most egregious of
> >> regressions. Also, from what I can tell, none of these jobs actually
> >> compare the timing data to the previous results; it just generates the
> >> data and makes a pretty graph. The burden appears to be on the user to
> >> figure out what it means, which really isn't that useful. How have
> >> these jobs actually helped? IMO the real value in performance testing
> >> in the gate is to capture the longer term trends in the data. Which is
> >> something these jobs are not doing.
> >
> >
> > So I put in a change to dump out the raw data from each run into a
> > zipped json file so that I can start looking at the value of
> > collecting this data.... As an experiment I think it is very
> > worthwhile. The gate job is non-voting and, apparently, at least on
> > the cinder front, highly reliable. The job runs fast enough that it
> > isn't slowing the gate down - we aren't running out of nodes on the
> > gate as far as I can tell, so I don't understand the hostility towards
> > it. We'll run it for a bit, see if it proves useful; if it doesn't,
> > then we can turn it off and try something else.
> >
> > I'm confused by the hostility about this gate job - it is costing us
> > nothing; if it turns out to be a pain we'll turn it off.
> So aside from longer term issues that have already been brought up, as a
> short term experiment I agree it's probably fine. However, the argument
> that it costs us nothing isn't exactly the case. During the course of a
> day we routinely exhaust the max quota on nodes. Adding additional dsvm
> jobs just contributes to this exhaustion. While this probably isn't very
> noticeable when the gate is in a healthy state (it just makes things a
> bit slower), when things aren't working it becomes a huge problem,
> because node churn slows down our ability to land fixes, which just
> makes recovery slower. Given that we have a finite number of gate
> resources, I don't really think we should be running something that's
> considered an experiment on every patch. Instead it should be in the
> experimental pipeline for the project so it can be run on-demand, but
> doesn't constantly eat resources.
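[For context, moving a job to the experimental pipeline was a small change to zuul's configuration. The fragment below is a hypothetical sketch in the style of the zuul (v2-era) layout.yaml used at the time; the project and job names are illustrative, not taken from the actual config:]

```yaml
# Hypothetical sketch of a zuul layout.yaml project entry (names are
# illustrative). Jobs under "experimental" do not run on every patch;
# they run only when explicitly requested.
projects:
  - name: openstack/cinder
    check:
      - gate-tempest-dsvm-full     # normal per-patch jobs stay here
    experimental:
      - gate-rally-dsvm-cinder     # on-demand only
```

[A developer would then trigger the experimental jobs on a specific patch by leaving a "check experimental" review comment, which is what makes the pipeline on-demand rather than a constant consumer of nodes.]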
> >
> > Rally as a general tool has enabled me to do things that I wouldn't
> > even consider trying with tempest. There shouldn't be a problem with a
> > small number of parallel efforts - that's a founding principle of
> > open source in general.
> So, I agree there is nothing wrong with doing things as a separate tool
> in a parallel effort, which was Thierry's option #3 on the original post.
> The question up for debate is whether we should adopt Rally into the QA
> program or create a new program for it. My opinion on the matter, for the
> reasons I've already outlined in my other posts, is no, not in its
> current form. I'd really like to see collaboration here working towards
> eventual integration of Rally into the QA program. But, I don't think
> there is anything wrong with the rally dev team deciding to continue as
> they are outside of an OpenStack program.
> -Matt Treinish
>
> [1] http://docs.openstack.org/developer/tempest/field_guide/stress.html#stress-field-guide
> [2] http://git.openstack.org/cgit/openstack/tempest/tree/tempest/stress/etc/sample-unit-test.json
> [3] http://specs.openstack.org/openstack/qa-specs/specs/tempest-library.html
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
