[openstack-dev] [QA][gate][all] dsvm gate stability and scenario tests

Ihar Hrachyshka ihrachys at redhat.com
Fri Mar 17 12:45:30 UTC 2017


I had some patches to collect more stats about mlocks here:
https://review.openstack.org/#/q/topic:collect-mlock-stats-in-gate but they
need reviews.
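For anyone who wants to eyeball similar numbers locally, here is a minimal sketch (illustrative only, not the tooling from those patches) that sums locked memory per process from the VmLck field in /proc/<pid>/status on Linux:

```python
"""Sketch: total mlock'ed memory across processes, read from /proc.

This is NOT the gate tooling from the patches above; it only shows
where per-process mlock stats come from on Linux.
"""
import glob
import re


def vmlck_kb(status_path):
    """Return the VmLck value (in kB) from a /proc/<pid>/status file, or 0."""
    try:
        with open(status_path) as f:
            for line in f:
                m = re.match(r'VmLck:\s+(\d+)\s+kB', line)
                if m:
                    return int(m.group(1))
    except OSError:
        pass  # the process may have exited; /proc entries are racy
    return 0


def total_locked_kb():
    """Sum VmLck over every process currently visible in /proc."""
    return sum(vmlck_kb(p) for p in glob.glob('/proc/[0-9]*/status'))


if __name__ == '__main__':
    print('total VmLck: %d kB' % total_locked_kb())
```

On most idle systems this prints 0 kB; the interesting cases are services that pin memory.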

Ihar

On Fri, Mar 17, 2017 at 5:28 AM Jordan Pittier <jordan.pittier at scality.com>
wrote:

> The patch that reduced the number of Tempest scenario tests we run in every
> job and also reduced the test run concurrency [0] was merged 13 days ago.
> Since then, the situation (i.e. the high number of false-negative job
> results) has not improved significantly. We need to keep looking at this
> collectively.
>
> There seems to be an agreement that we are hitting some memory limit.
> Several of our most frequent failures are memory related [1]. So we should
> either reduce our memory usage or ask for bigger VMs, with more than 8GB of
> RAM.
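How close a node is to that limit is easy to spot-check; the sketch below (illustrative only, not the gate's tooling) reads /proc/meminfo and reports used memory against an 8 GB budget:

```python
"""Sketch: report RAM in use against a fixed budget, via /proc/meminfo."""


def meminfo_kb(path='/proc/meminfo'):
    """Parse /proc/meminfo into a {field: value-in-kB} dict."""
    info = {}
    with open(path) as f:
        for line in f:
            key, _, rest = line.partition(':')
            info[key.strip()] = int(rest.split()[0])
    return info


def used_mb(path='/proc/meminfo'):
    """MB of RAM in use (MemTotal minus MemAvailable), as the kernel sees it."""
    info = meminfo_kb(path)
    # Older kernels lack MemAvailable; fall back to MemFree as a rough proxy.
    avail = info.get('MemAvailable', info.get('MemFree', 0))
    return (info['MemTotal'] - avail) // 1024


if __name__ == '__main__':
    budget = 8 * 1024  # the 8 GB flavor discussed above
    print('used: %d MB of a %d MB budget' % (used_mb(), budget))
```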
>
> There have been several attempts to reduce our memory usage: reducing
> MySQL's memory consumption ([2], but quickly reverted [3]), reducing the
> number of Apache workers ([4], [5]), and more apache2 tuning [6]. If you
> have any idea, however crazy, that could help in this regard, please share
> it. This is a high priority for the whole OpenStack project, because the
> problem is plaguing many projects.
>
> We have some tools to investigate memory consumption, like some regular
> "dstat" output [7], a home-made memory tracker [8] and stackviz [9].
>
> Best,
> Jordan
>
> [0]: https://review.openstack.org/#/c/439698/
> [1]: http://status.openstack.org/elastic-recheck/gate.html
> [2]: https://review.openstack.org/#/c/438668/
> [3]: https://review.openstack.org/#/c/446196/
> [4]: https://review.openstack.org/#/c/426264/
> [5]: https://review.openstack.org/#/c/445910/
> [6]: https://review.openstack.org/#/c/446741/
> [7]:
> http://logs.openstack.org/96/446196/1/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/b5c362f/logs/dstat-csv_log.txt.gz
> [8]:
> http://logs.openstack.org/96/446196/1/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/b5c362f/logs/screen-peakmem_tracker.txt.gz
> [9]:
> http://logs.openstack.org/41/446741/1/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/fa4d2e6/logs/stackviz/#/stdin/timeline
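As a quick illustration of what can be pulled out of those logs, the sketch below finds the peak of the memory "used" column in a dstat CSV. The embedded sample is hand-made in the shape of a dstat CSV log, not the real log linked above:

```python
"""Sketch: find the peak of the memory 'used' column in a dstat CSV log.

Assumes dstat's CSV layout: banner/category rows first, then a row that
names the columns (one of them 'used'), then one data row per sample.
"""
import csv
import io


def peak_mem_used(csv_text):
    """Return the maximum value of the 'used' column, in bytes."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    for i, row in enumerate(rows):
        if 'used' in row:            # the column-name row
            col = row.index('used')
            data = rows[i + 1:]
            break
    else:
        raise ValueError("no 'used' column found")
    return max(float(r[col]) for r in data if len(r) > col and r[col])


# Tiny hand-made sample in the same shape as a dstat CSV log.
SAMPLE = '''"Dstat 0.7.3 CSV output"
"memory usage","memory usage","memory usage","memory usage"
"used","buff","cach","free"
4000000000,1000,2000,3000
7500000000,1000,2000,3000
6000000000,1000,2000,3000
'''

if __name__ == '__main__':
    print('peak used: %.0f bytes' % peak_mem_used(SAMPLE))
```

Pointing the same function at a downloaded dstat-csv_log.txt gives a one-number summary of a run's memory pressure.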
>
> On Sat, Mar 4, 2017 at 4:19 PM, Andrea Frittoli <andrea.frittoli at gmail.com
> > wrote:
>
> Quick update on this: the change is now merged, so we now have a smaller
> number of scenario tests, running serially after the API test run.
>
> We'll monitor gate stability for the next week or so and decide whether
> further actions are required.
>
> Please keep categorizing failures via elastic recheck as usual.
>
> thank you
>
> andrea
>
> On Fri, 3 Mar 2017, 8:02 a.m. Ghanshyam Mann, <ghanshyammann at gmail.com>
> wrote:
>
> Thanks, +1. I added my list to the ethercalc.
>
> The left-out scenario tests can be run in the periodic and experimental
> jobs. IMO in both: periodic to monitor their status regularly, and
> experimental to run them on a particular patch if we need to.
>
> -gmann
>
> On Fri, Mar 3, 2017 at 4:28 PM, Andrea Frittoli <andrea.frittoli at gmail.com
> > wrote:
>
> Hello folks,
>
> we have discussed gate stability issues a lot since the PTG; we need a
> stable and reliable gate to ensure smooth progress in Pike.
>
> One of the issues that stands out is that, most of the time during test
> runs, our test VMs are under heavy load.
> This may be the common cause behind several failures we've seen in the
> gate, so we agreed during the QA meeting yesterday [0] that we're going to
> try reducing the load and see whether that improves stability.
>
> Next steps are:
> - select a subset of scenario tests to be executed in the gate, based on
> [1], and run them serially only
> - the patch for this is [2] and we will approve this by the end of the day
> - we will monitor stability for a week - if needed we may reduce
> concurrency a bit on API tests as well, and identify "heavy" tests
> candidate for removal / refactor
> - the QA team won't approve any new test (scenario or heavy resource
> consuming api) until gate stability is ensured
>
> Thanks for your patience and collaboration!
>
> Andrea
>
> ---
> irc: andreaf
>
> [0]
> http://eavesdrop.openstack.org/meetings/qa/2017/qa.2017-03-02-17.00.txt
> [1] https://ethercalc.openstack.org/nu56u2wrfb2b
> [2] https://review.openstack.org/#/c/439698/
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>