[openstack-dev] [QA][gate][all] dsvm gate stability and scenario tests

Jordan Pittier jordan.pittier at scality.com
Fri Mar 17 12:27:23 UTC 2017


The patch that reduced the number of Tempest scenario tests we run in every
job and also reduced the test run concurrency [0] was merged 13 days ago.
Since then, the situation (i.e. the high number of false-negative job results)
has not improved significantly. We need to keep looking at this collectively.

There seems to be agreement that we are hitting some memory limit. Several
of our most frequent failures are memory-related [1]. So we should either
reduce our memory usage or ask for bigger VMs, with more than 8GB of RAM.

There have been several attempts to reduce our memory usage: reducing the
MySQL memory consumption ([2], but quickly reverted [3]), reducing the number
of Apache workers ([4], [5]), and more apache2 tuning [6]. If you have any
idea, even a crazy one, to help in this regard, please share it. This is a
high priority for the whole OpenStack project, because it's plaguing many
projects.

We have some tools to investigate memory consumption: regular "dstat" output
[7], a home-made memory tracker [8], and stackviz [9].

Best,
Jordan

[0]: https://review.openstack.org/#/c/439698/
[1]: http://status.openstack.org/elastic-recheck/gate.html
[2]: https://review.openstack.org/#/c/438668/
[3]: https://review.openstack.org/#/c/446196/
[4]: https://review.openstack.org/#/c/426264/
[5]: https://review.openstack.org/#/c/445910/
[6]: https://review.openstack.org/#/c/446741/
[7]: http://logs.openstack.org/96/446196/1/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/b5c362f/logs/dstat-csv_log.txt.gz
[8]: http://logs.openstack.org/96/446196/1/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/b5c362f/logs/screen-peakmem_tracker.txt.gz
[9]: http://logs.openstack.org/41/446741/1/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/fa4d2e6/logs/stackviz/#/stdin/timeline

On Sat, Mar 4, 2017 at 4:19 PM, Andrea Frittoli <andrea.frittoli at gmail.com>
wrote:

> Quick update on this: the change is now merged, so we now have a smaller
> number of scenario tests running serially after the API test run.
>
> We'll monitor gate stability for the next week or so and decide whether
> further actions are required.
>
> Please keep categorizing failures via elastic recheck as usual.
>
> thank you
>
> andrea
>
> On Fri, 3 Mar 2017, 8:02 a.m. Ghanshyam Mann, <ghanshyammann at gmail.com>
> wrote:
>
>> Thanks. +1. I added my list in the ethercalc.
>>
>> The left-out scenario tests can be run in the periodic and experimental
>> jobs. IMO in both (periodic and experimental), to monitor their status
>> periodically as well as on a particular patch if we need to.
>>
>> -gmann
>>
>> On Fri, Mar 3, 2017 at 4:28 PM, Andrea Frittoli <
>> andrea.frittoli at gmail.com> wrote:
>>
>> Hello folks,
>>
>> we discussed a lot since the PTG about issues with gate stability; we
>> need a stable and reliable gate to ensure smooth progress in Pike.
>>
>> One of the issues that stands out is that most of the time during test
>> runs our test VMs are under heavy load.
>> This can be the common cause behind several failures we've seen in the
>> gate, so we agreed during the QA meeting yesterday [0] that we're going to
>> try reducing the load and see whether that improves stability.
>>
>> Next steps are:
>> - select a subset of scenario tests to be executed in the gate, based on
>> [1], and run them serially only
>> - the patch for this is [2] and we will approve this by the end of the day
>> - we will monitor stability for a week - if needed we may reduce
>> concurrency a bit on API tests as well, and identify "heavy" tests as
>> candidates for removal / refactor
>> - the QA team won't approve any new test (scenario or heavy
>> resource-consuming API) until gate stability is ensured
>>
>> Thanks for your patience and collaboration!
>>
>> Andrea
>>
>> ---
>> irc: andreaf
>>
>> [0] http://eavesdrop.openstack.org/meetings/qa/2017/qa.2017-03-02-17.00.txt
>> [1] https://ethercalc.openstack.org/nu56u2wrfb2b
>> [2] https://review.openstack.org/#/c/439698/
>>
>>
>
>
>