[openstack-dev] [QA][gate][all] dsvm gate stability and scenario tests

Clark Boylan cboylan at sapwetik.org
Fri Mar 17 22:49:55 UTC 2017


On Fri, Mar 17, 2017, at 08:23 AM, Jordan Pittier wrote:
> On Fri, Mar 17, 2017 at 3:11 PM, Sean Dague <sean at dague.net> wrote:
> 
> > On 03/17/2017 09:24 AM, Jordan Pittier wrote:
> > >
> > >
> > > On Fri, Mar 17, 2017 at 1:58 PM, Sean Dague <sean at dague.net
> > > <mailto:sean at dague.net>> wrote:
> > >
> > >     On 03/17/2017 08:27 AM, Jordan Pittier wrote:
> > >     > The patch that reduced the number of Tempest scenarios we run in
> > >     > every job and also reduced the test run concurrency [0] was merged
> > >     > 13 days ago. Since then, the situation (i.e. the high number of
> > >     > false-negative job results) has not improved significantly. We need
> > >     > to keep looking collectively at this.
> > >
> > >     While the situation hasn't completely cleared out -
> > >     http://tinyurl.com/mdmdxlk - since we've merged this we've not seen
> > >     that job go over a 25% failure rate in the gate, which it was
> > >     regularly crossing in the prior two-week period. That does feel like
> > >     progress. In spot checking, we are also rarely failing in scenario
> > >     tests now; the failures tend to end up inside heavy API tests
> > >     running in parallel.
> > >
> > >
> > >     > There seems to be agreement that we are hitting some memory
> > >     > limit. Several of our most frequent failures are memory related
> > >     > [1]. So we should either reduce our memory usage or ask for bigger
> > >     > VMs, with more than 8GB of RAM.
> > >     >
> > >     > There have been several attempts to reduce our memory usage: by
> > >     > reducing the MySQL memory consumption ([2], but quickly reverted
> > >     > [3]), reducing the number of Apache workers ([4], [5]), and more
> > >     > apache2 tuning [6]. If you have any crazy idea to help in this
> > >     > regard, please help. This is high priority for the whole OpenStack
> > >     > project, because it's plaguing many projects.
> > >
> > >     Interesting, I hadn't seen the revert. It is also curious that it
> > >     was largely limited to the neutron-api test job. It's also notable
> > >     that the sort buffers seem to have been set to the minimum limit
> > >     mysql allows -
> > >     https://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_sort_buffer_size
> > >     - which is over an order of magnitude below the existing default.
> > >
> > >     I wonder about redoing the change with everything except that
> > >     setting and seeing how that impacts the neutron-api job.
> > >
> > > Yes, that would be great, because mysql is by far our biggest single
> > > memory consumer, so we should target it first.
> >
> > While it is the single biggest process, weighing in at 500 MB, the
> > python services are really our biggest memory consumers. They
> > collectively far outweigh either mysql or rabbit, and are the reason
> > that even with 64MB guests we're running out of memory. So we want to
> > keep that in perspective.
> >
> Absolutely. I have https://review.openstack.org/#/c/446986/ in that vein.
> And if someone wants to start the work of not running the various Swift
> *auditor*, *updater*, *reaper* and *replicator* services when the Swift
> replication factor is set to 1, that's also a good memory saving.

I've currently got https://review.openstack.org/#/c/447119/ up to enable
kernel samepage merging when devstack is configured to use libvirt. The
details are in the change comments, but I think it may save about 150MB
of peak memory use.
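For context, the mechanism behind that change is the standard Linux KSM sysfs interface; QEMU marks guest memory as mergeable, so identical pages across guests get deduplicated. The snippet below is a generic illustration of enabling it, not the devstack patch itself:

```shell
# Generic illustration of enabling KSM on a Linux host. Writing to the
# sysfs knob requires root, so the guard keeps this a no-op otherwise.
if [ -w /sys/kernel/mm/ksm/run ]; then
    echo 1 > /sys/kernel/mm/ksm/run    # start the KSM scanner
fi

# How many pages are currently being shared (i.e. deduplicated):
cat /sys/kernel/mm/ksm/pages_sharing 2>/dev/null || echo "KSM not available"
```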

I've also got https://review.openstack.org/#/c/446741/ to tune back
apache's worker and connection counts, with details on the savings also
on the change. This is a much smaller win, but it's a simple change, so
probably worthwhile.

Feedback very welcome.

With that said, I agree we need individuals familiar with specific
services to focus on trimming those back too. The python services are
the biggest memory users here, and we can only disable so many services
before we need to modify the ones we actually need running.
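To see where the memory is actually going on a node, a rough per-command tally (assuming a Linux host with procps installed) can be had with:

```shell
# Sum resident set size (RSS) per command name; on a devstack node the
# python services usually dominate the top of this list.
ps -eo rss=,comm= \
    | awk '{ sum[$2] += $1 }
           END { for (c in sum) printf "%10.1f MB  %s\n", sum[c] / 1024, c }' \
    | sort -rn \
    | head -15
```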

Clark
