[openstack-dev] [infra] [gate] [all] openstack services footprint lead to oom-kill in the gate
ihrachys at redhat.com
Wed Feb 15 20:38:46 UTC 2017
Another potentially relevant observation: we saw before that oom-killer is
triggered while the 8 GB of swap are barely used. This behavior is hard to
explain, since we set the kernel swappiness sysctl knob to 30:
(any value above 0 means that if memory is requested, and there is
swap available to fulfill it, the allocation will not fail;
swappiness only controls the kernel's willingness to swap process pages
instead of dropping disk cache entries, so it may affect performance, but
it should not affect malloc behavior).
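For reference, the knob described above can be inspected directly; a minimal
sketch (assuming a Linux host, where the gate nodes set it to 30):

```python
# Read the vm.swappiness sysctl knob from procfs (Linux only).
# Values above 0 only tune how eagerly the kernel swaps process
# pages versus dropping page cache; they do not gate allocations.
with open("/proc/sys/vm/swappiness") as f:
    swappiness = int(f.read().strip())
print(swappiness)
```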
The only reason I can think of for a memory allocation request to
trigger the OOM killer while swap is free is when the request is for a
RAM-locked page (memory locked either with mlock(2), or with
mmap(2) when MAP_LOCKED is used). To understand whether that's the case
in the gate, I am adding a new mlock_tracker service to devstack:
The patch that enables the service in Pike+ gate is:
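The kernel exposes per-process locked memory as the VmLck field in
/proc/<pid>/status, so a tracker only needs to scan those files. This is
purely an illustrative sketch of that idea; the actual mlock_tracker
service in the patch above may work differently:

```python
import glob
import re


def locked_memory_by_process():
    """Return {(pid, name): VmLck_kB} for processes with mlocked pages.

    Sketch only: reads the VmLck field from /proc/<pid>/status, which
    counts memory pinned via mlock(2) or mmap(2) with MAP_LOCKED.
    """
    results = {}
    for status_path in glob.glob("/proc/[0-9]*/status"):
        try:
            with open(status_path) as f:
                text = f.read()
        except OSError:
            continue  # process exited between glob() and open()
        name = re.search(r"^Name:\s+(\S+)", text, re.M)
        vmlck = re.search(r"^VmLck:\s+(\d+) kB", text, re.M)
        if name and vmlck and int(vmlck.group(1)) > 0:
            pid = status_path.split("/")[2]
            results[(pid, name.group(1))] = int(vmlck.group(1))
    return results


print(locked_memory_by_process())
```

Running this periodically in a gate job would show whether any service is
pinning enough RAM to make the OOM killer fire while swap is still free.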
On Wed, Feb 15, 2017 at 5:21 AM, Andrea Frittoli
<andrea.frittoli at gmail.com> wrote:
> Some (new?) data on the oom kill issue in the gate.
> I filed a new bug / E-R query for the issue, since it looks to me
> like the issue is not specific to mysqld - oom-kill will just pick the best
> candidate, which in most cases happens to be mysqld. The next most likely
> candidate to show errors in the logs is keystone, since token requests are
> rather frequent, probably more than any other API call.
> According to logstash, all failures identified by the query happen on RAX
> nodes, which I hadn't realised before.
> Comparing dstat data between the failed run and a successful one on an OVH
> node, the main difference I can spot is free memory.
> For the same test job, the free memory tends to be much lower, quite close
> to zero for the majority of the time on the RAX node. My guess is that an
> unlucky scheduling of tests may cause a slightly higher peak in memory usage
> and trigger the oom-kill.
> I find it hard to relate lower free memory to a specific cloud provider /
> underlying virtualisation technology, but maybe someone has an idea about
> how that could be?
>  https://bugs.launchpad.net/tempest/+bug/1664953
>  https://review.openstack.org/434238
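The dstat comparison described above is easy to script once the CSV output
is in hand; a minimal sketch, with an assumed column layout (real dstat
CSVs carry several metadata header lines before the column row):

```python
import csv
import io

# Hypothetical sample of dstat --output CSV memory columns, in kB.
SAMPLE = """\
"used","buff","cach","free"
500,100,200,1200
900,100,200,800
1100,100,200,600
"""


def min_free(csv_text, column="free"):
    """Return the lowest value seen in the given memory column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return min(int(row[column]) for row in reader)


print(min_free(SAMPLE))  # -> 600
```

Comparing this minimum between a RAX run and an OVH run would quantify how
much closer to zero the RAX node sits, per the observation above.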
> On Mon, Feb 6, 2017 at 10:13 AM Miguel Angel Ajo Pelayo
> <majopela at redhat.com> wrote:
>> Jeremy Stanley wrote:
>> > It's an option of last resort, I think. The next consistent flavor
>> > up in most of the providers donating resources is double the one
>> > we're using (which is a fairly typical pattern in public clouds). As
>> > aggregate memory constraints are our primary quota limit, this would
>> > effectively halve our current job capacity.
>> Properly coordinated with all the cloud providers, they could create
>> flavours which are private but available to our tenants, where 25-50% more
>> RAM would be just enough.
>> I agree that should probably be a last resort tool, and we should keep
>> looking for proper ways to find where we consume unnecessary RAM and make
>> sure that's properly freed up.
>> It could be interesting to coordinate such flavour creation in the
>> meantime; even if we don't use it now, we could eventually test it or put
>> it to work if we find ourselves trapped anytime later.
>> On Sun, Feb 5, 2017 at 8:37 PM, Matt Riedemann <mriedemos at gmail.com>
>> wrote:
>>> On 2/5/2017 1:19 PM, Clint Byrum wrote:
>>>> Also I wonder if there's ever been any serious consideration given to
>>>> switching to protobuf? Feels like one could make oslo.versionedobjects
>>>> a wrapper around protobuf relatively easily, but perhaps that's already
>>>> been explored in a forum that I wasn't paying attention to.
>>> I've never heard of anyone attempting that.
>>> Matt Riedemann
>>> OpenStack Development Mailing List (not for usage questions)
>>> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe