[openstack-dev] [devstack] [Cinder-GlusterFS CI] centos7 gate job abrupt failures

Deepak Shetty dpkshetty at gmail.com
Wed Feb 25 11:32:34 UTC 2015


On Wed, Feb 25, 2015 at 6:11 AM, Jeremy Stanley <fungi at yuggoth.org> wrote:

> On 2015-02-25 01:02:07 +0530 (+0530), Bharat Kumar wrote:
> [...]
> > After running 971 test cases VM inaccessible for 569 ticks
> [...]
>
> Glad you're able to reproduce it. For the record that is running
> their 8GB performance flavor with a CentOS 7 PVHVM base image. The
>

So we had 2 runs in total in the rax provider VM, and below are the results:

Run 1) It failed and reproduced the OOM. The setup had glusterfs as the
storage backend for Cinder.

[deepakcs at deepakcs r6-jeremy-rax-vm]$ grep oom-killer
run1-w-gluster/logs/syslog.txt
Feb 24 18:41:08 devstack-centos7-rax-dfw-979654.slave.openstack.org kernel:
mysqld invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
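
(In case it helps anyone digging further: the full oom report, including the
task dump around each invocation, can be pulled out of the same syslog with
something like the below -- just a sketch, the context line counts are
arbitrary:)

  # each oom-killer invocation plus the surrounding memory/task dump
  grep -B 5 -A 40 "oom-killer" run1-w-gluster/logs/syslog.txt

  # only the processes that actually got killed
  grep "Killed process" run1-w-gluster/logs/syslog.txt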

Run 2) We *removed the glusterfs backend*, so Cinder was configured with the
default storage backend, i.e. LVM. *We reproduced the OOM here too.*
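
(For the record, the only intended difference between the two runs is the
Cinder backend selection in the devstack localrc; roughly something like the
below. This is illustrative only, not the exact job config -- the values
assume the stock devstack glusterfs backend support:)

  # run 1: glusterfs backend for Cinder
  CINDER_ENABLED_BACKENDS=glusterfs:glusterfs

  # run 2: drop glusterfs, fall back to the default LVM backend
  CINDER_ENABLED_BACKENDS=lvm:lvmdriver-1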

So that proves glusterfs doesn't cause it, since it happens without
glusterfs too.
The VM (104.239.136.99) is now in such bad shape that the existing ssh
sessions have not been responding for a very long time, though ping still
works. So we need someone to help reboot/restart the VM so that we can
collect the logs for the record. I couldn't find anyone during the APAC TZ
to get it rebooted.

After a long time we managed to get the below grep to work from another
terminal, which proves that the OOM did happen in run 2 as well:

bash-4.2$ sudo cat /var/log/messages| grep oom-killer
Feb 25 08:53:16 devstack-centos7-rax-dfw-979654 kernel: ntpd invoked
oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 25 09:03:35 devstack-centos7-rax-dfw-979654 kernel: beam.smp invoked
oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 25 09:57:28 devstack-centos7-rax-dfw-979654 kernel: mysqld invoked
oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Feb 25 10:40:38 devstack-centos7-rax-dfw-979654 kernel: mysqld invoked
oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
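
(If that terminal stays usable, it is probably worth also capturing the
current memory state and the top memory consumers before the reboot wipes
them -- just standard tools, sketch:)

  free -m
  ps aux --sort=-rss | head -n 15
  sudo dmesg | tail -n 100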


> steps to recreate are http://paste.openstack.org/show/181303/ as
> discussed in IRC (for the sake of others following along). I've held
> a similar worker in HPCloud (15.126.235.20) which is a 30GB flavor
>

We ran 2 runs in total in the hpcloud provider VM (and this time it was
set up correctly with 8g ram, as evident from /proc/meminfo as well as the
dstat output).

Run 1) It was successful. The setup had glusterfs as the storage
backend for Cinder. Only 2 testcases failed, and those were expected. No oom
happened.
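
(The 8g was checked with something simple like the below, with dstat left
running during the run to watch memory usage -- sketch:)

  grep MemTotal /proc/meminfo
  dstat -tm 60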

[deepakcs at deepakcs r7-jeremy-hpcloud-vm]$ grep oom-killer
run1-w-gluster/logs/syslog.txt
[deepakcs at deepakcs r7-jeremy-hpcloud-vm]$

Run 2) Since run 1 went fine, we enabled the tempest volume backup testcases
too and ran again. It was successful and no oom happened.

[deepakcs at deepakcs r7-jeremy-hpcloud-vm]$ grep oom-killer
run2-w-gluster/logs/syslog.txt
[deepakcs at deepakcs r7-jeremy-hpcloud-vm]$


> artificially limited to 8GB through a kernel boot parameter.
> Hopefully following the same steps there will help either confirm
> the issue isn't specific to running in one particular service
> provider, or will yield some useful difference which could help
> highlight the cause.
>
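
(For anyone wanting to apply the same artificial limit to another bigger
flavor, the usual approach is a mem= kernel boot argument; on centos7 that
would roughly be the below -- sketch, not the exact infra setup:)

  sudo grubby --update-kernel=ALL --args="mem=8G"
  sudo reboot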

So from the above we can conclude that the tests run fine on the hpcloud
provider but not on the rax provider. Since the OS (centos7) inside the VM
is the same across providers, this now boils down to some issue with the
rax provider VM + centos7 combination.

Another data point I could gather is:
    The only other centos7 job we have is check-tempest-dsvm-centos7, and it
does not run full tempest. Looking at the job's config, it only runs smoke
tests (also confirmed the same with Ian W), which I believe is only a subset
of the tests.
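
(For reference, in devstack-gate terms the smoke vs full difference usually
boils down to the DEVSTACK_GATE_TEMPEST_FULL flag in the job definition;
illustrative sketch, not the actual job config:)

  export DEVSTACK_GATE_TEMPEST=1       # run tempest at all
  export DEVSTACK_GATE_TEMPEST_FULL=1  # full suite; 0/unset runs only smoke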

So that brings us to the conclusion that the cinder-glusterfs CI job
(check-tempest-dsvm-full-glusterfs-centos7) is probably the first centos7
based job running full tempest tests in upstream CI, and hence is the first
to hit the issue, but on the rax provider only.

thanx,
deepak