Open Stack

Thu Feb 19 21:37:59 UTC 2015

On 2015-02-19 17:03:49 +0100 (+0100), Deepak Shetty wrote:
[...]
> For some reason we are seeing the centos7 glusterfs CI job getting
> aborted/ killed either by Java exception or the build getting
> aborted due to timeout.
[...]
> Hoping to root cause this soon and get the cinder-glusterfs CI job
> back online soon.

I manually reran the same commands this job runs on an identical
virtual machine and was able to reproduce some substantial
weirdness.

I temporarily lost remote access to the VM around 108 minutes into
running the job (~17:50 in the logs) and the out-of-band console
also became unresponsive to carriage returns. The machine's IP
address still responded to ICMP ping, but attempts to open new TCP
sockets to the SSH service never got a protocol version banner back.
After about 10 minutes of that I went out to lunch but left
everything untouched. To my excitement it was up and responding
again when I returned.
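
For anyone wanting to watch for the same symptom (TCP connects but no
SSH version banner arrives), here is a minimal sketch of a probe. It is
not what I ran by hand; the function name `ssh_banner` and the default
timeout are made up for illustration, and it assumes the server sends
its identification string as the first line, which is the common case
but not strictly guaranteed.

```python
import socket


def ssh_banner(host, port=22, timeout=5.0):
    """Return the SSH identification string, or None if none arrives.

    Hypothetical helper: None covers a refused or timed-out connect,
    a connection that stays silent, and a first line that is not an
    SSH identification string.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            data = b""
            # Identification strings are at most 255 bytes including CR LF.
            while b"\n" not in data and len(data) < 255:
                chunk = sock.recv(255 - len(data))
                if not chunk:
                    break  # peer closed without finishing a line
                data += chunk
    except OSError:
        # Covers connection errors and socket timeouts alike.
        return None
    line = data.split(b"\n", 1)[0].strip()
    if not line.startswith(b"SSH-"):
        return None
    return line.decode("ascii", "replace")
```

A host in the state described above would accept the connection and
then return None once the timeout expires, while still answering ping.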

It appears from the logs[1] that the job runs well past the
120-minute mark, where devstack-gate tries to kill the gate hook
once its configured timeout expires. Somewhere around 165 minutes
in (18:47), the syslog shows the kernel out-of-memory killer
starting to kill httpd and mysqld processes. Hopefully this is
enough additional detail to give you a start at finding the root
cause so that we can reenable your job. Let me know if there's
anything else you need for this.
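
If it helps when digging through the syslog, a rough sketch for
pulling out OOM-killer victims follows. It assumes the kernel logs
lines containing "Killed process <pid> (<name>)", which is the usual
wording but can differ between kernel versions; the function name
`oom_victims` is hypothetical.

```python
import re

# Assumed message shape: "... Killed process 1234 (httpd) ..."
_KILLED = re.compile(r"Killed process (\d+) \(([^)]*)\)")


def oom_victims(lines):
    """Return (pid, process_name) for each OOM kill found in the lines."""
    victims = []
    for line in lines:
        match = _KILLED.search(line)
        if match:
            victims.append((int(match.group(1)), match.group(2)))
    return victims
```

Passing an open file object for the syslog works, since the function
only iterates over lines.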

[1] http://fungi.yuggoth.org/tmp/logs.tar
-- 
Jeremy Stanley


[openstack-dev] [devstack] [Cinder-GlusterFS CI] centos7 gate job abrupt failures
