[openstack-dev] [devstack] [Cinder-GlusterFS CI] centos7 gate job abrupt failures

Jeremy Stanley fungi at yuggoth.org
Thu Feb 19 21:37:59 UTC 2015


On 2015-02-19 17:03:49 +0100 (+0100), Deepak Shetty wrote:
[...]
> For some reason we are seeing the centos7 glusterfs CI job getting
> aborted/ killed either by Java exception or the build getting
> aborted due to timeout.
[...]
> Hoping to root cause this soon and get the cinder-glusterfs CI job
> back online soon.

I manually reran the same commands this job runs on an identical
virtual machine and was able to reproduce some substantial
weirdness[1].

I temporarily lost remote access to the VM around 108 minutes into
running the job (~17:50 in the logs) and the out-of-band console
also became unresponsive to carriage returns. The machine's IP
address still responded to ICMP ping, but attempts to open new TCP
sockets to the SSH service never got a protocol version banner back.
After about 10 minutes of that I went out to lunch but left
everything untouched. To my excitement it was up and responding
again when I returned.
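
The symptom above (ICMP answers, but new connections to the SSH port
never receive a protocol version banner) can be probed with a short
script. This is only a minimal sketch, not what was actually run
here: the function name, the default port and the timeout are
assumptions, and it only inspects the first line the server sends,
whereas a server may legitimately send other lines before its
"SSH-" identification string.

```python
import socket
from typing import Optional


def read_ssh_banner(host: str, port: int = 22,
                    timeout: float = 5.0) -> Optional[str]:
    """Return the first line sent by the server if it looks like an SSH
    identification string, or None if nothing usable arrives in time.

    Hypothetical helper for illustration; connection failures and
    "connected but silent" are both reported as None.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            data = b""
            # The identification string is at most 255 bytes including CR LF.
            while b"\n" not in data and len(data) < 255:
                chunk = sock.recv(255 - len(data))
                if not chunk:  # peer closed the connection
                    break
                data += chunk
    except OSError:  # covers timeouts and refused/unreachable connections
        return None
    line = data.split(b"\n", 1)[0].strip().decode("ascii", errors="replace")
    return line if line.startswith("SSH-") else None
```

Calling read_ssh_banner() in a loop against the VM's address would
show when sshd stops answering and when it recovers, even while ping
keeps working.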

It appears from the logs that the job runs well past the 120-minute
mark, at which devstack-gate tries to kill the gate hook for
exceeding its configured timeout. Somewhere around 165 minutes in
(18:47), the syslog shows the kernel out-of-memory killer starting
to kick in and kill httpd and mysqld processes. Hopefully this is
enough additional detail to give you a start on finding the root
cause so that we can reenable your job. Let me know if there's
anything else you need for this.
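
To get a quick tally of which processes the out-of-memory killer
took down, something along these lines could be run over the syslog
from the tarball. It is a rough sketch under assumptions: it relies
on the kernel logging a "Killed process <pid> (<name>)" line for each
victim, and the function and pattern names are made up for
illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed kernel message shape: "Killed process 1234 (httpd) ..."
KILLED_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def count_oom_kills(lines: Iterable[str]) -> Counter:
    """Count OOM-killer victims by process name in syslog lines."""
    kills: Counter = Counter()
    for line in lines:
        match = KILLED_RE.search(line)
        if match:
            kills[match.group(2)] += 1
    return kills
```

Since the function accepts any iterable of lines, an open file
object for the syslog can be passed to it directly; the resulting
counts show how many times each process name was killed.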

[1] http://fungi.yuggoth.org/tmp/logs.tar
-- 
Jeremy Stanley


