[openstack-dev] [devstack] [Cinder-GlusterFS CI] centos7 gate job abrupt failures

Daniel P. Berrange berrange at redhat.com
Tue Feb 24 13:42:31 UTC 2015


On Fri, Feb 20, 2015 at 10:49:29AM -0800, Joe Gordon wrote:
> On Fri, Feb 20, 2015 at 7:29 AM, Deepak Shetty <dpkshetty at gmail.com> wrote:
> 
> > Hi Jeremy,
> >   I couldn't find anything conclusive in the logs to explain the OOM.
> > At the time the OOM happens, the mysqld and java processes hold the most
> > RAM, hence the OOM killer selects mysqld (4.7G) to be killed.
> >
> > From a glusterfs backend perspective, I haven't found anything suspicious,
> > and we don't have the glusterfs logs (typically under /var/log/glusterfs),
> > so we can't delve into glusterfs too much :(
> >
> > BharatK (in CC) also tried to re-create the issue in a local VM setup,
> > but hasn't been able to reproduce it yet!
> >
> > Having said that, *we do know* that we started seeing this issue after we
> > enabled the nova-assisted-snapshot tests (by changing nova's policy.json
> > to allow non-admin users to create hypervisor-assisted snapshots). We think
> > that enabling online snapshots may have added to the number of tests and
> > the memory load, and that's the only clue we have as of now!
> >
> >
> It looks like the OOM killer hit while qemu was busy, during a
> ServerRescueTest. Maybe the libvirt logs would be useful as well?
> 
> And I don't see any tempest tests calling assisted-volume-snapshots
> 
> Also this looks odd: Feb 19 18:47:16
> devstack-centos7-rax-iad-916633.slave.openstack.org libvirtd[3753]: missing
> __com.redhat_reason in disk io error event

So that specific error message is harmless - the __com.redhat_reason field
is nothing important from OpenStack's POV.

However, it is interesting that QEMU is seeing an I/O error in the first
place. This occurs when you have a grow-on-demand file and the underlying
storage is full, so it is unable to allocate more blocks to cope with a guest
write. It can also occur if the underlying storage has a fatal I/O problem,
e.g. a dead sector in a hard disk, or some equivalent.
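
A quick way to check for the storage-full case would be something along
these lines (just a rough sketch; the image path below is a placeholder,
not the actual path used on the CI node):

  # Sketch: check whether a grow-on-demand image may be hitting ENOSPC
  # on its backing filesystem. The path is a placeholder, adjust as needed.
  import os

  image = "/var/lib/nova/instances/<uuid>/disk"   # placeholder

  st = os.stat(image)
  vfs = os.statvfs(os.path.dirname(image))

  print("file length        : %d MiB" % (st.st_size // 2**20))
  print("blocks allocated   : %d MiB" % (st.st_blocks * 512 // 2**20))
  print("free on backing fs : %d MiB" % (vfs.f_bavail * vfs.f_frsize // 2**20))
  # If free space is (close to) zero, any guest write that needs a new
  # block will fail and QEMU will raise an I/O error to libvirt.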

IOW, outside of those cases I'd not expect to see any I/O errors raised in a
normal OpenStack scenario, so this is something worth investigating.
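
If it would help to catch these as they happen, a small libvirt-python
listener along the following lines (a sketch only, assuming the usual
qemu:///system URI on the CI node) prints each disk I/O error event together
with its reason string; a reason of "enospc" would point at the storage-full
case described above:

  # Sketch: print disk I/O error events (with reason) reported by libvirtd.
  import libvirt

  def io_error(conn, dom, srcpath, devalias, action, reason, opaque):
      # reason is e.g. "enospc" when the backing storage ran out of space
      print("I/O error on %s (%s, %s): action=%d reason=%s"
            % (dom.name(), srcpath, devalias, action, reason))

  libvirt.virEventRegisterDefaultImpl()
  conn = libvirt.open("qemu:///system")
  conn.domainEventRegisterAny(None,
                              libvirt.VIR_DOMAIN_EVENT_ID_IO_ERROR_REASON,
                              io_error, None)
  while True:
      libvirt.virEventRunDefaultImpl()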

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|


