[Openstack-operators] Guest crash and KVM unhandled rdmsr

George Mihaiescu lmihaiescu at gmail.com
Tue Oct 17 16:52:18 UTC 2017


Hi Blair,

We had a few cases of compute nodes hanging and requiring hard reboots,
where the last entry in syslog was an "rdmsr" message:
 kvm [29216]: vcpu0 unhandled rdmsr: 0x345

The workloads are probably similar to yours (SGE workers doing genomics)
with CPU mode host-passthrough, on top of Ubuntu 16.04 and kernel
4.4.0-96-generic.

I'm not sure the "rdmsr" logs are relevant, though, because we also see
them on other compute nodes that have no issues.
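
One way to compare healthy and hanging nodes is to tally the messages per
MSR address on each host. The following is only a rough sketch: it assumes
the log lines have exactly the form quoted above, and the function name
count_unhandled_rdmsr is made up for illustration. Other kernel versions
may format the message differently.

```python
import re
from collections import Counter

# Assumed line format, taken from the syslog entry quoted in this message:
#   kvm [29216]: vcpu0 unhandled rdmsr: 0x345
RDMSR_RE = re.compile(
    r"kvm\s*\[(\d+)\]:\s*vcpu(\d+)\s+unhandled rdmsr:\s*(0x[0-9a-fA-F]+)"
)


def count_unhandled_rdmsr(lines):
    """Count unhandled rdmsr messages per MSR address (lowercase hex string)."""
    counts = Counter()
    for line in lines:
        match = RDMSR_RE.search(line)
        if match:
            counts[match.group(3).lower()] += 1
    return counts


if __name__ == "__main__":
    # Sample input; in practice, feed in the lines of a saved syslog or
    # dmesg dump from each hypervisor.
    sample = [
        "kvm [29216]: vcpu0 unhandled rdmsr: 0x345",
        "some unrelated log line",
    ]
    for msr, hits in count_unhandled_rdmsr(sample).most_common():
        print(msr, hits)
```

If the per-MSR counts look the same on nodes that hang and nodes that do
not, that would support the view that the messages are noise rather than
the cause.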

Did you find anything that might indicate what the root cause is?

Cheers,
George


On Thu, Oct 12, 2017 at 5:26 PM, Blair Bethwaite <blair.bethwaite at gmail.com>
wrote:

> Hi all,
>
> Has anyone seen guest crashes/freezes associated with KVM unhandled rdmsr
> messages in dmesg on the hypervisor?
>
> We have seen these messages before but never with a strong correlation to
> guest problems. However, over the past couple of weeks this has been
> happening almost daily, with consistent correlation, for a set of hosts
> dedicated to a
> particular HPC workload. So far as I know the workload has not changed, but
> we have just recently moved the hypervisors to Ubuntu Xenial (though they
> were already on the Xenial kernel previously) and done minor guest
> (CentOS7) updates. CPU mode is host-passthrough. Currently trying to figure
> out if the CPU flags in the guest have changed since the host upgrade...
>
> Cheers,