[Openstack] [nova] KVM internal error. Suberror: 3

Allen Yu allenyuchishing at gmail.com
Wed Apr 18 23:46:45 UTC 2018


Hi,

I have been trying to track down a nasty bug that recently started happening
on my OpenStack Liberty cluster, so far in vain.

Whenever I start a new instance, OpenStack immediately reports it as
"Paused". When I log in to the compute node and check the QEMU logs, I see
"KVM internal error. Suberror: 3".

/var/log/libvirt/qemu/instance.log
2018-04-18 22:51:49.503+0000: starting up
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin
QEMU_AUDIO_DRV=none /usr/bin/kvm -name instance-000000fc -S -machine
pc-i440fx-trusty,accel=kvm,usb=off -cpu
SandyBridge,+invpcid,+erms,+bmi2,+smep,+avx2,+bmi1,+fsgsbase,+abm,+pdpe1gb,+rdrand,+f16c,+osxsave,+movbe,+dca,+pcid,+pdcm,+xtpr,+fma,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme
-m 245000 -realtime mlock=off -smp 46,sockets=46,cores=1,threads=1 -uuid
d6bec617-7977-4fd6-a6ad-cd9757735fdc -smbios type=1,manufacturer=OpenStack
Foundation,product=OpenStack
Nova,version=12.0.5,serial=d8eca30f-2e73-4dfb-9270-244084458637,uuid=d6bec617-7977-4fd6-a6ad-cd9757735fdc,family=Virtual
Machine -no-user-config -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-000000fc.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc
base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet
-no-shutdown -boot strict=on -device
piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive
file=/var/lib/nova/instances/d6bec617-7977-4fd6-a6ad-cd9757735fdc/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
-drive
file=/var/lib/nova/instances/d6bec617-7977-4fd6-a6ad-cd9757735fdc/disk.swap,if=none,id=drive-virtio-disk1,format=qcow2,cache=none
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1
-netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=26 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:42:9d:6f,bus=pci.0,addr=0x3
-chardev
file,id=charserial0,path=/var/lib/nova/instances/d6bec617-7977-4fd6-a6ad-cd9757735fdc/console.log
-device isa-serial,chardev=charserial0,id=serial0 -chardev
pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1
-device usb-tablet,id=input0 -vnc 0.0.0.0:0 -k en-us -device
cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on
Domain id=6 is tainted: high-privileges
char device redirected to /dev/pts/7 (label charserial1)
KVM internal error. Suberror: 3
extra data[0]: 800000ef
extra data[1]: 31
RAX=0000000000000000 RBX=ffff883a8fee5fd8 RCX=00000000ffffffff RDX=0000000000000000
RSI=0000000000000001 RDI=ffffffff81dd9e48 RBP=ffff883a8fee5ec8 RSP=ffff883a8fee5ec8
R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
R12=ffffffff81cdbe60 R13=000000000000000a R14=0000000000000000 R15=0000000000000000
RIP=ffffffff8103cf6b RFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 0000000000000000 ffffffff 00c00000
CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0000 0000000000000000 ffffffff 00c00000
DS =0000 0000000000000000 ffffffff 00c00000
FS =0000 0000000000000000 ffffffff 00c00000
GS =0000 ffff883b20340000 ffffffff 00c00000
LDT=0000 0000000000000000 ffffffff 00c00000
TR =0040 ffff883b203512c0 00002087 00008b00 DPL=0 TSS64-busy
GDT=     ffff883b20344000 0000007f
IDT=     ffffffff81dd6000 00000fff
CR0=8005003b CR2=00000000ffffffff CR3=0000000001c05000 CR4=001406e0
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000d01
Code=66 90 fb 5d c3 0f 1f 40 00 55 48 89 e5 66 66 66 66 90 fb f4 <5d> c3 0f 1f 00 55 48 89 e5 66 66 66 66 90 f4 5d c3 0f 1f 40 00 55 48 89 e5 66 66 66 66 90
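
For reference, the two "extra data" words can be decoded by hand. Below is a minimal Python sketch. It assumes, from my reading of the Linux KVM VMX code and not verified against this exact 3.13 kernel, that suberror 3 is KVM_INTERNAL_ERROR_DELIVERY_EV, that extra data[0] is the VMCS IDT-vectoring information field, and that extra data[1] is the VM-exit reason, both printed in hex. The function name and the small lookup tables are my own and cover only a few values.

```python
# Field layout follows the Intel SDM "IDT-vectoring information" format.
# Assumption: extra data[0] carries that field unchanged.
EVENT_TYPES = {
    0: "external interrupt",
    2: "NMI",
    3: "hardware exception",
    4: "software interrupt",
    5: "privileged software exception",
    6: "software exception",
}

# Only a few basic exit reasons, enough for this log.
EXIT_REASONS = {
    0: "EXCEPTION_NMI",
    1: "EXTERNAL_INTERRUPT",
    9: "TASK_SWITCH",
    48: "EPT_VIOLATION",
    49: "EPT_MISCONFIG",
}


def decode_delivery_error(vectoring_hex, exit_reason_hex):
    """Decode the two hex words QEMU prints after 'Suberror: 3'."""
    info = int(vectoring_hex, 16)
    # The low 16 bits of the exit reason hold the basic reason number.
    reason = int(exit_reason_hex, 16) & 0xFFFF
    event_type = (info >> 8) & 0x7
    return {
        "valid": bool((info >> 31) & 1),
        "vector": info & 0xFF,
        "event_type": EVENT_TYPES.get(event_type, "reserved"),
        "error_code_valid": bool((info >> 11) & 1),
        "exit_reason": EXIT_REASONS.get(reason, "unknown (%d)" % reason),
    }


if __name__ == "__main__":
    for key, value in decode_delivery_error("800000ef", "31").items():
        print(key, "=", value)
```

Under those assumptions, the values in this log decode to a valid external interrupt with vector 239 (0xef) whose delivery was cut short by exit reason 49 (EPT_MISCONFIG).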

Next I enabled kernel tracing following the instructions at
https://www.linux-kvm.org/page/Tracing. I noticed a lot of page faults and I/O
errors. Here is an excerpt:
 qemu-system-x86-4848  [021] 561909.361044: kvm_update_master_clock: masterclock 0 hostclock tsc offsetmatched 0
 qemu-system-x86-4848  [021] 561909.361148: kvm_fpu:              load
 qemu-system-x86-4848  [021] 561909.361151: kvm_entry:            vcpu 0
 qemu-system-x86-4848  [021] 561909.361155: kvm_exit:             reason EPT_VIOLATION rip 0xfff0 info 184 0
 qemu-system-x86-4848  [021] 561909.361157: kvm_page_fault:       address fffffff0 error_code 184
 qemu-system-x86-4848  [021] 561909.361167: kvm_entry:            vcpu 0
 qemu-system-x86-4848  [021] 561909.361168: kvm_exit:             reason EPT_VIOLATION rip 0xe05b info 184 0
 qemu-system-x86-4848  [021] 561909.361168: kvm_page_fault:       address fe05b error_code 184
 qemu-system-x86-4848  [021] 561909.361171: kvm_entry:            vcpu 0
 qemu-system-x86-4848  [021] 561909.361172: kvm_exit:             reason EPT_VIOLATION rip 0xe05b info 181 0
 qemu-system-x86-4848  [021] 561909.361172: kvm_page_fault:       address f6574 error_code 181
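
The "info" and "error_code" values in these lines can be unpacked bit by bit. The sketch below assumes they are the raw EPT-violation exit qualification as laid out in the Intel SDM: bit 0 read, bit 1 write, bit 2 instruction fetch, bits 3-5 the permissions the EPT entry granted, bit 7 guest linear address valid, bit 8 access to the final translated address. I have not confirmed that the 3.13 tracepoint reports the field unmodified, and the function name is hypothetical.

```python
def decode_ept_qualification(value_hex):
    """Split an EPT-violation exit qualification into named flags."""
    q = int(value_hex, 16)
    # Bits 0-2: what kind of access the guest attempted.
    access = [name for bit, name in ((0, "read"), (1, "write"), (2, "fetch"))
              if (q >> bit) & 1]
    # Bits 3-5: what the EPT entry allowed at the time of the fault.
    allowed = [name for bit, name in ((3, "read"), (4, "write"), (5, "exec"))
               if (q >> bit) & 1]
    return {
        "access": access,
        "ept_permissions": allowed,  # empty list: EPT entry not present
        "linear_address_valid": bool((q >> 7) & 1),
        "final_translation": bool((q >> 8) & 1),
    }


if __name__ == "__main__":
    for code in ("184", "181"):
        print(code, decode_ept_qualification(code))
```

If that reading is right, 184 is an instruction fetch and 181 is a data read, both against EPT entries that are not present yet. That would be consistent with ordinary demand faults while guest memory is first being mapped, rather than errors in themselves.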

The full tracing report is also available at
https://www.dropbox.com/s/hmee8sr0zcruqyh/trace-cmd.report.gz?dl=0
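
To get an overview of the full report rather than reading it line by line, the kvm_exit events can be tallied by reason. This is a rough sketch that assumes each event sits on a single line of the report file in the form "kvm_exit: reason NAME rip 0x..."; the function name and the file name in the usage comment are placeholders.

```python
import re
import sys
from collections import Counter

# Matches e.g. "kvm_exit:   reason EPT_VIOLATION rip 0xfff0 info 184 0"
KVM_EXIT_RE = re.compile(r"kvm_exit:\s+reason\s+(\S+)\s+rip\s+(0x[0-9a-fA-F]+)")


def count_exit_reasons(lines):
    """Count kvm_exit events per exit reason in an iterable of text lines."""
    counts = Counter()
    for line in lines:
        match = KVM_EXIT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python count_exits.py trace-cmd.report
    with open(sys.argv[1], errors="replace") as fh:
        for reason, n in count_exit_reasons(fh).most_common():
            print("%8d  %s" % (n, reason))
```

Any reason other than EPT_VIOLATION that shows up just before the guest pauses would be the first place to look.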

Other system info:
OS: Ubuntu 14.04.5
Kernel: 3.13.0-123-generic
QEMU: version 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.40)
CPU: 2 x Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz VT-d enabled
Motherboard:  X10DRT-PT Intel C610 chipset
BIOS: American Megatrends Inc. version: 1.0c date: 04/10/2015
RAM: 16 x 16GB Samsung M393A2G40DB0-CPB

Any comments or hints would be greatly appreciated. Thank you very much!

Best regards,
Allen