[ci] Kernel panics in the guest vm
smooney at redhat.com
Mon Dec 7 13:01:14 UTC 2020
On Sun, 2020-12-06 at 10:42 +0100, Slawek Kaplonski wrote:
> For some time now I have noticed that scenario jobs quite often fail because
> of problems with SSH to the guest vm. When I checked the reason for the SSH
> failure, it seems to be a kernel panic in the guest vm, e.g.:
> [ 0.000000] Console: colour VGA+ 80x25
> [ 0.000000] printk: console [tty1] enabled
> [ 0.000000] printk: console [ttyS0] enabled
> [ 0.000000] ACPI: Core revision 20190703
> [ 0.000000] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604467 ns
> [ 0.000000] APIC: Switch to symmetric I/O mode setup
> [ 0.000000] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> [ 0.000000] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
> [ 0.000000] ...trying to set up timer (IRQ0) through the 8259A ...
> [ 0.000000] ..... (found apic 0 pin 2) ...
> [ 0.000000] ....... failed.
> [ 0.000000] ...trying to set up timer as Virtual Wire IRQ...
> [ 0.000000] ..... failed.
> [ 0.000000] ...trying to set up timer as ExtINT IRQ...
> [ 0.000000] ..... failed :(.
> [ 0.000000] Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with apic=debug and send a report. Then try booting with the 'noapic' option.
> [ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-26-generic #28~18.04.1-Ubuntu
> [ 0.000000] Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.13.0-1ubuntu1 04/01/2014
> [ 0.000000] Call Trace:
> [ 0.000000] dump_stack+0x6d/0x95
> [ 0.000000] panic+0xfe/0x2d4
> [ 0.000000] check_timer+0x5e8/0x685
> [ 0.000000] ? radix_tree_lookup+0xd/0x10
> [ 0.000000] setup_IO_APIC+0x182/0x1ca
> [ 0.000000] apic_intr_mode_init+0x1f5/0x1f8
> [ 0.000000] x86_late_time_init+0x1b/0x22
> [ 0.000000] start_kernel+0x4cb/0x58b
> [ 0.000000] x86_64_start_reservations+0x24/0x26
> [ 0.000000] x86_64_start_kernel+0x74/0x77
> [ 0.000000] secondary_startup_64+0xa4/0xb0
> [ 0.000000] ---[ end Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with apic=debug and send a report. Then try booting with the 'noapic' option. ]---
> Logstash tells me that this is a problem not only in neutron related jobs.
> Maybe some of you have already tried to investigate this issue and have
> some ideas about what we can do with it?
> In the specific example above, a Cirros 0.5.1 image was used. But I didn't
> check whether that is the case in all other cases, TBH.
This has been happening for months; it's not new.
This might be an issue with the CI providers' qemu version or with the kernel in the cirros image.
We could provide a way to disable the IO-APIC via nova, likely via an image property which we would set on the cirros image via devstack.
Beyond that I don't know what we can do, other than move to something like alpine, which is maintained, instead of cirros.
RHEL https://bugzilla.redhat.com/show_bug.cgi?id=221658 and Ubuntu https://bugs.launchpad.net/ubuntu/+source/linux/+bug/52553
have both hit this issue in the past, in the ~2.6 kernel timeframe.
Cirros uses an Ubuntu 18.04 kernel, so I think it's more likely to be https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1856387
That is in theory fixed in the 4.15 kernel that 18.04 defaults to, but cirros is using a 5.3 kernel, which I think is from the cloud archive and might not have the fix.
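
To check which guest kernels actually hit this across jobs, something like the following could be run over saved guest console output. This is only a minimal sketch: it assumes the console log is already available as a plain string, and the function name find_ioapic_panic is made up for illustration. It matches the panic message from the trace above and pulls the kernel release from the "Not tainted" line.

```python
import re

# Panic signature taken from the console output quoted above.
PANIC_RE = re.compile(
    r"Kernel panic - not syncing: IO-APIC \+ timer doesn't work!"
)
# The "CPU: ..." line of the trace carries the kernel release after "Not tainted".
KERNEL_RE = re.compile(r"Not tainted (\S+)")


def find_ioapic_panic(console_log):
    """Return the guest kernel release if the IO-APIC timer panic is present.

    Returns None when the panic is absent, and "unknown" when the panic is
    present but no kernel release line could be found.
    (Hypothetical helper, not part of tempest or nova.)
    """
    if not PANIC_RE.search(console_log):
        return None
    match = KERNEL_RE.search(console_log)
    return match.group(1) if match else "unknown"


sample = """\
[ 0.000000] ..... failed :(.
[ 0.000000] Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with apic=debug and send a report. Then try booting with the 'noapic' option.
[ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-26-generic #28~18.04.1-Ubuntu
"""

print(find_ioapic_panic(sample))
```

Grouping the results by kernel release would show whether only the 5.3 kernel shipped in Cirros 0.5.1 is affected.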
>  https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_c50/764921/1/gate/neutron-tempest-plugin-scenario-openvswitch-iptables_hybrid/c501b2c/testr_results.html
>  http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Kernel%20panic%20-%20not%20syncing%3A%20IO-APIC%20%2B%20timer%20doesn't%20work!%20%20Boot%20with%20apic%3Ddebug%20and%20send%20a%20report.%20%20Then%20try%20booting%20with%20the%20'noapic'%20option.%5C%22