OpenStack instance fail to boot after unexpected compute node shutdown
Hi,

We are running OpenStack Rocky. Our instances get their volumes from Ceph (Mimic).

One of our compute nodes suddenly shut down because of a RAM issue, so all instances running on that node (Linux and Windows) were affected. After we started the compute node again, all services came up and are running, but the instances on that node fail to boot. The instances get their volumes from Ceph storage, and Ceph is in a healthy state. Please see the log below.

Any help solving this would be appreciated.

Welcome to CentOS Linux 7 (Core) dracut-033-554.el7 (Initramfs)!
[ 2.543762] systemd[1]: Set hostname to <localhost.localdomain>.
[ 2.601779] usb 1-1: new full-speed USB device number 2 using uhci_hcd
[ 2.609813] random: systemd: uninitialized urandom read (16 bytes read)
[ 2.616574] random: systemd: uninitialized urandom read (16 bytes read)
[ 2.622966] random: systemd: uninitialized urandom read (16 bytes read)
[ 2.628863] random: systemd: uninitialized urandom read (16 bytes read)
[ 2.635771] random: systemd: uninitialized urandom read (16 bytes read)
[ 2.641509] random: systemd: uninitialized urandom read (16 bytes read)
[ 2.647362] random: systemd: uninitialized urandom read (16 bytes read)
[ 2.660222] systemd[1]: Reached target Local File Systems.
[  OK  ] Reached target Local File Systems.
[ 2.668273] systemd[1]: Created slice Root Slice.
[  OK  ] Created slice Root Slice.
[ 2.674835] systemd[1]: Created slice System Slice.
[  OK  ] Created slice System Slice.
[ 2.681170] systemd[1]: Listening on udev Kernel Socket.
[  OK  ] Listening on udev Kernel Socket.
[ 2.687811] systemd[1]: Listening on Journal Socket.
[  OK  ] Listening on Journal Socket.
[ 2.694604] systemd[1]: Starting Create list of required static device nodes for the current kernel...
         Starting Create list of required st... nodes for the current kernel...
[ 2.705069] systemd[1]: Starting Journal Service...
         Starting Journal Service...
[ 2.710298] systemd[1]: Listening on udev Control Socket.
[  OK  ] Listening on udev Control Socket.
[ 2.715399] systemd[1]: Reached target Sockets.
[  OK  ] Reached target Sockets.
[ 2.720550] systemd[1]: Starting dracut cmdline hook...
         Starting dracut cmdline hook...
[ 2.725277] systemd[1]: Reached target Slices.
[  OK  ] Reached target Slices.
[ 2.731319] systemd[1]: Starting Apply Kernel Variables...
         Starting Apply Kernel Variables...
[ 2.738718] systemd[1]: Starting Setup Virtual Console...
         Starting Setup Virtual Console...
[ 2.744446] systemd[1]: Reached target Timers.
[  OK  ] Reached target Timers.
[ 2.749668] systemd[1]: Reached target Swap.
[ 2.781213] usb 1-1: New USB device found, idVendor=0627, idProduct=0001
[ 2.781217] usb 1-1: New USB device strings: Mfr=1, Product=3, SerialNumber=5
[ 2.781220] usb 1-1: Product: QEMU USB Tablet
[ 2.781222] usb 1-1: Manufacturer: QEMU
[ 2.781225] usb 1-1: SerialNumber: 42
[ 2.789965] input: QEMU QEMU USB Tablet as /devices/pci0000:00/0000:00:01.2/usb1/1-1/1-1:1.0/input/input4
[ 2.790189] hid-generic 0003:0627:0001.0001: input,hidraw0: USB HID v0.01 Mouse [QEMU QEMU USB Tablet] on usb-0000:00:01.2-1/input0
[  OK  ] Reached target Swap.
[ 2.887336] systemd[1]: Started Journal Service.
[  OK  ] Started Journal Service.
[  OK  ] Started Create list of required sta...ce nodes for the current kernel.
[  OK  ] Started Apply Kernel Variables.
[ 2.903653] random: fast init done
         Starting Create Static Device Nodes in /dev...
[  OK  ] Started Create Static Device Nodes in /dev.
[  OK  ] Started dracut cmdline hook.
         Starting dracut pre-udev hook...
[  OK  ] Started Setup Virtual Console.
[ 2.962744] device-mapper: uevent: version 1.0.3
[ 2.966701] device-mapper: ioctl: 4.37.1-ioctl (2018-04-03) initialised: dm-devel@redhat.com
[  OK  ] Started dracut pre-udev hook.
         Starting udev Kernel Device Manager...
[  OK  ] Started udev Kernel Device Manager.
         Starting udev Coldplug all Devices...
[  OK  ] Started udev Coldplug all Devices.
[  OK  ] Reached target System Initialization.
         Starting Show Plymouth Boot Screen...
         Starting dracut initqueue hook...
[  OK  ] Started Show Plymouth Boot Screen.
[  OK  ] Reached target Paths.
[  OK  ] Started Forward Password Requests to Plymouth Directory Watch.
[  OK  ] Reached target Basic System.
         Mounting Configuration File System...
[  OK  ] Mounted Configuration File System.
[ 3.278679] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 10
[ 3.284823] FDC 0 is a S82078B
[ 3.346628] scsi host0: ata_piix
[ 3.358340] scsi host1: ata_piix
[ 3.363061] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc0c0 irq 14
[ 3.369572] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc0c8 irq 15
[ 3.413844] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10
[ 3.489243] virtio_blk virtio1: [vda] 629145600 512-byte logical blocks (322 GB/300 GiB)
[ 3.489830] vda: vda1 vda2 vda3
[ 3.527229] ata1.00: ATAPI: QEMU DVD-ROM, 2.5+, max UDMA/100
[ 3.529577] ata1.00: configured for MWDMA2
[ 3.542907] scsi 0:0:0:0: CD-ROM QEMU QEMU DVD-ROM 2.5+ PQ: 0 ANSI: 5
[ 3.622851] sr 0:0:0:0: [sr0] scsi3-mmc drive: 4x/4x cd/rw xa/form2 tray
[ 3.626938] cdrom: Uniform CD-ROM driver Revision: 3.20
[  OK  ] Found device /dev/mapper/brilliant--centos--vg-root.
         Starting File System Check on /dev/...er/brilliant--centos--vg-root...
[  OK  ] Started File System Check on /dev/mapper/brilliant--centos--vg-root.
[  OK  ] Started dracut initqueue hook.
         Mounting /sysroot...
[  OK  ] Reached target Remote File Systems (Pre).
[  OK  ] Reached target Remote File Systems.
[ 4.029336] SGI XFS with ACLs, security attributes, no debug enabled
[ 4.035798] XFS (dm-0): Mounting V5 Filesystem
[ 5.917747] blk_update_request: I/O error, dev vda, sector 13098937
[ 5.922604] blk_update_request: I/O error, dev vda, sector 13099961
[ 5.927008] blk_update_request: I/O error, dev vda, sector 13100985
[ 5.931339] blk_update_request: I/O error, dev vda, sector 13102009
[ 5.935231] XFS (dm-0): xfs_do_force_shutdown(0x1) called from line 1266 of file fs/xfs/xfs_buf.c. Return address = 0xffffffffc02738cc
[ 5.942478] XFS (dm-0): I/O Error Detected. Shutting down filesystem
[ 5.946134] XFS (dm-0): Please umount the filesystem and rectify the problem(s)
[ 5.952283] XFS (dm-0): metadata I/O error: block 0x782fb9 ("xlog_bwrite") error 5 numblks 8192
[ 5.959663] XFS (dm-0): failed to locate log tail
[ 5.963221] XFS (dm-0): log mount/recovery failed: error -5
[ 5.967272] XFS (dm-0): log mount failed
[ 5.979786] blk_update_request: I/O error, dev vda, sector 0
[ 5.992687] blk_update_request: I/O error, dev vda, sector 0
[FAILED] Failed to mount /sysroot.
See 'systemctl status sysroot.mount' for details.
[DEPEND] Dependency failed for Initrd Root File System.
[DEPEND] Dependency failed for Reload Configuration from the Real Root.
[  OK  ] Stopped dracut pre-udev hook.
[  OK  ] Stopped dracut cmdline hook.
[  OK  ] Stopped target Basic System.
[  OK  ] Stopped target System Initialization.
[  OK  ] Started Emergency Shell.
[  OK  ] Reached target Emergency Mode.
[  OK  ] Stopped dracut initqueue hook.
[  OK  ] Reached target Initrd File Systems.
Generating "/run/initramfs/rdsosreport.txt"
Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.
You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
after mounting them and attach it to a bug report.
:/# [ 12.851915] random: crng init done

Thanks & B'Rgds,
Rony
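Since several instances on the node were affected, it may help to pull only the failure lines out of each guest's console log instead of reading the whole boot sequence. The following is a minimal sketch, not part of the original post: the function name `summarize_console_log` is hypothetical, and it assumes the console text has already been saved to a string (for example from `openstack console log show <server>`). It only matches the message shapes seen in the log above.

```python
import re

# Kernel block-layer errors as they appear in the log above, e.g.
# "blk_update_request: I/O error, dev vda, sector 13098937"
IO_ERROR = re.compile(r"blk_update_request: I/O error, dev (\w+), sector (\d+)")


def summarize_console_log(text):
    """Summarize the failure signals found in a guest console log.

    Hypothetical helper: it looks only for the messages seen in this
    thread's log and is not a general-purpose log parser.
    """
    sectors = {}
    # findall scans the whole text, so it also works if line breaks were lost.
    for dev, sector in IO_ERROR.findall(text):
        sectors.setdefault(dev, []).append(int(sector))
    return {
        "io_errors": sectors,
        "xfs_log_mount_failed": "log mount failed" in text,
        "emergency_mode": "Entering emergency mode" in text,
    }


if __name__ == "__main__":
    sample = (
        "[ 5.917747] blk_update_request: I/O error, dev vda, sector 13098937\n"
        "[ 5.967272] XFS (dm-0): log mount failed\n"
        "Entering emergency mode. Exit the shell to continue.\n"
    )
    print(summarize_console_log(sample))
```

Run against the log in this message, the summary would show I/O errors on `vda` followed by the failed XFS log mount, which is the point where the guest drops to the emergency shell.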
On 4/25/19 6:43 PM, Md. Farhad Hasan Khan wrote:
Hi,
We are running OpenStack Rocky. Our instances get their volumes from Ceph (Mimic).
One of our compute nodes suddenly shut down because of a RAM issue, so all instances running on that node (Linux and Windows) were affected. After we started the compute node again, all services came up and are running, but the instances on that node fail to boot. The instances get their volumes from Ceph storage, and Ceph is in a healthy state. Please see the log below.
Any help solving this would be appreciated.
Hi,

Make sure you have set in nova.conf:

resume_guests_state_on_host_boot=True

on each of your compute nodes. Also, it's wise to have

instance_usage_audit=True

I hope this helps,

Thomas Goirand (zigo)
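To check many compute nodes for the two options named in the reply above, something like the following could be used. This is a minimal sketch under stated assumptions: the function name `check_nova_conf` is hypothetical, it assumes both options live in the `[DEFAULT]` section of nova.conf, and it reads the file with Python's standard `configparser` rather than with Nova's own configuration loader.

```python
import configparser

# Options named in the reply above, with the values it recommends.
RECOMMENDED = {
    "resume_guests_state_on_host_boot": "true",
    "instance_usage_audit": "true",
}


def check_nova_conf(text, section="DEFAULT"):
    """Return {option: current value or None} for options that differ
    from RECOMMENDED. Assumes the options are set in [DEFAULT]."""
    # interpolation=None keeps literal '%' characters from raising errors;
    # strict=False tolerates duplicated sections or options.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(text)
    problems = {}
    for option, wanted in RECOMMENDED.items():
        current = parser.get(section, option, fallback=None)
        if current is None or current.strip().lower() != wanted:
            problems[option] = current
    return problems


if __name__ == "__main__":
    # Inline sample instead of a real file; on a compute node you would
    # pass in the contents of its nova.conf.
    sample = "[DEFAULT]\ninstance_usage_audit = True\n"
    for option, current in check_nova_conf(sample).items():
        print(f"{option}: found {current!r}, the reply suggests True")
```

An empty result means both options are already set as suggested. The nova-compute service reads its configuration at startup, so a changed value only takes effect after the service is restarted.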
-----Original Message-----
From: Thomas Goirand [mailto:zigo@debian.org]
Sent: Friday, April 26, 2019 3:59 AM
To: rony.khan@brilliant.com.bd; openstack-discuss@lists.openstack.org
Subject: Re: OpenStack instance fail to boot after unexpected compute node shutdown

On 4/25/19 6:43 PM, Md. Farhad Hasan Khan wrote:
Hi,
We are running OpenStack Rocky. Our instances get their volumes from Ceph (Mimic).
One of our compute nodes suddenly shut down because of a RAM issue, so all instances running on that node (Linux and Windows) were affected. After we started the compute node again, all services came up and are running, but the instances on that node fail to boot. The instances get their volumes from Ceph storage, and Ceph is in a healthy state. Please see the log below.
Any help solving this would be appreciated.
Hi,

Make sure you have set in nova.conf:

resume_guests_state_on_host_boot=True

on each of your compute nodes. Also, it's wise to have

instance_usage_audit=True

I hope this helps,

Thomas Goirand (zigo)

Hi,

Thanks a lot for your quick response.

//////////////////////
Make sure you have set in nova.conf: resume_guests_state_on_host_boot=True
##### We are not using this setting. We shall check.

//////////////////////
Also, it's wise to have instance_usage_audit=True
##### Already configured.

Please let us know if there is any way to repair the affected instances' volumes.

Thanks & B'Rgds,
Rony
Participants (2)
- Md. Farhad Hasan Khan
- Thomas Goirand