LVM misconfiguration after openstack packstack server hang and reboot
Alan Davis
alan.davis at apogee-research.com
Thu Sep 24 19:17:50 UTC 2020
More info: the server is actually running CentOS 7.6 (one of the few that
didn't recently get updated).
The system has 5 disks configured in an md RAID5 set as md126:
md126 : active raid5 sdf[4] sdb[0] sde[3] sdc[1] sdd[2]
      11720536064 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
      bitmap: 6/22 pages [24KB], 65536KB chunk
The LVM filter excludes the sd devices: filter = [ "r|^/dev/sd[bcdef]|" ]
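For what it's worth, the only related change I can think of would be to also
reject the cinder LVs themselves, so the host never scans the guests' PVs that
live inside them. Untested sketch of what I mean for /etc/lvm/lvm.conf (the
cinder device paths are assumptions based on the lvs output quoted below):

# Reject the RAID members and anything exported to guests through cinder,
# accept everything else. Sketch only; device paths assumed.
global_filter = [ "r|^/dev/sd[bcdef]|", "r|^/dev/cinder-volumes/|", "r|^/dev/mapper/cinder--volumes-|", "a|.*|" ]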
boot.log has complaints about 5 dm devices:
[FAILED] Failed to start LVM2 PV scan on device 253:55.
[FAILED] Failed to start LVM2 PV scan on device 253:47.
[FAILED] Failed to start LVM2 PV scan on device 253:50.
[FAILED] Failed to start LVM2 PV scan on device 253:56.
[FAILED] Failed to start LVM2 PV scan on device 253:34.
Typical message:
[FAILED] Failed to start LVM2 PV scan on device 253:47.
See 'systemctl status lvm2-pvscan@253:47.service' for details.
Output of systemctl status:
systemctl status lvm2-pvscan@253:55.service
● lvm2-pvscan@253:55.service - LVM2 PV scan on device 253:55
   Loaded: loaded (/usr/lib/systemd/system/lvm2-pvscan@.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2020-09-24 09:26:58 EDT; 5h 44min ago
     Docs: man:pvscan(8)
  Process: 17395 ExecStart=/usr/sbin/lvm pvscan --cache --activate ay %i (code=exited, status=5)
 Main PID: 17395 (code=exited, status=5)

Sep 24 09:26:58 stack3 systemd[1]: Starting LVM2 PV scan on device 253:55...
Sep 24 09:26:58 stack3 lvm[17395]: Multiple VGs found with the same name: skipping encrypted_vg
Sep 24 09:26:58 stack3 lvm[17395]: Use --select vg_uuid=<uuid> in place of the VG name.
Sep 24 09:26:58 stack3 systemd[1]: lvm2-pvscan@253:55.service: main process exited, code=exited, status=5/NOTINSTALLED
Sep 24 09:26:58 stack3 systemd[1]: Failed to start LVM2 PV scan on device 253:55.
Sep 24 09:26:58 stack3 systemd[1]: Unit lvm2-pvscan@253:55.service entered failed state.
Sep 24 09:26:58 stack3 systemd[1]: lvm2-pvscan@253:55.service failed.
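In case it matters, this is what I was planning to try next for the
duplicate-name encrypted_vg, following the hint in the log above (sketch only;
the UUID placeholder is mine):

vgs -o vg_name,vg_uuid                        # list both copies of encrypted_vg with their UUIDs
vgchange -ay --select vg_uuid=<uuid-I-want>   # activate by UUID instead of by the ambiguous name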
On Thu, Sep 24, 2020 at 2:07 PM Alan Davis <alan.davis at apogee-research.com>
wrote:
> This morning my CentOS 7.7 RDO packstack installation of Rocky hung. On
> reboot some of the VMs won't start. This is a primary system and I need to
> find the most expedient way to recover without losing data. I'm not using
> LVM thin volumes.
>
> Any help is appreciated.
>
> Looking at nova-compute.log I see errors trying to find LUN 0 during the
> sysfs stage.
>
> Several machines won't boot because their root disk entries in LVM are
> seen as PVs, and booting them doesn't see them in the DM subsystem.
> Other machines boot, but their attached disks throw LVM errors about
> duplicate PVs and about preferring the cinder-volumes VG version.
>
> LVM is showing LVs that have both "bare" entries as well as entries in
> cinder-volumes, and it's complaining about duplicate PVs, not using lvmetad,
> and preferring some entries because they are in the dm subsystem.
> I've verified that, so far, I haven't lost any data. A "bare" LV that isn't
> part of the DM subsystem (because its server won't boot) can be mounted on
> the openstack host, and all data on it is accessible.
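>
> (What I mean by mounting it on the host is roughly the following; the VG/LV
> names and the mount point are placeholders, not the exact commands I ran:)
>
> lvchange -ay <vg>/<lv>              # activate the bare LV on the host
> mkdir -p /mnt/check                 # scratch mount point
> mount /dev/<vg>/<lv> /mnt/check     # the data is readable here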
>
> This host has rebooted cleanly multiple times in the past. This is the
> first time it's shown any problems.
>
> Am I missing an LVM filter? (unlikely, since it wasn't needed before)
> How can I reset the LVM configuration and convince it that it's not seeing
> duplicate PVs? (a rough sketch of what I'd try is below)
> How do I ensure that openstack sees the right UUID and volume ID?
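>
> (For the second question, this is roughly what I'd run once the duplicates
> are dealt with, per the lvs warnings further down; sketch only:)
>
> pvscan --cache                      # rebuild the lvmetad cache, as the warning suggests
> vgscan
> lvs -o lv_name,vg_name,vg_uuid      # check that only one copy of each VG is left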
>
> Excerpts from the error log and the output of lvs:
> --- nova-compute.log --- during VM start
> 2020-09-24 11:15:27.091 13953 INFO os_brick.initiator.connectors.iscsi [req-8d15fb6a-6324-471e-9497-587885eef8f6 396aeda6552f44fdac5f878b90325ee1 54af92f2bb494355b96024076184d1c8 - default default] Trying to connect to iSCSI portal 172.10.0.40:3260
> 2020-09-24 11:15:29.721 13953 WARNING nova.compute.manager [req-fd32e16f-c879-402f-a32c-6be45a943c34 48af9a366301467d9fec912fd1c072c6 f9fc7b412a8446d083da1356aa370eb4 - default default] [instance: de7d740c-786a-4aa2-aa09-d447ae7e14b6] Received unexpected event network-vif-unplugged-79aff403-d2e4-4266-bd88-d7bd19d501a9 for instance with vm_state stopped and task_state powering-on.
> 2020-09-24 11:16:21.361 13953 WARNING os_brick.initiator.connectors.iscsi [req-8d15fb6a-6324-471e-9497-587885eef8f6 396aeda6552f44fdac5f878b90325ee1 54af92f2bb494355b96024076184d1c8 - default default] LUN 0 on iSCSI portal 172.10.0.40:3260 not found on sysfs after logging in.
> 2020-09-24 11:16:23.482 13953 INFO os_brick.initiator.connectors.iscsi [req-8d15fb6a-6324-471e-9497-587885eef8f6 396aeda6552f44fdac5f878b90325ee1 54af92f2bb494355b96024076184d1c8 - default default] Trying to connect to iSCSI portal 172.10.0.40:3260
> 2020-09-24 11:17:17.741 13953 WARNING os_brick.initiator.connectors.iscsi [req-8d15fb6a-6324-471e-9497-587885eef8f6 396aeda6552f44fdac5f878b90325ee1 54af92f2bb494355b96024076184d1c8 - default default] LUN 0 on iSCSI portal 172.10.0.40:3260 not found on sysfs after logging in.: VolumeDeviceNotFound: Volume device not found at .
> 2020-09-24 11:17:21.864 13953 INFO os_brick.initiator.connectors.iscsi [req-8d15fb6a-6324-471e-9497-587885eef8f6 396aeda6552f44fdac5f878b90325ee1 54af92f2bb494355b96024076184d1c8 - default default] Trying to connect to iSCSI portal 172.10.0.40:3260
> 2020-09-24 11:18:16.113 13953 WARNING os_brick.initiator.connectors.iscsi [req-8d15fb6a-6324-471e-9497-587885eef8f6 396aeda6552f44fdac5f878b90325ee1 54af92f2bb494355b96024076184d1c8 - default default] LUN 0 on iSCSI portal 172.10.0.40:3260 not found on sysfs after logging in.: VolumeDeviceNotFound: Volume device not found at .
> 2020-09-24 11:18:17.252 13953 INFO nova.compute.manager [req-8d15fb6a-6324-471e-9497-587885eef8f6 396aeda6552f44fdac5f878b90325ee1 54af92f2bb494355b96024076184d1c8 - default default] [instance: de7d740c-786a-4aa2-aa09-d447ae7e14b6] Successfully reverted task state from powering-on on failure for instance.
> 2020-09-24 11:18:17.279 13953 ERROR oslo_messaging.rpc.server [req-8d15fb6a-6324-471e-9497-587885eef8f6 396aeda6552f44fdac5f878b90325ee1 54af92f2bb494355b96024076184d1c8 - default default] Exception during message handling: VolumeDeviceNotFound: Volume device not found at .
> 2020-09-24 11:18:17.279 13953 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
> 2020-09-24 11:18:17.279 13953 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming
> 2020-09-24 11:18:17.279 13953 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
>
>
> --- lvs output ---
> I've annotated one machine's disks to illustrate the relationship between
> the volume-*** cinder-volumes VG entries and the "bare" LVs seen as directly
> accessible from the host.
> There are 3 servers that won't boot; they are the ones whose home/vg_home
> and encrypted_home/encrypted_vg entries are shown.
>
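> For reference, the device / PV UUID / VG mapping behind these annotations can
> also be listed directly with something like:
>
> pvs -o pv_name,pv_uuid,vg_name      # which device and VG each PV UUID maps to
>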
> WARNING: Not using lvmetad because duplicate PVs were found.
> WARNING: Use multipath or vgimportclone to resolve duplicate PVs?
> WARNING: After duplicates are resolved, run "pvscan --cache" to enable lvmetad.
> WARNING: Not using device /dev/sdu for PV yZy8Xk-foKT-ovjV-0EZv-VxEM-GqiP-WH7k53.   == backup_lv/encrypted_vg
> WARNING: Not using device /dev/sdv for PV tHA9ui-eSIO-MDmI-RM3u-3Bf4-Dznb-Ha3XfP.   == varoptgitlab/encrypted_vg
> WARNING: Not using device /dev/sdm for PV 5eoyCa-sMO4-b7O4-jIfh-byZE-L5pS-3lOu0D.
> WARNING: Not using device /dev/sdp for PV 3BI0nV-TP0k-rgPC-PrjH-FT7z-reMe-ec1spj.
> WARNING: Not using device /dev/sdt for PV ILdbcY-VFCm-fnH6-Y3jc-pdWZ-fnl8-PH3TPe.   == storage_lv/encrypted_vg
> WARNING: Not using device /dev/sdr for PV zowU2N-oaBh-r4cO-cxgX-YYiq-Kf3q-mqlHfK.
> WARNING: PV yZy8Xk-foKT-ovjV-0EZv-VxEM-GqiP-WH7k53 prefers device /dev/cinder-volumes/volume-c8da1abf-7143-422c-9ee5-b2724a71c8ff because device is in dm subsystem.
> WARNING: PV tHA9ui-eSIO-MDmI-RM3u-3Bf4-Dznb-Ha3XfP prefers device /dev/cinder-volumes/volume-0a12012f-8c2e-41fb-aa0c-a7ae99c62487 because device is in dm subsystem.
> WARNING: PV 5eoyCa-sMO4-b7O4-jIfh-byZE-L5pS-3lOu0D prefers device /dev/cinder-volumes/volume-990a057c-46cc-4a81-ba02-28b72c34791d because device is in dm subsystem.
> WARNING: PV 3BI0nV-TP0k-rgPC-PrjH-FT7z-reMe-ec1spj prefers device /dev/cinder-volumes/volume-b6a9da6e-1958-46ea-90b4-ac1aebed8c04 because device is in dm subsystem.
> WARNING: PV ILdbcY-VFCm-fnH6-Y3jc-pdWZ-fnl8-PH3TPe prefers device /dev/cinder-volumes/volume-302dd53b-7d05-4f6d-9ada-8f2ed6e1d4c6 because device is in dm subsystem.
> WARNING: PV zowU2N-oaBh-r4cO-cxgX-YYiq-Kf3q-mqlHfK prefers device /dev/cinder-volumes/volume-df006472-be7a-4957-972a-1db4463f5d67 because device is in dm subsystem.
> LV                                             VG             Attr          LSize  Pool Origin Data% Meta% Move Log Cpy%Sync Convert
> home                                           centos_stack3  -wi-ao----    4.00g
> root                                           centos_stack3  -wi-ao----   50.00g
> swap                                           centos_stack3  -wi-ao----    4.00g
> _snapshot-05b1e46b-1ae3-4cd0-9117-3fb53a6d94b0 cinder-volumes swi-a-s---   20.00g       volume-1d0ff5d5-93a3-44e8-8bfa-a9290765c8c6 0.00
> lv_filestore                                   cinder-volumes -wi-ao----    1.00t
> ...
> volume-c8da1abf-7143-422c-9ee5-b2724a71c8ff    cinder-volumes -wi-ao----  100.00g
> volume-0a12012f-8c2e-41fb-aa0c-a7ae99c62487    cinder-volumes -wi-ao----   60.00g
> volume-990a057c-46cc-4a81-ba02-28b72c34791d    cinder-volumes -wi-ao----  200.00g
> volume-b6a9da6e-1958-46ea-90b4-ac1aebed8c04    cinder-volumes -wi-ao----   30.00g
> volume-302dd53b-7d05-4f6d-9ada-8f2ed6e1d4c6    cinder-volumes -wi-ao----   60.00g
> volume-df006472-be7a-4957-972a-1db4463f5d67    cinder-volumes -wi-ao----  250.00g
> ...
> volume-f3250e15-bb9c-43d1-989d-8a8f6635a416    cinder-volumes -wi-ao----   20.00g
> volume-fc1d5fcb-fda1-456b-a89d-582b7f94fb04    cinder-volumes -wi-ao----  300.00g
> volume-fc50a717-0857-4da3-93cb-a55292f7ed6d    cinder-volumes -wi-ao----   20.00g
> volume-ff94e2d6-449b-495d-82e6-0debd694c1dd    cinder-volumes -wi-ao----   20.00g
> data2                                          data2_vg       -wi-a----- <300.00g
> data                                           data_vg        -wi-a-----    1.79t
> backup_lv                                      encrypted_vg   -wi------- <100.00g       == ...WH7k53
> storage_lv                                     encrypted_vg   -wi-------  <60.00g       == ...PH3TPe
> varoptgitlab_lv                                encrypted_vg   -wi------- <200.00g
> varoptgitlab_lv                                encrypted_vg   -wi-------  <30.00g
> varoptgitlab_lv                                encrypted_vg   -wi-------  <60.00g       == ...Ha3XfP
> encrypted_home                                 home_vg        -wi-a-----  <40.00g
> encrypted_home                                 home_vg        -wi-------  <60.00g
> pub                                            pub_vg         -wi-a-----  <40.00g
> pub_lv                                         pub_vg         -wi------- <250.00g
> rpms                                           repo           -wi-a-----  499.99g
> home                                           vg_home        -wi-a-----  <40.00g
> gtri_pub                                       vg_pub         -wi-a-----   20.00g
> pub                                            vg_pub         -wi-a-----  <40.00g
> --
> Alan Davis
> Principal System Administrator
> Apogee Research LLC
>
>
--
Alan Davis
Principal System Administrator
Apogee Research LLC
Office : 571.384.8941 x26
Cell : 410.701.0518