LVM misconfiguration after OpenStack packstack server hang and reboot

Alan Davis alan.davis at apogee-research.com
Thu Sep 24 19:17:50 UTC 2020


More info: the server is actually running CentOS 7.6 (one of the few that
didn't recently get updated).

The system has 5 disks configured in an md RAID5 set as md126:
md126 : active raid5 sdf[4] sdb[0] sde[3] sdc[1] sdd[2]
      11720536064 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5]
[UUUUU]
      bitmap: 6/22 pages [24KB], 65536KB chunk
The LVM filter excludes those sd devices: filter = [ "r|^/dev/sd[bcdef]|" ]
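
For reference, that filter is set in the devices section of /etc/lvm/lvm.conf;
the stanza looks roughly like this (only the filter line itself is taken from
this host, the comment is just a gloss):

    devices {
        # reject the md RAID member disks (sdb-sdf) so LVM only sees them via md126
        filter = [ "r|^/dev/sd[bcdef]|" ]
    }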

boot.log has complaints about 5 dm devices:
[FAILED] Failed to start LVM2 PV scan on device 253:55.
[FAILED] Failed to start LVM2 PV scan on device 253:47.
[FAILED] Failed to start LVM2 PV scan on device 253:50.
[FAILED] Failed to start LVM2 PV scan on device 253:56.
[FAILED] Failed to start LVM2 PV scan on device 253:34.

Typical message:
[FAILED] Failed to start LVM2 PV scan on device 253:47.
See 'systemctl status lvm2-pvscan@253:47.service' for details.

output of systemctl status:
systemctl status lvm2-pvscan@253:55.service
● lvm2-pvscan@253:55.service - LVM2 PV scan on device 253:55
   Loaded: loaded (/usr/lib/systemd/system/lvm2-pvscan@.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2020-09-24 09:26:58 EDT; 5h 44min ago
     Docs: man:pvscan(8)
  Process: 17395 ExecStart=/usr/sbin/lvm pvscan --cache --activate ay %i (code=exited, status=5)
 Main PID: 17395 (code=exited, status=5)

Sep 24 09:26:58 stack3 systemd[1]: Starting LVM2 PV scan on device 253:55...
Sep 24 09:26:58 stack3 lvm[17395]: Multiple VGs found with the same name: skipping encrypted_vg
Sep 24 09:26:58 stack3 lvm[17395]: Use --select vg_uuid=<uuid> in place of the VG name.
Sep 24 09:26:58 stack3 systemd[1]: lvm2-pvscan@253:55.service: main process exited, code=exited, status=5/NOTINSTALLED
Sep 24 09:26:58 stack3 systemd[1]: Failed to start LVM2 PV scan on device 253:55.
Sep 24 09:26:58 stack3 systemd[1]: Unit lvm2-pvscan@253:55.service entered failed state.
Sep 24 09:26:58 stack3 systemd[1]: lvm2-pvscan@253:55.service failed.
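
The "Use --select vg_uuid=<uuid>" hint presumably refers to the multiple VGs
named encrypted_vg that are now visible. If I'm reading it correctly, the
per-UUID form would be something like this (the UUID and the new name below
are placeholders, not values from this host):

    # list the clashing VGs together with their UUIDs
    vgs -o vg_name,vg_uuid

    # activate a specific one by UUID instead of by name
    vgchange -ay --select vg_uuid=<uuid>

    # or rename one copy so the names no longer collide (probably not what I
    # want here, since these VGs appear to belong to guest images)
    vgrename <uuid> encrypted_vg_new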


On Thu, Sep 24, 2020 at 2:07 PM Alan Davis <alan.davis at apogee-research.com>
wrote:

> This morning my CentOS 7.7 RDO packstack installation of Rocky hung. On
> reboot some of the VMs won't start. This is a primary system and I need to
> find the most expedient way to recover without losing data. I'm not using
> LVM thin volumes.
>
> Any help is appreciated.
>
> Looking at nova-compute.log I see errors trying to find LUN 0 during the
> sysfs stage.
>
> Several machines won't boot because their root disks' LVM entries are
> seen as PVs on the host, and booting them doesn't see them in the DM subsystem.
> Other machines boot, but their attached disks throw LVM errors about
> duplicate PVs and about preferring the cinder-volumes VG version.
>
> LVM is showing LVs that have both "bare" entries and entries in
> cinder-volumes, and it is complaining about duplicate PVs, not using lvmetad,
> and preferring some entries because they are in the dm subsystem.
> I've verified that, so far, I haven't lost any data. A "bare" LV that isn't
> in use by the DM subsystem (because its server won't boot) can be mounted
> on the OpenStack host, and all the data on it is accessible.
>
> This host has rebooted cleanly multiple times in the past. This is the
> first time it's shown any problems.
>
> Am I missing an LVM filter? (unlikely, since it wasn't needed before)
> How can I reset the LVM configuration and convince it that it's not seeing
> duplicate PVs?
> How do I ensure that OpenStack sees the right UUID and volume ID?
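>
> Is something like the following the right direction? (Only a rough sketch;
> the global_filter addition is an untested illustration, not configuration
> taken from this host.)
>
>   # show which block devices LVM sees for each PV UUID
>   pvs -a -o pv_name,pv_uuid,vg_name
>
>   # keep the host from scanning the guest PVs at all by rejecting the
>   # cinder-volumes LV paths in the devices section of /etc/lvm/lvm.conf,
>   # in addition to whatever filter is already there, e.g.:
>   # global_filter = [ "r|^/dev/sd[bcdef]|", "r|^/dev/cinder-volumes/|", "a|.*|" ]
>
>   # then rebuild the lvmetad cache, as the duplicate-PV warning suggests
>   pvscan --cache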
>
> Excerpts from error log and output of lvs:
> --- nova-compute.log --- during VM start
> 2020-09-24 11:15:27.091 13953 INFO os_brick.initiator.connectors.iscsi [req-8d15fb6a-6324-471e-9497-587885eef8f6 396aeda6552f44fdac5f878b90325ee1 54af92f2bb494355b96024076184d1c8 - default default] Trying to connect to iSCSI portal 172.10.0.40:3260
> 2020-09-24 11:15:29.721 13953 WARNING nova.compute.manager [req-fd32e16f-c879-402f-a32c-6be45a943c34 48af9a366301467d9fec912fd1c072c6 f9fc7b412a8446d083da1356aa370eb4 - default default] [instance: de7d740c-786a-4aa2-aa09-d447ae7e14b6] Received unexpected event network-vif-unplugged-79aff403-d2e4-4266-bd88-d7bd19d501a9 for instance with vm_state stopped and task_state powering-on.
> 2020-09-24 11:16:21.361 13953 WARNING os_brick.initiator.connectors.iscsi [req-8d15fb6a-6324-471e-9497-587885eef8f6 396aeda6552f44fdac5f878b90325ee1 54af92f2bb494355b96024076184d1c8 - default default] LUN 0 on iSCSI portal 172.10.0.40:3260 not found on sysfs after logging in.
> 2020-09-24 11:16:23.482 13953 INFO os_brick.initiator.connectors.iscsi [req-8d15fb6a-6324-471e-9497-587885eef8f6 396aeda6552f44fdac5f878b90325ee1 54af92f2bb494355b96024076184d1c8 - default default] Trying to connect to iSCSI portal 172.10.0.40:3260
> 2020-09-24 11:17:17.741 13953 WARNING os_brick.initiator.connectors.iscsi [req-8d15fb6a-6324-471e-9497-587885eef8f6 396aeda6552f44fdac5f878b90325ee1 54af92f2bb494355b96024076184d1c8 - default default] LUN 0 on iSCSI portal 172.10.0.40:3260 not found on sysfs after logging in.: VolumeDeviceNotFound: Volume device not found at .
> 2020-09-24 11:17:21.864 13953 INFO os_brick.initiator.connectors.iscsi [req-8d15fb6a-6324-471e-9497-587885eef8f6 396aeda6552f44fdac5f878b90325ee1 54af92f2bb494355b96024076184d1c8 - default default] Trying to connect to iSCSI portal 172.10.0.40:3260
> 2020-09-24 11:18:16.113 13953 WARNING os_brick.initiator.connectors.iscsi [req-8d15fb6a-6324-471e-9497-587885eef8f6 396aeda6552f44fdac5f878b90325ee1 54af92f2bb494355b96024076184d1c8 - default default] LUN 0 on iSCSI portal 172.10.0.40:3260 not found on sysfs after logging in.: VolumeDeviceNotFound: Volume device not found at .
> 2020-09-24 11:18:17.252 13953 INFO nova.compute.manager [req-8d15fb6a-6324-471e-9497-587885eef8f6 396aeda6552f44fdac5f878b90325ee1 54af92f2bb494355b96024076184d1c8 - default default] [instance: de7d740c-786a-4aa2-aa09-d447ae7e14b6] Successfully reverted task state from powering-on on failure for instance.
> 2020-09-24 11:18:17.279 13953 ERROR oslo_messaging.rpc.server [req-8d15fb6a-6324-471e-9497-587885eef8f6 396aeda6552f44fdac5f878b90325ee1 54af92f2bb494355b96024076184d1c8 - default default] Exception during message handling: VolumeDeviceNotFound: Volume device not found at .
> 2020-09-24 11:18:17.279 13953 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
> 2020-09-24 11:18:17.279 13953 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming
> 2020-09-24 11:18:17.279 13953 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
>
>
> --- lvs output ---
> I've annotated one machine's disks to illustrate the relationship between
> the volume-*** entries in the cinder-volumes VG and the "bare" LVs seen as
> directly accessible from the host.
> There are 3 servers that won't boot; they are the ones whose home/vg_home
> and encrypted_home/encrypted_vg entries are shown.
>
>   WARNING: Not using lvmetad because duplicate PVs were found.
>   WARNING: Use multipath or vgimportclone to resolve duplicate PVs?
>   WARNING: After duplicates are resolved, run "pvscan --cache" to enable lvmetad.
>   WARNING: Not using device /dev/sdu for PV yZy8Xk-foKT-ovjV-0EZv-VxEM-GqiP-WH7k53. == backup_lv/encrypted_vg
>   WARNING: Not using device /dev/sdv for PV tHA9ui-eSIO-MDmI-RM3u-3Bf4-Dznb-Ha3XfP. == varoptgitlab/encrypted_vg
>   WARNING: Not using device /dev/sdm for PV 5eoyCa-sMO4-b7O4-jIfh-byZE-L5pS-3lOu0D.
>   WARNING: Not using device /dev/sdp for PV 3BI0nV-TP0k-rgPC-PrjH-FT7z-reMe-ec1spj.
>   WARNING: Not using device /dev/sdt for PV ILdbcY-VFCm-fnH6-Y3jc-pdWZ-fnl8-PH3TPe. == storage_lv/encrypted_vg
>   WARNING: Not using device /dev/sdr for PV zowU2N-oaBh-r4cO-cxgX-YYiq-Kf3q-mqlHfK.
>   WARNING: PV yZy8Xk-foKT-ovjV-0EZv-VxEM-GqiP-WH7k53 prefers device /dev/cinder-volumes/volume-c8da1abf-7143-422c-9ee5-b2724a71c8ff because device is in dm subsystem.
>   WARNING: PV tHA9ui-eSIO-MDmI-RM3u-3Bf4-Dznb-Ha3XfP prefers device /dev/cinder-volumes/volume-0a12012f-8c2e-41fb-aa0c-a7ae99c62487 because device is in dm subsystem.
>   WARNING: PV 5eoyCa-sMO4-b7O4-jIfh-byZE-L5pS-3lOu0D prefers device /dev/cinder-volumes/volume-990a057c-46cc-4a81-ba02-28b72c34791d because device is in dm subsystem.
>   WARNING: PV 3BI0nV-TP0k-rgPC-PrjH-FT7z-reMe-ec1spj prefers device /dev/cinder-volumes/volume-b6a9da6e-1958-46ea-90b4-ac1aebed8c04 because device is in dm subsystem.
>   WARNING: PV ILdbcY-VFCm-fnH6-Y3jc-pdWZ-fnl8-PH3TPe prefers device /dev/cinder-volumes/volume-302dd53b-7d05-4f6d-9ada-8f2ed6e1d4c6 because device is in dm subsystem.
>   WARNING: PV zowU2N-oaBh-r4cO-cxgX-YYiq-Kf3q-mqlHfK prefers device /dev/cinder-volumes/volume-df006472-be7a-4957-972a-1db4463f5d67 because device is in dm subsystem.
>   LV                                             VG             Attr          LSize Pool Origin                                      Data%  Meta%  Move Log Cpy%Sync Convert
>   home                                           centos_stack3  -wi-ao----    4.00g
>   root                                           centos_stack3  -wi-ao----   50.00g
>   swap                                           centos_stack3  -wi-ao----    4.00g
>   _snapshot-05b1e46b-1ae3-4cd0-9117-3fb53a6d94b0 cinder-volumes swi-a-s---   20.00g      volume-1d0ff5d5-93a3-44e8-8bfa-a9290765c8c6 0.00
>   lv_filestore                                   cinder-volumes -wi-ao----    1.00t
> ...
>   volume-c8da1abf-7143-422c-9ee5-b2724a71c8ff    cinder-volumes -wi-ao----  100.00g
>   volume-0a12012f-8c2e-41fb-aa0c-a7ae99c62487    cinder-volumes -wi-ao----   60.00g
>   volume-990a057c-46cc-4a81-ba02-28b72c34791d    cinder-volumes -wi-ao----  200.00g
>   volume-b6a9da6e-1958-46ea-90b4-ac1aebed8c04    cinder-volumes -wi-ao----   30.00g
>   volume-302dd53b-7d05-4f6d-9ada-8f2ed6e1d4c6    cinder-volumes -wi-ao----   60.00g
>   volume-df006472-be7a-4957-972a-1db4463f5d67    cinder-volumes -wi-ao----  250.00g
> ...
>   volume-f3250e15-bb9c-43d1-989d-8a8f6635a416    cinder-volumes -wi-ao----   20.00g
>   volume-fc1d5fcb-fda1-456b-a89d-582b7f94fb04    cinder-volumes -wi-ao----  300.00g
>   volume-fc50a717-0857-4da3-93cb-a55292f7ed6d    cinder-volumes -wi-ao----   20.00g
>   volume-ff94e2d6-449b-495d-82e6-0debd694c1dd    cinder-volumes -wi-ao----   20.00g
>   data2                                          data2_vg       -wi-a----- <300.00g
>   data                                           data_vg        -wi-a-----    1.79t
>   backup_lv                                      encrypted_vg   -wi------- <100.00g  == ...WH7k53
>   storage_lv                                     encrypted_vg   -wi-------  <60.00g  == ...PH3TPe
>   varoptgitlab_lv                                encrypted_vg   -wi------- <200.00g
>   varoptgitlab_lv                                encrypted_vg   -wi-------  <30.00g
>   varoptgitlab_lv                                encrypted_vg   -wi-------  <60.00g  == ...Ha3XfP
>   encrypted_home                                 home_vg        -wi-a-----  <40.00g
>   encrypted_home                                 home_vg        -wi-------  <60.00g
>   pub                                            pub_vg         -wi-a-----  <40.00g
>   pub_lv                                         pub_vg         -wi------- <250.00g
>   rpms                                           repo           -wi-a-----  499.99g
>   home                                           vg_home        -wi-a-----  <40.00g
>   gtri_pub                                       vg_pub         -wi-a-----   20.00g
>   pub                                            vg_pub         -wi-a-----  <40.00g
> --
> Alan Davis
> Principal System Administrator
> Apogee Research LLC
>
>

-- 
Alan Davis
Principal System Administrator
Apogee Research LLC
Office : 571.384.8941 x26
Cell : 410.701.0518