[Ironic] No suitable device was found for deployment
Dear all,

I have an OpenStack Victoria environment and am trying to use Ironic to manage bare metal nodes. However, the deploy stage fails with "- root device hints were not provided and all found block devices are smaller than 4294967296B.":

2022-02-09 17:57:56.492 3908982 ERROR ironic.drivers.modules.agent_base [-] Agent returned error for deploy step {'step': 'write_image', 'priority': 80, 'argsinfo': None, 'interface': 'deploy'} on node cc68c450-ce54-4e1c-be04-8b0a6169ef92 : No suitable device was found for deployment - root device hints were not provided and all found block devices are smaller than 4294967296B..

I used "openstack server create --flavor my-baremetal-flavor --nic net-id=$net_id --image $image testing" to deploy the bare metal node. I downloaded the deploy images (ipa-centos8-master.kernel and ipa-centos8-master.initramfs) from https://tarballs.opendev.org/openstack/ironic-python-agent/dib/files/.

Bare metal node info and flavor info: https://paste.opendev.org/show/bV7lgO6RkNQY6ZGPbT2e/
Ironic configuration file: https://paste.opendev.org/show/bTgY9Kpn7KWqwQl73aEa/
ironic-conductor log: https://paste.opendev.org/show/bFKZYlXmccxNxU8lEogk/
ironic-python-agent log from the bare metal node: https://paste.opendev.org/show/btAuaMuV2IutV2Pa7YIa/

I found some older discussions about this, such as https://bugzilla.redhat.com/show_bug.cgi?id=1312187, but they took place a long time ago, not for Victoria, and no solution was given.

Does anyone know how to solve this problem? I would appreciate any kind of guidance or help.

Thank you,
Han Guangyu
In reply to 韩光宇 (10.02.22 02:50):

Hi Guangyu,

The error indicates that Ironic was not able to find a device to deploy the image to.

To find a device, Ironic uses 'root device' hints [1], usually set by the admin on a node. If no hints are given, Ironic sorts all block devices by size (and then alphabetically by name) and picks the smallest one that is at least 4 GiB (4294967296 bytes, the limit quoted in the error).

If your server has disks larger than 4 GiB, one potential explanation is that Ironic cannot see them, e.g. because the corresponding drivers are missing from the IPA image. The logs you posted seem to confirm something along those lines.

To see which devices Ironic has access to, check the content of the 'lsblk' file in the deploy logs: it is in the tar archive for your deployment attempt under /var/log/ironic/deploy/ on the controller.

Cheers, Arne

[1] https://docs.openstack.org/ironic/latest/install/advanced.html#root-device-h...
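The fallback described above can be approximated in a few lines of shell. This is a rough sketch, not IPA's actual code (IPA implements the selection in Python): it assumes input lines of the form `NAME SIZE_IN_BYTES TYPE`, as printed by `lsblk -b -d -n -o NAME,SIZE,TYPE`, keeps only `disk` entries, sorts by size and then by name, and returns the first device of at least 4 GiB. The function name `pick_root_device` is hypothetical.

```shell
# Rough sketch (not IPA's implementation) of the root device fallback used
# when no root device hints are set.
# Input on stdin: "NAME SIZE_IN_BYTES TYPE" per line, e.g. the output of
#   lsblk -b -d -n -o NAME,SIZE,TYPE
# The column layout is an assumption of this sketch.

# 4 GiB, the threshold quoted in the error message (4294967296 bytes).
MIN_SIZE=$((4 * 1024 * 1024 * 1024))

pick_root_device() {
    # Sort numerically by size, break ties by name, then print the first
    # whole disk that meets the size threshold. Exit status 1 means no
    # suitable device was found.
    sort -k2,2n -k1,1 |
        awk -v min="$MIN_SIZE" '
            $3 == "disk" && $2 >= min { print $1; found = 1; exit }
            END { if (!found) exit 1 }'
}
```

Usage on a live system: `lsblk -b -d -n -o NAME,SIZE,TYPE | pick_root_device`. An empty input, like the empty 'lsblk' file found later in this thread, yields no device and a non-zero status, mirroring the deploy error. The 'lsblk' file in the deploy log archive may use a different column layout, so it may need adapting first; assuming a gzip-compressed archive, `tar -tzf ARCHIVE` lists its members and `tar -xzOf ARCHIVE lsblk` prints the member named `lsblk` to stdout (ARCHIVE is a placeholder for the file under /var/log/ironic/deploy/).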
In reply to Arne Wiebalck <arne.wiebalck@cern.ch> (Thu, 10 Feb 2022 15:51):

Hi Arne,

Thank you very much for your response. Love you. You cleared up a lot of my confusion.

You are right, I didn't set a root device hint. And Ironic cannot see the disks either: the 'lsblk' file in the deploy logs is empty. I tried setting a root device hint, but because Ironic can't find any disk, the deployment still failed:

Feb 10 09:51:55 host-10-12-22-59 ironic-python-agent[2324]: 2022-02-10 09:51:55.045 2324 WARNING root [-] Path /dev/disk/by-path is inaccessible, /dev/disk/by-path/* version of block device name is unavailable Cause: [Errno 2] No such file or directory: '/dev/disk/by-path': FileNotFoundError: [Errno 2] No such file or directory: '/dev/disk/by-path'

Feb 10 09:51:55 host-10-12-22-59 ironic-python-agent[2324]: 2022-02-10 09:51:55.056 2324 WARNING ironic_lib.utils [-] No device found that matches the root device hints {'wwn': '0x50014EE2691D724C'}: StopIteration

Sorry to bother you; I'm new to Ironic and couldn't find information about this on Google.

The bare metal node has three identical disks (Western Digital DC HA210 2TB SATA 6Gb/s). Where can I confirm whether ironic-python-agent supports this disk?

And if Ironic cannot find the disks because the corresponding drivers are missing from the IPA image, do you know how to resolve it? I used the latest deploy images from https://tarballs.opendev.org/openstack/ironic-python-agent/dib/files/ . Do I need to find the driver and add it manually to the source code or the ramdisk (that would be difficult for me)?

Love you.

Cheers, Guangyu
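The second warning follows from the first: a root device hint can only select among the devices the ramdisk actually sees. The simplified sketch below (a hypothetical `match_wwn_hint`, not IPA's matcher) compares a `wwn` hint against `NAME WWN` lines as printed by `lsblk -d -n -o NAME,WWN`. Comparing case-insensitively is an assumption made here because the hint above is upper-case hex while lsblk may print lower-case. With an empty device list, no hint can match.

```shell
# Simplified illustration (not IPA's matcher) of applying a "wwn" root
# device hint. Input on stdin: "NAME WWN" per line, e.g. from
#   lsblk -d -n -o NAME,WWN
# Case-insensitive comparison is an assumption of this sketch.
match_wwn_hint() {
    awk -v hint="$1" '
        tolower($2) == tolower(hint) { print "/dev/" $1; found = 1; exit }
        END { if (!found) exit 1 }'
}
```

Once the disks are visible, the hint itself is stored as a node property, e.g. `baremetal node set $NODE_UUID --property root_device='{"wwn": "0x50014EE2691D724C"}'`, as described in the root device hints documentation cited in Arne's reply.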
In reply to 韩光宇 (10.02.22 11:15):

Hi Guangyu,

No worries about asking questions, this is what the mailing list is for :)

Just to clarify: you do not have to set root device hints; deployment also works without them (using the algorithm I mentioned). However, hints let you define the exact device and make deployments more predictable and repeatable.

If it really is a driver problem, it is an issue with the operating system of the image you use, i.e. CentOS 8. Some drivers were removed between CentOS 7 and CentOS 8, and we have seen issues with specific drive models as well.

You can try to build your own IPA images as described in [1], e.g. to add your SSH key so that you can log into the IPA and debug further, and eventually to include drivers (if you can identify them and they are available for CentOS 8).

Another option is to add another (newer) disk model to the server, just to confirm that the disk model/driver is the cause.

You could also try to boot the node into a CentOS 7 (and then a CentOS 8) live image to confirm it can see the disks at all.

Hope this helps!
Arne

[1] https://docs.openstack.org/ironic-python-agent-builder/latest/admin/dib.html
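One way to narrow down a driver problem from inside the ramdisk or a live image is to check whether the kernel bound a driver to each storage controller. The sketch below assumes `lspci -k` (from pciutils) is available in the image and parses its output; `storage_drivers` is a hypothetical name, and the set of PCI class names it treats as storage is a simplification.

```shell
# Sketch: list storage controllers and the kernel driver bound to each,
# from `lspci -k` output on stdin. Prints "SLOT DRIVER", or "SLOT NO-DRIVER"
# when no "Kernel driver in use" line follows the device. The class names
# matched below are a simplification, not an exhaustive list.
storage_drivers() {
    awk '
        function flush() {
            if (slot != "") print slot, (driver != "" ? driver : "NO-DRIVER")
            slot = ""; driver = ""
        }
        # A device header line starts with its PCI slot, e.g. "23:00.0 ...".
        /^[0-9a-f]/ {
            flush()
            if ($0 ~ /(RAID bus|SATA|SCSI storage|Serial Attached SCSI|Non-Volatile memory|Mass storage) controller/)
                slot = $1
            next
        }
        slot != "" && /Kernel driver in use:/ { driver = $NF }
        END { flush() }'
}
```

Usage: `lspci -k | storage_drivers`. A `NO-DRIVER` entry suggests the image lacks (or did not load) a module for that controller. In the `lspci -v` output posted later in this thread, the MegaRAID controller does have `megaraid_sas` in use, which points away from a missing controller driver.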
In reply to Arne Wiebalck <arne.wiebalck@cern.ch> (Thu, Feb 10, 2022 at 3:17 AM):

If the disk controllers *are* enumerated in the kernel log, which is also something to look for, then the disks themselves may be in some unusual state, such as being security locked. This generally shows up as the operating system sort of seeing the disk and the connected SATA port, but not really being able to access it. This is also an exceptionally rare state to find oneself in.

More common, especially on enterprise-grade hardware: if the disk controller is actually a RAID controller and no RAID volumes are configured, the operating system likely cannot see the underlying disks and turn them into usable block devices. Over the years I've seen a couple of drivers which expose hints of the disks in the kernel log, but without a RAID configuration on the card, the drivers can't present usable block devices to the operating system.

-Julia
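Julia's reasoning can be turned into a rough triage of the kernel log. The sketch below (a hypothetical `triage_kernel_log`) takes the controller's driver name as an argument and reads kernel log text on stdin: if the driver is mentioned but the `sd` driver never reports an "Attached SCSI disk" line, the controller was found but presented no disks, which fits the "RAID controller without volumes" case. It assumes SCSI/SATA disks handled by `sd`; NVMe devices log differently, and security-locked disks may still attach, so those cases are not covered.

```shell
# Rough triage (a sketch, not an exhaustive check) of kernel log text on
# stdin, e.g. `dmesg | triage_kernel_log megaraid_sas`. Prints one of:
#   disks-attached      - the sd driver attached at least one disk
#   controller-only     - the driver is mentioned, but no disk was attached
#   controller-not-seen - the driver name never appears in the log
triage_kernel_log() {
    awk -v drv="$1" '
        drv != "" && index($0, drv) { ctrl = 1 }
        /Attached SCSI disk/ { disks++ }
        END {
            if (disks > 0) print "disks-attached"
            else if (ctrl) print "controller-only"
            else print "controller-not-seen"
        }'
}
```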
In reply to Julia Kreger <juliaashleykreger@gmail.com> (Thu, 10 Feb 2022 23:11):

Hi Arne and Julia,

You make me feel so warm. Best wishes to you.

I tried booting the node into CentOS 7, but it still could not find the disks. And sorry, I hadn't noticed the RAID card:

```
# lspci -v
...
23:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS-3 3108 [Invader] (rev 02)
    Subsystem: Broadcom / LSI MegaRAID SAS 9361-8i
    Flags: bus master, fast devsel, latency 0, IRQ -2147483648, NUMA node 1
    I/O ports at 3000 [size=256]
    Memory at e9900000 (64-bit, non-prefetchable) [size=64K]
    Memory at e9700000 (64-bit, non-prefetchable) [size=1M]
    Expansion ROM at e9800000 [disabled] [size=1M]
    Capabilities: [50] Power Management version 3
    Capabilities: [68] Express Endpoint, MSI 00
    Capabilities: [d0] Vital Product Data
    Capabilities: [a8] MSI: Enable- Count=1/1 Maskable+ 64bit+
    Capabilities: [c0] MSI-X: Enable+ Count=97 Masked-
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [1e0] #19
    Capabilities: [1c0] Power Budgeting <?>
    Capabilities: [148] Alternative Routing-ID Interpretation (ARI)
    Kernel driver in use: megaraid_sas
    Kernel modules: megaraid_sas
...
```

I tried to configure RAID following https://docs.openstack.org/ironic/latest/admin/raid.html with `baremetal node set $NODE_UUID --target-raid-config raid.json`. The server has three identical disks (Western Digital DC HA210 2TB SATA 6Gb/s).

```
# cat raid.json
{
  "logical_disks": [
    {
      "size_gb": "MAX",
      "raid_level": "0",
      "is_root_volume": true
    }
  ]
}
```

But Ironic still couldn't see the disks. I still got:

```
## In deploy images
# journalctl -fxeu ironic-python-agent
Feb 14 02:17:22 host-10-12-22-74 ironic-python-agent[2329]: 2022-02-14 02:17:22.863 2329 WARNING root [-] Path /dev/disk/by-path is inaccessible, /dev/disk/by-path/* version of block device name is unavailable Cause: [Errno 2] No such file or directory: '/dev/disk/by-path': FileNotFoundError: [Errno 2] No such file or directory: '/dev/disk/by-path'
Feb 14 02:17:44 host-10-12-22-74 ironic-python-agent[2329]: 2022-02-14 02:17:44.391 2329 ERROR root [-] Unexpected error dispatching get_os_install_device to manager <ironic_python_agent.hardware.GenericHardwareManager object at 0x7efbf4da2208>: Error finding the disk or partition device to deploy the image onto: No suitable device was found for deployment - root device hints were not provided and all found block devices are smaller than 4294967296B.: ironic_python_agent.errors.DeviceNotFound: Error finding the disk or partition device to deploy the image onto: No suitable device was found for deployment - root device hints were not provided and all found block devices are smaller than 4294967296B.
```

I don't know whether it's a missing RAID card driver, a missing disk driver, or a missing RAID configuration. Do you have any idea?

Love you,
Han Guangyu
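Per the Ironic RAID documentation linked above, `--target-raid-config` only records the desired layout on the node; it takes effect when the `raid` interface's `create_configuration` step runs, for example during manual cleaning. The sketch below wraps that sequence in a hypothetical `apply_target_raid` function. It assumes the node can be moved to `manageable` and that its `raid_interface` can actually configure this controller; the generic agent implementation is understood to target software RAID, so hardware RAID on a MegaRAID card may need a vendor-specific interface or a custom hardware manager (an assumption worth checking against the docs).

```shell
# Sketch: apply a node's target_raid_config through manual cleaning.
# Assumptions: the node can be moved to "manageable", and its raid_interface
# implements create_configuration for this controller.
# $bm is expanded unquoted on purpose so BAREMETAL may hold several words,
# e.g. BAREMETAL="openstack baremetal"; BAREMETAL=echo gives a dry run.
apply_target_raid() {
    node=$1
    bm=${BAREMETAL:-baremetal}
    $bm node manage "$node" --wait &&
        $bm node clean "$node" --clean-steps \
            '[{"interface": "raid", "step": "delete_configuration"},
              {"interface": "raid", "step": "create_configuration"}]' --wait &&
        $bm node provide "$node" --wait
}
```

Usage: `apply_target_raid "$NODE_UUID"`. The `--wait` flags keep each provision-state change from starting before the previous one has finished.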
In reply to 韩光宇 (14.02.22 03:20):

Hi Guangyu,

It seems Julia had the right idea: the disks are not visible because the RAID controller does not expose anything to the operating system. Your test with the CentOS 7 image seems to confirm this.

What I would suggest trying next is to look for the hardware RAID configuration option during the initial boot sequence and enter the RAID configuration menu (there should be a message quite early on; maybe Ctrl-H is needed to enter the menu). Once there, manually configure the disks as JBODs or create a RAID device. After a reboot, this should be visible and accessible as a device; maybe check again from your CentOS 7 image. If the devices are there, Ironic should also be able to deploy on them (for this you can remove the RAID config you added).

It depends a little on what your goal is, but I would try this first, to see whether you can make a device visible and whether the Ironic deploy part works, before trying to configure the hardware RAID via Ironic.

Cheers, Arne
Hi Arne and Julia,
You make me feel so warm. Best wishes to you.
I have tried to boot the node into a CentOS7, but it still coundnot to find disk. And sorry that I didn't notice the RAID card.
# lspci -v ... 23:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS-3 3108 [Invader] (rev 02) Subsystem: Broadcom / LSI MegaRAID SAS 9361-8i Flags: bus master, fast devsel, latency 0, IRQ -2147483648, NUMA node 1 I/O ports at 3000 [size=256] Memory at e9900000 (64-bit, non-prefetchable) [size=64K] Memory at e9700000 (64-bit, non-prefetchable) [size=1M] Expansion ROM at e9800000 [disabled] [size=1M] Capabilities: [50] Power Management version 3 Capabilities: [68] Express Endpoint, MSI 00 Capabilities: [d0] Vital Product Data Capabilities: [a8] MSI: Enable- Count=1/1 Maskable+ 64bit+ Capabilities: [c0] MSI-X: Enable+ Count=97 Masked- Capabilities: [100] Advanced Error Reporting Capabilities: [1e0] #19 Capabilities: [1c0] Power Budgeting <?> Capabilities: [148] Alternative Routing-ID Interpretation (ARI) Kernel driver in use: megaraid_sas Kernel modules: megaraid_sas ...
I try to config raid fallowing https://docs.openstack.org/ironic/latest/admin/raid.html by `baremetal node set $NODE_UUID --target-raid-config raid.json`. The server have three same disk(Western Digital DC HA210 2TB SATA 6GB/s) # cat raid.json { "logical_disks": [ { "size_gb": "MAX", "raid_level": "0", "is_root_volume": true } ] }
But Ironic still coundn't see disk. I still got ``` ## In deploy images # journalctl -fxeu ironic-python-agent Feb 14 02:17:22 host-10-12-22-74 ironic-python-agent[2329]: 2022-02-14 02:17:22.863 2329 WARNING root [-] Path /dev/disk/by-path is inaccessible, /dev/disk/by-path/* version of block device name is unavailable Cause: [Errno 2] No such file or directory: '/dev/disk/by-path': FileNotFoundError: [Errno 2] No such file or directory: '/dev/disk/by-path' Feb 14 02:17:44 host-10-12-22-74 ironic-python-agent[2329]: 2022-02-14 02:17:44.391 2329 ERROR root [-] Unexpected error dispatching get_os_install_device to manager <ironic_python_agent.hardware.GenericHardwareManager object at 0x7efbf4da2208>: Error finding the disk or partition device to deploy the image onto: No suitable device was found for deployment - root device hints were not provided and all found block devices are smaller than 4294967296B.: ironic_python_agent.errors.DeviceNotFound: Error finding the disk or partition device to deploy the image onto: No suitable device was found for deployment - root device hints were not provided and all found block devices are smaller than 4294967296B. ```
I don't know whether this is caused by a missing RAID card driver, a missing disk driver, or missing RAID configuration. Do you have any idea?
love you, Han Guangyu
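A note on the `--target-raid-config` step above: per the Ironic RAID documentation linked in the message, setting `target_raid_config` only records the desired layout on the node. It is applied by the RAID interface's `create_configuration` clean step, typically run through manual cleaning while the node is in the `manageable` state. The sketch below builds the same `raid.json` plus a clean-steps list that could be passed to `baremetal node clean <node> --clean-steps '<json>'`. It assumes the node's RAID interface can actually drive this MegaRAID controller, which the thread does not establish, and the set of valid `raid_level` strings is my reading of the Ironic RAID schema, to be checked against your release.

```python
import json

# Valid raid_level strings per Ironic's RAID schema (assumption: verify
# against the RAID documentation for your Ironic release).
VALID_RAID_LEVELS = {"0", "1", "2", "5", "6", "1+0", "5+0", "6+0"}


def make_target_raid_config(raid_level="0", size_gb="MAX", is_root_volume=True):
    """Build a one-volume target_raid_config, like the raid.json above."""
    if raid_level not in VALID_RAID_LEVELS:
        raise ValueError(f"unsupported raid_level: {raid_level!r}")
    if size_gb != "MAX" and not (isinstance(size_gb, int) and size_gb > 0):
        raise ValueError("size_gb must be a positive integer or 'MAX'")
    return {
        "logical_disks": [
            {
                "size_gb": size_gb,
                "raid_level": raid_level,
                "is_root_volume": is_root_volume,
            }
        ]
    }


def make_raid_clean_steps():
    """Manual-cleaning steps that wipe and then (re)create the RAID layout."""
    return [
        {"interface": "raid", "step": "delete_configuration"},
        {"interface": "raid", "step": "create_configuration"},
    ]


if __name__ == "__main__":
    # Contents for raid.json and for --clean-steps, respectively.
    print(json.dumps(make_target_raid_config(), indent=2))
    print(json.dumps(make_raid_clean_steps()))
```

If the cleaning run finishes but `lsblk` inside the agent is still empty, the RAID interface most likely never reached the controller, which matches the symptom reported here.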
On Thu, Feb 10, 2022 at 23:11 Julia Kreger <juliaashleykreger@gmail.com> wrote:
If the disk controllers *are* enumerated in the kernel log (also worth checking), the disks themselves may be in some odd state, such as security-locked. This generally shows up as the operating system sort of seeing the disk and the connected SATA port, but not being able to actually access it. This is also an exceptionally rare state to find oneself in.
More common, especially in enterprise-grade hardware: if the disk controller is actually a RAID controller and no RAID volumes are configured, the operating system likely cannot see the underlying disks and turn them into usable block devices. Over the years I've seen a couple of drivers that expose hints of disks in the kernel log but, without a RAID configuration on the card, cannot present usable block devices to the operating system.
-Julia
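Julia's distinction (driver missing versus controller exposing no volumes) can be checked from any Linux live image booted on the node. The sketch below is a heuristic, not an Ironic tool, and assumes the standard sysfs layout: `/sys/module/<name>` exists for a loaded module, and `/sys/block/<dev>/size` is reported in 512-byte sectors. If `megaraid_sas` is loaded (as the `lspci -v` output above also indicates) yet no disks appear under `/sys/block`, the controller is most likely presenting no volumes, which points to RAID configuration rather than a missing driver.

```python
import os

SECTOR_BYTES = 512  # /sys/block/<dev>/size is always in 512-byte units


def driver_loaded(name, sys_root="/sys"):
    """True if a kernel module (e.g. megaraid_sas) appears under /sys/module."""
    return os.path.isdir(os.path.join(sys_root, "module", name))


def block_devices(sys_root="/sys", skip_prefixes=("loop", "ram", "zram", "sr")):
    """Return {name: size_in_bytes} for disk-like block devices in sysfs."""
    devices = {}
    block_dir = os.path.join(sys_root, "block")
    for name in sorted(os.listdir(block_dir)):
        if name.startswith(skip_prefixes):
            continue  # ignore loop, RAM and optical devices
        with open(os.path.join(block_dir, name, "size")) as f:
            devices[name] = int(f.read().strip()) * SECTOR_BYTES
    return devices


def diagnose(sys_root="/sys", driver="megaraid_sas"):
    """Rough classification of why no install disk is visible."""
    has_driver = driver_loaded(driver, sys_root)
    devs = block_devices(sys_root)
    if has_driver and not devs:
        return "driver loaded but no block devices: controller likely exposes no volumes"
    if not has_driver and not devs:
        return "no driver and no block devices: driver may be missing from the image"
    return f"visible devices: {devs}"
```

On the node itself, `print(diagnose())` is enough; the `sys_root` parameter exists only so the logic can be exercised against a fake directory tree.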
On Thu, Feb 10, 2022 at 3:17 AM Arne Wiebalck <arne.wiebalck@cern.ch> wrote:
Hi Guangyu,
No worries about asking questions, this is what the mailing list is for :)
Just to clarify, you do not have to set root device hints, it also works without (with the algorithm I mentioned). However, hints help to define the exact device and/or make deployment more predictable/repeatable.
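The fallback Arne describes (smallest block device of at least 4GB, ordered alphabetically) can be sketched as follows. This is a simplified illustration, not the ironic-python-agent source: `BlockDevice`, `NoSuitableDevice` and `pick_root_disk` are hypothetical names, and the real agent filters devices further before choosing. The 4294967296B in the error message is exactly 4 GiB (4 × 1024³ bytes). With an empty device list, as with the empty `lsblk` reported in this thread, there are no candidates and the same "No suitable device" error follows.

```python
from dataclasses import dataclass

GIB = 1024 ** 3
MIN_ROOT_SIZE = 4 * GIB  # 4294967296 bytes, the threshold in the error message


@dataclass
class BlockDevice:
    name: str  # e.g. "/dev/sda" (hypothetical stand-in for the agent's type)
    size: int  # bytes


class NoSuitableDevice(Exception):
    pass


def pick_root_disk(devices, min_size=MIN_ROOT_SIZE):
    """Smallest device of at least min_size bytes; equal sizes ordered by name."""
    candidates = sorted(
        (d for d in devices if d.size >= min_size),
        key=lambda d: (d.size, d.name),
    )
    if not candidates:
        raise NoSuitableDevice(
            "No suitable device was found for deployment - root device hints "
            "were not provided and all found block devices are smaller than "
            f"{min_size}B."
        )
    return candidates[0]
```

Sorting on the `(size, name)` tuple gives both rules at once: size decides first, and the name only breaks ties between equally sized disks, such as the three identical 2TB drives in this server.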
If it really is a driver problem, it is an issue with the operating system of the image you use, i.e. CentOS 8. Some drivers were removed between CentOS 7 and 8, and we have seen issues with specific drive models as well.
You can try to build your own IPA image as described in [1], e.g. to add your SSH key so you can log into the IPA and debug further, and eventually to include drivers (if you can identify them and they are available for CentOS 8).
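For the custom-image route, a minimal sketch that assembles an ironic-python-agent-builder invocation with an SSH-enabled user for debugging. It assumes the builder's `-o`/`-r`/`-e` options and diskimage-builder's `devuser` element with its `DIB_DEV_USER_*` variables, as described in the builder documentation [1]; the user name, key path and release string are placeholders. It only prints the command; run it yourself (e.g. with `subprocess.run(argv, env=env, check=True)`) once the options have been checked against your installed version.

```python
import os
import shlex


def ipa_build_command(output, release, authorized_keys_file,
                      username="ipa-debug", distro="centos"):
    """Return (env, argv) for a debug IPA build with an SSH-enabled user.

    Assumes ironic-python-agent-builder's -o/-r/-e options and the
    diskimage-builder 'devuser' element; username is a placeholder.
    """
    env = dict(os.environ)
    env.update({
        "DIB_DEV_USER_USERNAME": username,
        "DIB_DEV_USER_PWDLESS_SUDO": "yes",
        # Path to a public key file that becomes the user's authorized_keys.
        "DIB_DEV_USER_AUTHORIZED_KEYS": authorized_keys_file,
    })
    argv = ["ironic-python-agent-builder", "-o", output, "-r", release,
            "-e", "devuser", distro]
    return env, argv


if __name__ == "__main__":
    env, argv = ipa_build_command(
        "my-ipa", "8-stream", os.path.expanduser("~/.ssh/id_rsa.pub"))
    # Dry run: print the command instead of executing it.
    print(shlex.join(argv))
```

Returning the environment separately keeps the build reproducible: the same `env` can be logged alongside the command that produced a given ramdisk.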
Another option may be to add another (newer) disk model to the server, just to confirm it is the disk model/driver which is the cause.
You could also try to boot the node into a CentOS7 (and then a CentOS8) live image to confirm it can see the disks at all.
Hope this helps! Arne
[1] https://docs.openstack.org/ironic-python-agent-builder/latest/admin/dib.html
On 10.02.22 11:15, 韩光宇 wrote:
Hi Arne,
Thank you very much for your response. Love you. You cleared up a lot of my confusion.
You are right, I didn't set a root device hint. Ironic also cannot see any disk: the 'lsblk' file in the deploy logs is empty. I tried setting a root device hint, but since Ironic can't find any disk, the deployment still failed.
Feb 10 09:51:55 host-10-12-22-59 ironic-python-agent[2324]: 2022-02-10 09:51:55.045 2324 WARNING root [-] Path /dev/disk/by-path is inaccessible, /dev/disk/by-path/* version of block device name is unavailable Cause: [Errno 2] No such file or directory: '/dev/disk/by-path': FileNotFoundError: [Errno 2] No such file or directory: '/dev/disk/by-path'
Feb 10 09:51:55 host-10-12-22-59 ironic-python-agent[2324]: 2022-02-10 09:51:55.056 2324 WARNING ironic_lib.utils [-] No device found that matches the root device hints {'wwn': '0x50014EE2691D724C'}: StopIteration
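The second warning follows from how root device hints work: they only filter devices the agent can already see, so with nothing detected even a correct `wwn` cannot match. Below is a simplified sketch of hint matching using exact, case-insensitive equality on each hinted property; the real agent also supports comparison operators and size units, which are omitted here, and the device dictionaries are hypothetical stand-ins for the agent's inventory. Hints are set on the node with `baremetal node set <node> --property root_device='<json>'`.

```python
import json


def root_device_property(hints):
    """JSON value for: baremetal node set <node> --property root_device='...'"""
    return json.dumps(hints)


def match_root_hints(devices, hints):
    """Return the first device whose properties equal every hint, else None.

    Simplified: exact, case-insensitive string comparison only.
    """
    for dev in devices:
        if all(str(dev.get(key, "")).lower() == str(value).lower()
               for key, value in hints.items()):
            return dev
    return None
```

With an empty device list the loop never runs and the result is `None`, the sketch's equivalent of the `StopIteration` in the log.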
Sorry to bother you; I'm new to Ironic and couldn't find information about this on Google.
The bare metal node has three identical disks (Western Digital DC HA210 2TB SATA 6Gb/s). Where can I confirm whether ironic-python-agent supports this disk?
And if Ironic cannot find the disks because the corresponding drivers are missing from the IPA image, do you know how to resolve it? I used the latest deploy images from https://tarballs.opendev.org/openstack/ironic-python-agent/dib/files/ . Do I need to find the driver and add it manually to the source code or the ramdisk (that would be difficult for me)?
Love you.
Cheers, Guangyu
Hi Arne,
I didn't find a hardware RAID configuration option during the initial boot sequence. Ctrl+H does nothing on this machine. I only see "Press Del to enter firmware configuration, press F3 to enter boot menu, and press F12 to enter network boot". I pressed Del to enter the BIOS, but I didn't find a RAID configuration menu there. Sorry, my knowledge of this is limited.
And now, even after installing an operating system manually from a USB stick, I still can't find the hard drives. Could anything that ironic-python-agent did in the clean phase have caused this?
I wonder whether this is a useful line of investigation. My plan now is to first find a way to configure RAID manually. Do you agree? And then, would the RAID configuration be cleared again in the clean phase, if cleaning does that?
Sorry for all the confusion.
love you, Guangyu
On Mon, Feb 14, 2022 at 15:59 Arne Wiebalck <arne.wiebalck@cern.ch> wrote:
Hi Guangyu,
It seems like Julia had the right idea and the disks are not visible since the RAID controller does not expose anything to the operating system. This seems to be confirmed by you booting into the CentOS7 image.
What I would suggest to try next is to look for the hardware RAID config option during the initial boot sequence to enter the RAID config menu (there should be a message quite early on, and maybe Ctrl-H is needed to enter the menu).
Once there, manually configure the disks as JBODs or create a RAID device. Upon reboot this should be visible and accessible as a device. Maybe check from your CentOS7 image again. If the devices are there, Ironic should also be able to deploy on them (for this you can remove the RAID config you added).
It depends a little on what your goal is, but I would try this first to see if you can make a device visible and if the Ironic deploy bit works, before trying to configure the hardware RAID via Ironic.
Cheers, Arne
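Where the controller's boot-time utility cannot be reached (a reply in this thread reports Ctrl+H doing nothing), the same JBOD or RAID setup can usually be done from a running Linux system with Broadcom's StorCLI. The sketch below only builds command lines. The `storcli64` binary name, enclosure ID 252 and slot numbers are placeholders, and the subcommand syntax should be confirmed against Broadcom's StorCLI reference for the 9361-8i before running anything, since creating a virtual drive changes the controller configuration.

```python
import shlex

STORCLI = "storcli64"  # assumed binary name; adjust to where StorCLI is installed


def storcli(*args):
    return [STORCLI, *args]


def show_controller(ctrl=0):
    # Controller summary, including any virtual and physical drives.
    return storcli(f"/c{ctrl}", "show")


def show_physical_drives(ctrl=0):
    # All drives in all enclosures; gives the enclosure:slot IDs used below.
    return storcli(f"/c{ctrl}/eall/sall", "show")


def enable_jbod(ctrl=0):
    # Controller-wide JBOD support (behaviour is firmware-dependent).
    return storcli(f"/c{ctrl}", "set", "jbod=on")


def create_raid0(enclosure, slots, ctrl=0):
    # One RAID 0 virtual drive across the given slots of one enclosure.
    drives = f"{enclosure}:" + ",".join(str(s) for s in slots)
    return storcli(f"/c{ctrl}", "add", "vd", "type=raid0", f"drives={drives}")


if __name__ == "__main__":
    # Enclosure 252 and slots 0-2 are placeholders; read the real IDs from
    # the show_physical_drives() output first.
    for cmd in (show_controller(), show_physical_drives(), create_raid0(252, (0, 1, 2))):
        print(shlex.join(cmd))
```

Starting with the two read-only `show` commands is deliberate: if StorCLI cannot see the controller or its drives, no RAID or JBOD setting will make them visible to Ironic either.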
Hi Guangyu,
I am not aware of anything in the Ironic Python Agent that would remove disks from the system in a way that they would not be visible after a reboot (apart from, as mentioned before, a clean-up of a hardware RAID that leaves the IPA unable to see any devices afterwards).
How about trying to access and configure the hardware RAID with the corresponding tool from the RAM disk you booted into from the USB stick? Install the tool and see if it detects the controller.
The very first step, before doing anything with Ironic, is to get the disks back or understand why they are not visible.
Cheers, Arne
On 28.02.22 09:28, 韩光宇 wrote:
Hi Arne,
Yes, that's a very good idea. I will try installing the tool in the RAM disk and using it to access and configure the hardware RAID.
Best wishes to you.
Cheers, Guangyu
On Mon, Feb 28, 2022 at 17:12 Arne Wiebalck <arne.wiebalck@cern.ch> wrote:
Hi Guangyu,
I am not aware of anything in the Ironic Python Agent that would remove disks from the system in a way that they would not be visible after a reboot (apart from, as mentioned before, the clean up of a hardware RAID in a way the IPA is not able to see any devices after).
How about trying to access and configure the hardware RAID with the corresponding tool from the RAM disk you booted into from the USB stick? Install the tool and see if it detects the controller.
The very first step before doing anything with Ironic is to get the disks back or understand why they are not visible.
Cheers, Arne
On 28.02.22 09:28, 韩光宇 wrote:
Hi Arne,
I didn't find hardware RAID config option during the initial boot sequence. Ctrl+H is unresponsive in this computer. I just saw "Press Del to enter firmware configuration, press F3 to enter boot menu, and press F12 to enter network boot". And I press 'Del' to enter the BIOS. But I didn't find RAID config menu in BIOS. Sorry that I have poor knowledge about this.
And now, even though I installed the operating system manually using a USB stick, I still couldn't find the hard drive. Is there anything that ironic-agent did in the clean phase that would have caused this problem?
I wonder if this is a thinking pointto solve the problem. Now, my idea is to first find a way to manually configure RAID. Do you agree with this? And than, whether RAID configurations are still cleared in the Clean phase if clean phase will do this?
Sorry that I have so much confuse.
love you, Guangyu
Arne Wiebalck <arne.wiebalck@cern.ch> 于2022年2月14日周一 15:59写道:
Hi Guangyu,
It seems like Julia had the right idea and the disks are not visible since the RAID controller does not expose anything to the operating system. This seems to be confirmed by you booting into the CentOS7 image.
What I would suggest to try next is to look for the hardware RAID config option during the initial boot sequence to enter the RAID config menu (there should be a message quite early on, and maybe Ctrl-H is needed to enter the menu).
Once there, manually configure the disks as JBODs or create a RAID device. Upon reboot this should be visible and accessible as a device. Maybe check from your CentOS7 image again. If the devices are there, Ironic should also be able to deploy on them (for this you can remove the RAID config you added).
It depends a little on what your goal is, but I would try this first to see if you can make a device visible and if the Ironic deploy bit works, before trying to configure the hardware RAID via Ironic.
Cheers, Arne
On 14.02.22 03:20, 韩光宇 wrote:
Hi Arne and Julia,
You make me feel so warm. Best wishes to you.
I tried booting the node into a CentOS7 image, but it still could not find any disk. And sorry, I hadn't noticed the RAID card before.
# lspci -v
...
23:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS-3 3108 [Invader] (rev 02)
    Subsystem: Broadcom / LSI MegaRAID SAS 9361-8i
    Flags: bus master, fast devsel, latency 0, IRQ -2147483648, NUMA node 1
    I/O ports at 3000 [size=256]
    Memory at e9900000 (64-bit, non-prefetchable) [size=64K]
    Memory at e9700000 (64-bit, non-prefetchable) [size=1M]
    Expansion ROM at e9800000 [disabled] [size=1M]
    Capabilities: [50] Power Management version 3
    Capabilities: [68] Express Endpoint, MSI 00
    Capabilities: [d0] Vital Product Data
    Capabilities: [a8] MSI: Enable- Count=1/1 Maskable+ 64bit+
    Capabilities: [c0] MSI-X: Enable+ Count=97 Masked-
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [1e0] #19
    Capabilities: [1c0] Power Budgeting <?>
    Capabilities: [148] Alternative Routing-ID Interpretation (ARI)
    Kernel driver in use: megaraid_sas
    Kernel modules: megaraid_sas
...
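The `Kernel driver in use: megaraid_sas` line above suggests the controller driver is already bound, which points more towards controller configuration than a missing driver. As a small illustration, this hypothetical helper (`kernel_drivers` is not an existing tool) pulls that field out of `lspci -v` text for every device, assuming the usual layout where each device header starts in column 0 and its details are indented.

```python
def kernel_drivers(lspci_v_output):
    """Map each PCI slot in `lspci -v` output to its bound kernel driver (or None)."""
    drivers = {}
    slot = None
    for line in lspci_v_output.splitlines():
        if line and not line[0].isspace():
            # Header line, e.g. "23:00.0 RAID bus controller: ..."
            slot = line.split(" ", 1)[0]
            drivers[slot] = None
        elif slot is not None and line.strip().startswith("Kernel driver in use:"):
            drivers[slot] = line.split(":", 1)[1].strip()
    return drivers
```

A device that maps to `None` has no driver bound, which would be the case to worry about when suspecting a missing driver in the IPA image.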
I tried to configure RAID following https://docs.openstack.org/ironic/latest/admin/raid.html with `baremetal node set $NODE_UUID --target-raid-config raid.json`. The server has three identical disks (Western Digital DC HA210 2TB SATA 6 Gb/s).
# cat raid.json
{
  "logical_disks": [
    {
      "size_gb": "MAX",
      "raid_level": "0",
      "is_root_volume": true
    }
  ]
}
But Ironic still couldn't see any disk. I still got:
```
## In deploy images
# journalctl -fxeu ironic-python-agent
Feb 14 02:17:22 host-10-12-22-74 ironic-python-agent[2329]: 2022-02-14 02:17:22.863 2329 WARNING root [-] Path /dev/disk/by-path is inaccessible, /dev/disk/by-path/* version of block device name is unavailable Cause: [Errno 2] No such file or directory: '/dev/disk/by-path': FileNotFoundError: [Errno 2] No such file or directory: '/dev/disk/by-path'
Feb 14 02:17:44 host-10-12-22-74 ironic-python-agent[2329]: 2022-02-14 02:17:44.391 2329 ERROR root [-] Unexpected error dispatching get_os_install_device to manager <ironic_python_agent.hardware.GenericHardwareManager object at 0x7efbf4da2208>: Error finding the disk or partition device to deploy the image onto: No suitable device was found for deployment - root device hints were not provided and all found block devices are smaller than 4294967296B.: ironic_python_agent.errors.DeviceNotFound: Error finding the disk or partition device to deploy the image onto: No suitable device was found for deployment - root device hints were not provided and all found block devices are smaller than 4294967296B.
```
I don't know whether this is caused by a missing RAID card driver, a missing disk driver, or missing RAID configuration. Do you have any idea?
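As far as I understand the Ironic RAID documentation linked above, setting `target_raid_config` only records the desired layout; it is applied by manual cleaning with the `raid` interface's `delete_configuration` and `create_configuration` steps while the node is `manageable`. Whether a MegaRAID volume actually gets built also depends on the node's RAID interface supporting hardware RAID, which is an assumption to verify for this hardware. The sketch below only generates the two JSON documents; the helper names are illustrative.

```python
import json


def target_raid_config(size_gb="MAX", raid_level="0", is_root_volume=True):
    """Same shape as the raid.json above: one logical disk used as the root volume."""
    return {
        "logical_disks": [
            {
                "size_gb": size_gb,
                "raid_level": raid_level,
                "is_root_volume": is_root_volume,
            }
        ]
    }


def raid_clean_steps():
    """Manual clean steps: drop any existing RAID config, then build the target one."""
    return [
        {"interface": "raid", "step": "delete_configuration"},
        {"interface": "raid", "step": "create_configuration"},
    ]


if __name__ == "__main__":
    print(json.dumps(target_raid_config(), indent=2))  # contents for raid.json
    print(json.dumps(raid_clean_steps()))  # value for --clean-steps
```

The printed steps would then be passed as `baremetal node clean $NODE_UUID --clean-steps '<printed JSON>'` after `baremetal node manage $NODE_UUID`.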
love you, Han Guangyu
Julia Kreger <juliaashleykreger@gmail.com> wrote on Thu, 10 Feb 2022 at 23:11:
If the disk controllers *are* enumerated in the kernel log, which is also something to look for, then the disks themselves may be in some weird state like security locked. Generally this shows up as the operating system kind of seeing the disk and the connected SATA port but not really being able to access it. This is also an exceptionally rare state to find oneself in.
More common, especially in enterprise-grade hardware: if the disk controller is actually a RAID controller and there are no RAID volumes configured, then the operating system likely cannot see the underlying disks and turn them into usable block devices. I've seen a couple of drivers over the years which expose hints of disks in the kernel log, but without RAID configuration on the card, the drivers can't present usable block devices to the operating system.
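Following the suggestion to look at the kernel log, a rough Python sketch can summarise a `dmesg` or `journalctl -k` dump: whether `megaraid_sas` shows up at all, and which disks the SCSI disk driver reported with its `[sdX] Attached SCSI disk` message. The regex and function name are assumptions for illustration; a controller that loads but yields no attached disks fits the "no RAID volume configured" explanation.

```python
import re

# The sd driver logs e.g. "sd 0:2:0:0: [sda] Attached SCSI disk" for each usable disk.
ATTACHED_DISK = re.compile(r"\[(sd[a-z]+)\] Attached SCSI disk")


def scan_kernel_log(text):
    """Summarise RAID-controller and disk messages in kernel log text."""
    return {
        "megaraid_sas_seen": "megaraid_sas" in text,
        "attached_disks": sorted(set(ATTACHED_DISK.findall(text))),
    }
```

On the node this could be fed with `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`.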
-Julia
On Thu, Feb 10, 2022 at 3:17 AM Arne Wiebalck <arne.wiebalck@cern.ch> wrote:
Hi Guangyu,
No worries about asking questions, this is what the mailing list is for :)
Just to clarify, you do not have to set root device hints, it also works without (with the algorithm I mentioned). However, hints help to define the exact device and/or make deployment more predictable/repeatable.
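The selection described here can be sketched in Python roughly as below. This is a simplified model, not IPA's actual code: hints are matched by plain case-insensitive equality (real root device hints also support operators), and `BlockDevice`/`pick_root_device` are illustrative names. Note what happens with no devices at all: the fallback path raises the same "smaller than 4294967296B" message even though nothing was found, and any hint, such as a `wwn`, fails to match.

```python
from dataclasses import dataclass
from typing import Optional

MIN_ROOT_SIZE = 4 * 1024 ** 3  # 4294967296 B, the value in the IPA error message


@dataclass
class BlockDevice:
    name: str
    size: int  # bytes
    wwn: Optional[str] = None


class DeviceNotFound(Exception):
    pass


def _matches(device, hints):
    # Simplification: every hint must equal the device attribute, case-insensitively.
    return all(
        str(getattr(device, key, None) or "").lower() == str(value).lower()
        for key, value in hints.items()
    )


def pick_root_device(devices, hints=None, min_size=MIN_ROOT_SIZE):
    """Hints first; otherwise the smallest device >= min_size, ties broken by name."""
    if hints:
        for device in devices:
            if _matches(device, hints):
                return device
        raise DeviceNotFound(
            f"No device found that matches the root device hints {hints}")
    candidates = sorted(
        (d for d in devices if d.size >= min_size),
        key=lambda d: (d.size, d.name),
    )
    if not candidates:
        # An empty device list lands here too, hence the misleading "smaller than" text.
        raise DeviceNotFound(
            "No suitable device was found for deployment - root device hints were "
            "not provided and all found block devices are smaller than "
            f"{min_size}B.")
    return candidates[0]
```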
If it is really a driver problem, it is an issue with the operating system of the image you use, i.e. CentOS8. Some drivers were removed between CentOS 7 and 8, and we have seen issues with specific drive models as well.
You can try to build your own IPA images as described in [1], e.g. to add your ssh key to be able to log into the IPA to debug further, and to eventually include drivers (if you can identify them and they are available for CentOS8).
Another option may be to add another (newer) disk model to the server, just to confirm it is the disk model/driver which is the cause.
You could also try to boot the node into a CentOS7 (and then a CentOS8) live image to confirm it can see the disks at all.
Hope this helps! Arne
[1] https://docs.openstack.org/ironic-python-agent-builder/latest/admin/dib.html
On 10.02.22 11:15, 韩光宇 wrote:
Hi Arne,
Thank you very much for your response. Love you. You cleared up a lot of my confusion.
You are right, I didn't set a root device hint. Ironic also cannot see any disk: the 'lsblk' file in the deploy logs is empty. I tried setting a root device hint, but because Ironic can't find any disk, the deploy still failed.
Feb 10 09:51:55 host-10-12-22-59 ironic-python-agent[2324]: 2022-02-10 09:51:55.045 2324 WARNING root [-] Path /dev/disk/by-path is inaccessible, /dev/disk/by-path/* version of block device name is unavailable Cause: [Errno 2] No such file or directory: '/dev/disk/by-path': FileNotFoundError: [Errno 2] No such file or directory: '/dev/disk/by-path'
Feb 10 09:51:55 host-10-12-22-59 ironic-python-agent[2324]: 2022-02-10 09:51:55.056 2324 WARNING ironic_lib.utils [-] No device found that matches the root device hints {'wwn': '0x50014EE2691D724C'}: StopIteration
Sorry to bother you; I'm new to Ironic and couldn't find information about this on Google.
The bare metal node has three identical disks (Western Digital DC HA210 2TB SATA 6 Gb/s). Where can I confirm whether ironic-python-agent supports this disk?
And if Ironic cannot find the disks because the corresponding drivers are missing from the IPA image, do you know how to resolve that? I have used the latest deploy images from https://tarballs.opendev.org/openstack/ironic-python-agent/dib/files/. Do I need to find the driver and add it manually to the source code or the ramdisk (that would be difficult for me)?
Love you.
Cheers,
Guangyu
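To read the 'lsblk' file from the deploy logs without unpacking archives by hand, a small sketch using Python's `tarfile` module could print it from each archive. It assumes the archives live in `/var/log/ironic/deploy/` (as described in this thread) and that the relevant member is literally named `lsblk`; both are worth checking against your own files, and the function names are illustrative.

```python
import tarfile
from pathlib import Path


def lsblk_from_deploy_log(archive_path):
    """Return {member_name: text} for every member called 'lsblk' in the archive."""
    found = {}
    with tarfile.open(archive_path, "r:*") as tar:  # "r:*" handles gzip or plain tar
        for member in tar.getmembers():
            if member.isfile() and Path(member.name).name == "lsblk":
                data = tar.extractfile(member).read()
                found[member.name] = data.decode("utf-8", errors="replace")
    return found


def dump_all(log_dir="/var/log/ironic/deploy"):
    """Print the lsblk output from every deploy-log archive in log_dir."""
    for archive in sorted(Path(log_dir).glob("*.tar*")):
        for name, text in lsblk_from_deploy_log(archive).items():
            print(f"== {archive.name}:{name}")
            print(text if text.strip() else "(empty)")
```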
On Mon, Feb 28, 2022 at 1:12 AM Arne Wiebalck <arne.wiebalck@cern.ch> wrote:
Hi Guangyu,
I am not aware of anything in the Ironic Python Agent that would remove disks from the system in a way that they would not be visible after a reboot (apart from, as mentioned before, cleaning up a hardware RAID configuration so that the IPA can no longer see any devices afterwards).
How about trying to access and configure the hardware RAID with the corresponding tool from the RAM disk you booted into from the USB stick? Install the tool and see if it detects the controller.
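Once a tool such as Broadcom's storcli (or the controller's boot-time utility) can see the controller, the per-drive state is a good first thing to read. The sketch below encodes a rough, assumption-based summary of common MegaRAID drive states and whether the OS would see anything: drives in a virtual drive (`Onln`) or in JBOD mode are exposed; `UGood` and `UBad` drives are not, and `UBad` drives generally have to be made Unconfigured Good before they can be used. The table and names are hypothetical, so check them against the controller documentation.

```python
# Hypothetical summary of MegaRAID physical-drive states; verify against vendor docs.
ALIASES = {
    "UB": "UBad", "UBAD": "UBad",
    "UG": "UGood", "UGOOD": "UGood",
    "ONLN": "Onln", "ONLINE": "Onln",
    "JBOD": "JBOD",
}

# States whose capacity reaches the OS (via a virtual drive, or directly as JBOD).
EXPOSED = {"Onln", "JBOD"}

NEXT_STEP = {
    "UBad": "mark the drive Unconfigured Good, then create a virtual drive or make it JBOD",
    "UGood": "create a virtual drive or make the drive JBOD",
}


def normalize_state(state):
    """Map a state label like 'UB' or 'ugood' to a canonical abbreviation."""
    return ALIASES.get(state.strip().upper(), state.strip())


def drive_advice(state):
    """Return (visible_to_os, hint) for one physical drive state."""
    canonical = normalize_state(state)
    if canonical in EXPOSED:
        return True, "should appear as a block device (or as part of one)"
    return False, NEXT_STEP.get(canonical, "unknown state, check the controller documentation")


if __name__ == "__main__":
    for s in ("UB", "UGood", "JBOD"):
        print(s, drive_advice(s))
```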
The very first step before doing anything with Ironic is to get the disks back or understand why they are not visible.
Did cleaning fail at any given point with these machines?
If you have physical access, try disconnecting all of the drives, then power up the machine and see if you can get into the firmware configuration screen with control-h. If you can, remove all of the prior configuration or disk volumes; they will most likely look like they are in error states. If you're unable to get into this screen, I would be worried about your disk controller card. If you're able to clear everything out of the controller, power off, try re-inserting the drives, and see what happens. See if the controller can view/interact with the drives. If it sees no drives, then my next paragraph is likely the case.
The disks sound like they might be in a security locked state, which will likely require a desktop SATA disk controller to remedy by attaching them and manually removing them from the security locked state. MegaRAID controllers can't recognize security locked devices (most controllers, and especially ones labeled "raid controllers", can't handle it) when in pass-through mode, but I've never heard of security lock commands actually getting through to the device with those controllers in pass-through mode. If the card was in RAID mode to begin with, then it likely never did anything involving secure erase, as the controller should not be offering that as a feature of provided disks to the OS.
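If a disk is moved to a plain SATA port as suggested, `hdparm -I /dev/sdX` prints a "Security:" section in which each flag usually appears either on its own (set) or prefixed with "not" (clear). A minimal parser for that section, assuming this usual layout (the function name is illustrative), could look like the following; `locked: True` would be consistent with the security-lock theory. Behind the MegaRAID card itself, hdparm will most likely not reach the drives at all.

```python
FLAGS = ("enabled", "locked", "frozen")


def ata_security_flags(hdparm_output):
    """Parse the 'Security:' section of `hdparm -I` output into {flag: bool}."""
    flags = {}
    in_section = False
    for line in hdparm_output.splitlines():
        if line.startswith("Security:"):
            in_section = True
            continue
        if not in_section:
            continue
        if line and not line[0].isspace():
            break  # the next top-level section starts in column 0
        tokens = line.split()
        if len(tokens) == 1 and tokens[0] in FLAGS:
            flags[tokens[0]] = True
        elif len(tokens) == 2 and tokens[0] == "not" and tokens[1] in FLAGS:
            flags[tokens[1]] = False
    return flags
```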
Cheers, Arne
Hi Julia and Arne,
"Did cleaning fail at any given point with these machines?" Sorry that I didn't describe it clearly. Actually, cleaning always succeeds according to the Ironic log; it is deploying that fails. I just wondered whether the cleaning phase did something that caused the disk detection problem.
And I have found the RAID configuration menu. On my machine, I need to press Ctrl+R when the RAID interface appears during boot. Thank you very much! In the RAID configuration menu, I found that the state of all three disks is UB (Unconfigured Bad). So, if I use the Ironic service to install an operating system on a server whose three hard disks are in the 'JBOD' state, is there anything I should pay attention to or do? If I don't do anything about this, the deploy stage gives me the error 'No suitable device was found for deployment' and 'lsblk' is empty. After cleaning succeeds and the deploy fails, the disk state is "Unconfigured Bad".
Best wishes to you,
Han Guangyu
Julia Kreger <juliaashleykreger@gmail.com> wrote on Tue, 1 Mar 2022 at 22:06:
On Mon, Feb 28, 2022 at 1:12 AM Arne Wiebalck <arne.wiebalck@cern.ch> wrote:
Hi Guangyu,
I am not aware of anything in the Ironic Python Agent that would remove disks from the system in a way that they would not be visible after a reboot (apart from, as mentioned before, the clean up of a hardware RAID in a way the IPA is not able to see any devices after).
How about trying to access and configure the hardware RAID with the corresponding tool from the RAM disk you booted into from the USB stick? Install the tool and see if it detects the controller.
The very first step before doing anything with Ironic is to get the disks back or understand why they are not visible.
Did cleaning fail at any given point with these machines?
If you have physical access, try disconnecting all of the drives, and then powering up the machine and see if you can get into the firmware configuration screen with control-h. If you can, remove all of the prior configuration or disk volumes. They will look like they are in error states most likely. If your unable to get into this screen, I would be worried about your disk controller card. If your able to clear everything out of the controller, power off, try re-inserting drives, and see what happens. See if the controller can view/interact with the drives. If it sees no drives, then my next paragraph is likely the case.
The disks sound like they might be in security locked state which will likely require a desktop SATA disk controller to remedy by attaching and manually removing from a security locked state. Megaraid controllers can't recognize security locked devices (most controllers and especially ones labeled "raid controllers" can't handle it) when in pass-through mode, but I've never heard of security lock commands actually getting through to the device with those controllers in pass-through mode. If the card was in raid mode to begin with, then it likely never did anything involving secure erase as the controller should not be offering that as a feature of provided disks to the OS.
Cheers, Arne
On 28.02.22 09:28, 韩光宇 wrote:
Hi Arne,
I didn't find hardware RAID config option during the initial boot sequence. Ctrl+H is unresponsive in this computer. I just saw "Press Del to enter firmware configuration, press F3 to enter boot menu, and press F12 to enter network boot". And I press 'Del' to enter the BIOS. But I didn't find RAID config menu in BIOS. Sorry that I have poor knowledge about this.
And now, even though I installed the operating system manually using a USB stick, I still couldn't find the hard drive. Is there anything that ironic-agent did in the clean phase that would have caused this problem?
I wonder if this is a thinking pointto solve the problem. Now, my idea is to first find a way to manually configure RAID. Do you agree with this? And than, whether RAID configurations are still cleared in the Clean phase if clean phase will do this?
Sorry that I have so much confuse.
love you, Guangyu
Arne Wiebalck <arne.wiebalck@cern.ch> 于2022年2月14日周一 15:59写道:
Hi Guangyu,
It seems like Julia had the right idea and the disks are not visible since the RAID controller does not expose anything to the operating system. This seems to be confirmed by you booting into the CentOS7 image.
What I would suggest to try next is to look for the hardware RAID config option during the initial boot sequence to enter the RAID config menu (there should be a message quite early on, and maybe Ctrl-H is needed to enter the menu).
Once there, manually configure the disks as JBODs or create a RAID device. Upon reboot this should be visible and accessible as a device. Maybe check from your CentOS7 image again. If the devices are there, Ironic should also be able to deploy on them (for this you can remove the RAID config you added).
It depends a little on what your goal is, but I would try this first to see if you can make a device visible and if the Ironic deploy bit works, before trying to configure the hardware RAID via Ironic.
Cheers, Arne
On 14.02.22 03:20, 韩光宇 wrote:
Hi Arne and Julia,
You make me feel so warm. Best wishes to you.
I have tried to boot the node into a CentOS7, but it still coundnot to find disk. And sorry that I didn't notice the RAID card.
# lspci -v ... 23:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS-3 3108 [Invader] (rev 02) Subsystem: Broadcom / LSI MegaRAID SAS 9361-8i Flags: bus master, fast devsel, latency 0, IRQ -2147483648, NUMA node 1 I/O ports at 3000 [size=256] Memory at e9900000 (64-bit, non-prefetchable) [size=64K] Memory at e9700000 (64-bit, non-prefetchable) [size=1M] Expansion ROM at e9800000 [disabled] [size=1M] Capabilities: [50] Power Management version 3 Capabilities: [68] Express Endpoint, MSI 00 Capabilities: [d0] Vital Product Data Capabilities: [a8] MSI: Enable- Count=1/1 Maskable+ 64bit+ Capabilities: [c0] MSI-X: Enable+ Count=97 Masked- Capabilities: [100] Advanced Error Reporting Capabilities: [1e0] #19 Capabilities: [1c0] Power Budgeting <?> Capabilities: [148] Alternative Routing-ID Interpretation (ARI) Kernel driver in use: megaraid_sas Kernel modules: megaraid_sas ...
I try to config raid fallowing https://docs.openstack.org/ironic/latest/admin/raid.html by `baremetal node set $NODE_UUID --target-raid-config raid.json`. The server have three same disk(Western Digital DC HA210 2TB SATA 6GB/s) # cat raid.json { "logical_disks": [ { "size_gb": "MAX", "raid_level": "0", "is_root_volume": true } ] }
But Ironic still coundn't see disk. I still got ``` ## In deploy images # journalctl -fxeu ironic-python-agent Feb 14 02:17:22 host-10-12-22-74 ironic-python-agent[2329]: 2022-02-14 02:17:22.863 2329 WARNING root [-] Path /dev/disk/by-path is inaccessible, /dev/disk/by-path/* version of block device name is unavailable Cause: [Errno 2] No such file or directory: '/dev/disk/by-path': FileNotFoundError: [Errno 2] No such file or directory: '/dev/disk/by-path' Feb 14 02:17:44 host-10-12-22-74 ironic-python-agent[2329]: 2022-02-14 02:17:44.391 2329 ERROR root [-] Unexpected error dispatching get_os_install_device to manager <ironic_python_agent.hardware.GenericHardwareManager object at 0x7efbf4da2208>: Error finding the disk or partition device to deploy the image onto: No suitable device was found for deployment - root device hints were not provided and all found block devices are smaller than 4294967296B.: ironic_python_agent.errors.DeviceNotFound: Error finding the disk or partition device to deploy the image onto: No suitable device was found for deployment - root device hints were not provided and all found block devices are smaller than 4294967296B. ```
I don't know if it's a lack of a RAID card driver or a lack of a disk driver or a lack of RAID configuration. Could you have some idea about this question?
love you, Han Guangyu
Julia Kreger <juliaashleykreger@gmail.com> 于2022年2月10日周四 23:11写道:
If the disk controllers *are* enumerated in the kernel log, which is something to also look for, then the disks themselves may be in some weird state like security locked. Generally this shows up as the operating system kind of sees the disk and the SATA port connected but can't really access it. This is also an exceptionally rare state to find one's self in.
More common, especially in enterprise grade hardware: If the disk controller is actually a raid controller, and there are no raid volumes configured, then the operating system likely cannot see the underlying disks and turn that into a usable block device. I've seen a couple drivers over the years which expose hints of disks in the kernel log and without raid configuration in the cards, the drivers can't present usable block devices to the operating system system.
-Julia
On Thu, Feb 10, 2022 at 3:17 AM Arne Wiebalck <arne.wiebalck@cern.ch> wrote:
Hi Guangyu,
No worries about asking questions, this is what the mailing list is for :)
Just to clarify, you do not have to set root device hints; it also works without them (with the algorithm I mentioned). However, hints help to define the exact device and/or make deployment more predictable and repeatable.
If it is really a driver problem, it is an issue with the operating system of the image you use, i.e. CentOS8. Some drivers were removed between 7 and 8, and we have seen issues with specific drive models as well.
You can try to build your own IPA images as described in [1], e.g. to add your SSH key so you can log into the IPA to debug further, and eventually to include drivers (if you can identify them and they are available for CentOS8).
Another option may be to add another (newer) disk model to the server, just to confirm that the disk model/driver is the cause.
You could also try to boot the node into a CentOS7 (and then a CentOS8) live image to confirm it can see the disks at all.
Hope this helps!
Arne
[1] https://docs.openstack.org/ironic-python-agent-builder/latest/admin/dib.html
On 10.02.22 11:15, 韩光宇 wrote:
Hi Arne,
Thank you very much for your response. Love you. You have cleared up a lot of my confusion.
You are right, I didn't set a root device hint. And Ironic also cannot see the disks: the 'lsblk' file in the deploy logs is empty. I tried to set a root device hint, but because Ironic can't find any disk, the deploy still failed.
Feb 10 09:51:55 host-10-12-22-59 ironic-python-agent[2324]: 2022-02-10 09:51:55.045 2324 WARNING root [-] Path /dev/disk/by-path is inaccessible, /dev/disk/by-path/* version of block device name is unavailable Cause: [Errno 2] No such file or directory: '/dev/disk/by-path': FileNotFoundError: [Errno 2] No such file or directory: '/dev/disk/by-path'
Feb 10 09:51:55 host-10-12-22-59 ironic-python-agent[2324]: 2022-02-10 09:51:55.056 2324 WARNING ironic_lib.utils [-] No device found that matches the root device hints {'wwn': '0x50014EE2691D724C'}: StopIteration
Sorry to bother you; I'm a newcomer to Ironic and I didn't find information about this on Google.
The bare metal node has three identical disks (Western Digital DC HA210 2TB SATA 6Gb/s). Where can I confirm whether ironic-python-agent supports this disk?
And if Ironic cannot find the disks because the corresponding drivers are missing from the IPA image, do you know how to resolve that? I have used the latest deploy images from https://tarballs.opendev.org/openstack/ironic-python-agent/dib/files/. Do I need to find the driver and add it manually to the source code or the ramdisk (that would be difficult for me)?
Love you.
Cheers,
Guangyu
On Thu, Feb 10, 2022 at 15:51, Arne Wiebalck <arne.wiebalck@cern.ch> wrote:
Hi Guangyu,
The error indicates that Ironic was not able to find a device where it could deploy the image to.
To find a device, Ironic will use 'root device' hints [1], usually set by the admin on a node. If that does not yield anything, Ironic will loop over all block devices and pick the smallest which is larger than 4GB (and order them alphabetically).
If you have disks in your server which are larger than 4GB, one potential explanation is that Ironic cannot see them, e.g. since the corresponding drivers in the IPA image are missing. The logs you posted seem to confirm something along those lines.
Check the content of the 'lsblk' file in the deploy logs, which you can find in the tar archive in /var/log/ironic/deploy/ on the controller for your deployment attempt, to see what devices Ironic has access to.
Cheers,
Arne
[1] https://docs.openstack.org/ironic/latest/install/advanced.html#root-device-h...
On 10.02.22 02:50, 韩光宇 wrote:
Dear all,
I have an OpenStack Victoria environment and am trying to use Ironic to manage bare metal. But I got "- root device hints were not provided and all found block devices are smaller than 4294967296B." in the deploy stage.
2022-02-09 17:57:56.492 3908982 ERROR ironic.drivers.modules.agent_base [-] Agent returned error for deploy step {'step': 'write_image', 'priority': 80, 'argsinfo': None, 'interface': 'deploy'} on node cc68c450-ce54-4e1c-be04-8b0a6169ef92 : No suitable device was found for deployment - root device hints were not provided and all found block devices are smaller than 4294967296B..
I used "openstack server create --flavor my-baremetal-flavor --nic net-id=$net_id --image $image testing" to deploy the bare metal node. I downloaded the deploy images (ipa-centos8-master.kernel and ipa-centos8-master.initramfs) from https://tarballs.opendev.org/openstack/ironic-python-agent/dib/files/.
The bare metal node info and flavor info: https://paste.opendev.org/show/bV7lgO6RkNQY6ZGPbT2e/
Ironic configuration file: https://paste.opendev.org/show/bTgY9Kpn7KWqwQl73aEa/
Ironic-conductor log: https://paste.opendev.org/show/bFKZYlXmccxNxU8lEogk/
The log of ironic-python-agent on the bare metal node: https://paste.opendev.org/show/btAuaMuV2IutV2Pa7YIa/
I have seen some old discussions about this, such as https://bugzilla.redhat.com/show_bug.cgi?id=1312187. But those discussions took place a long time ago, not on Victoria, and no solution was given.
Does anyone know how to solve this problem? I would appreciate any kind of guidance or help.
Thank you,
Han Guangyu
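The selection rule Arne describes can be modelled in a few lines. This is a simplified sketch of the behaviour as described in this thread, not ironic-python-agent's actual code: the device dictionaries, the `pick_root_device` name and the local `DeviceNotFound` class are hypothetical, hints are compared as case-insensitive strings, and size ties are broken by device name. It shows why both failures here look the way they do: with no visible block devices, a WWN hint matches nothing, and the no-hint path finds nothing of at least 4294967296 bytes (4 GiB).

```python
MIN_ROOT_SIZE = 4 * 1024 ** 3  # 4294967296 bytes, the threshold in the error message


class DeviceNotFound(Exception):
    """Local stand-in for the agent's error type (hypothetical)."""


def pick_root_device(devices, hints=None, min_size=MIN_ROOT_SIZE):
    """Return the name of the install target among `devices`.

    devices: list of dicts with at least "name" and "size" (bytes).
    hints:   optional dict such as {"wwn": "0x..."}; every key must match.
    """
    if hints:
        for dev in devices:
            if all(str(dev.get(key, "")).lower() == str(value).lower()
                   for key, value in hints.items()):
                return dev["name"]
        raise DeviceNotFound(
            f"No device found that matches the root device hints {hints}")
    # No hints: smallest device of at least min_size bytes, ties broken by name.
    for dev in sorted(devices, key=lambda d: (d["size"], d["name"])):
        if dev["size"] >= min_size:
            return dev["name"]
    raise DeviceNotFound(
        "No suitable device was found for deployment - root device hints were "
        f"not provided and all found block devices are smaller than {min_size}B.")


disks = [  # illustrative values only
    {"name": "/dev/sdb", "size": 2_000_398_934_016, "wwn": "0x50014ee2691d724c"},
    {"name": "/dev/sda", "size": 1_000_204_886_016, "wwn": "0x50014ee2be72febf"},
]
print(pick_root_device(disks))                                 # /dev/sda
print(pick_root_device(disks, {"wwn": "0x50014EE2691D724C"}))  # /dev/sdb
try:
    pick_root_device([])  # what the agent effectively had: no block devices
except DeviceNotFound as exc:
    print(exc)
```

Setting a hint cannot help while the device list is empty, which matches the `StopIteration` warning in the log quoted above.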
Hi Guangyu,
I would think cleaning succeeds even if there are no disks: the loop to clean the disks is simply empty, so there is nothing to do, success! :) Deployment then fails since it needs a disk to deploy on.
For my understanding: you reconfigured the disks into JBOD state and then retried the deployment (which failed, and the disks fell back into UB state)?
JBOD mode should work, but it is usually not the main mode hardware RAID controllers work in. One thing to try is to actually configure a RAID-0 or RAID-1 device from your three disks and retry the deployment. I am not totally sure whether Ironic would try to remove such a hardware RAID config during cleaning, but Julia will be able to tell.
Cheers,
Arne
On 04.03.22 08:54, 韩光宇 wrote:
Hi Julia and Arne,
Did cleaning fail at any given point with these machines?
Sorry that I didn't describe it clearly. Actually, cleaning always succeeds according to the Ironic log; it is deployment that fails. I just wonder whether the cleaning phase did something that caused the disk identification problem.
And I have found the RAID config menu. On my machine, I need to press "Ctrl + R" when the RAID configuration prompt appears during boot. Thank you very much! In the RAID config menu, I found that the state of the three disks is UB (Unconfigured Bad).
So, if I use the Ironic service to install an operating system on a server whose three hard disks are in the 'JBOD' state, is there anything I should pay attention to or do? If I don't do anything, the deploy stage gives me the error 'No suitable device was found for deployment' and 'lsblk' is empty. After cleaning succeeded and deployment failed, the disk state was "Unconfigured bad".
Best wishes to you,
Han Guangyu
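Arne's point that cleaning can "succeed" with no visible disks while deployment cannot is easy to see in a toy model. The two functions below are hypothetical stand-ins for a per-disk cleaning step and the image-writing deploy step; they are not Ironic code.

```python
class DeviceNotFound(Exception):
    """Local stand-in for the agent's error type (hypothetical)."""


def clean_disks(block_devices):
    """Toy cleaning step: 'erase' every visible disk and report which ones."""
    cleaned = []
    for dev in block_devices:  # no visible disks -> the loop body never runs
        cleaned.append(dev)
    return cleaned  # returns normally either way, so cleaning "succeeds"


def deploy_image(block_devices):
    """Toy deploy step: needs at least one disk to write the image to."""
    if not block_devices:
        raise DeviceNotFound("No suitable device was found for deployment")
    return block_devices[0]


visible = []  # lsblk was empty on this node
print("cleaned:", clean_disks(visible))  # cleaned: []
try:
    deploy_image(visible)
except DeviceNotFound as exc:
    print("deploy failed:", exc)
```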
On Tue, Mar 1, 2022 at 22:06, Julia Kreger <juliaashleykreger@gmail.com> wrote:
On Mon, Feb 28, 2022 at 1:12 AM Arne Wiebalck <arne.wiebalck@cern.ch> wrote:
Hi Guangyu,
I am not aware of anything in the Ironic Python Agent that would remove disks from the system in a way that they would not be visible after a reboot (apart from, as mentioned before, cleaning up a hardware RAID configuration in a way that leaves the IPA unable to see any devices afterwards).
How about trying to access and configure the hardware RAID with the corresponding tool from the RAM disk you booted into from the USB stick? Install the tool and see if it detects the controller.
The very first step before doing anything with Ironic is to get the disks back or understand why they are not visible.
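Understanding why the disks are not visible can start below Ironic, with what the kernel itself exposes. The sketch below (the function name is hypothetical) reads `/sys/block`, where each device's `size` file counts 512-byte sectors; if it returns no real disks, the problem sits at the controller, driver or disk level rather than in Ironic.

```python
import os


def list_block_devices(sys_block="/sys/block"):
    """Return {name: size_in_bytes} for block devices the kernel exposes.

    /sys/block/<dev>/size is reported in 512-byte sectors regardless of the
    disk's physical sector size. Loop, RAM and device-mapper nodes are skipped.
    """
    devices = {}
    if not os.path.isdir(sys_block):
        return devices
    for name in sorted(os.listdir(sys_block)):
        if name.startswith(("loop", "ram", "dm-")):
            continue
        try:
            with open(os.path.join(sys_block, name, "size")) as fh:
                devices[name] = int(fh.read().strip()) * 512
        except (OSError, ValueError):
            continue  # unreadable entry: skip it
    return devices


if __name__ == "__main__":
    found = list_block_devices()
    if not found:
        print("kernel exposes no block devices")
    for name, size in found.items():
        print(f"/dev/{name}: {size / 1024**3:.1f} GiB")
```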
Did cleaning fail at any given point with these machines?
If you have physical access, try disconnecting all of the drives, then powering up the machine and seeing if you can get into the firmware configuration screen with control-h. If you can, remove all of the prior configuration or disk volumes. They will most likely look like they are in error states. If you're unable to get into this screen, I would be worried about your disk controller card. If you're able to clear everything out of the controller, power off, try re-inserting the drives, and see what happens. See if the controller can view/interact with the drives. If it sees no drives, then my next paragraph is likely the case.
The disks sound like they might be in a security-locked state, which will likely require attaching them to a desktop SATA disk controller and manually removing the security lock. MegaRAID controllers can't recognize security-locked devices when in pass-through mode (most controllers, and especially ones labeled "RAID controllers", can't handle it), but I've never heard of security lock commands actually getting through to the device with those controllers in pass-through mode. If the card was in RAID mode to begin with, then it likely never did anything involving secure erase, as the controller should not be offering that as a feature of provided disks to the OS.
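To test the security-lock theory once a disk sits on a plain SATA port, `hdparm -I` reports the ATA security flags. The parser below is a sketch that assumes the layout hdparm usually prints (a `Security:` section with one flag per line, prefixed by `not` when the flag is clear); the function names are hypothetical and the command needs root. Behind a MegaRAID card the ATA command may not reach the disk at all, as noted above, so a result from there proves little.

```python
import subprocess

FLAGS = ("enabled", "locked", "frozen")


def parse_ata_security(hdparm_output):
    """Return {"enabled": bool, "locked": bool, "frozen": bool} from `hdparm -I` text.

    Assumes a top-level "Security:" line followed by indented lines such as
    "not locked" (flag clear) or "frozen" (flag set). Flags that are not
    found are simply left out of the result.
    """
    flags = {}
    in_section = False
    for line in hdparm_output.splitlines():
        if line.startswith("Security:"):
            in_section = True
            continue
        if not in_section:
            continue
        if line and not line[0].isspace():
            break  # an unindented line starts the next section
        words = line.split()
        if len(words) == 1 and words[0] in FLAGS:
            flags[words[0]] = True
        elif len(words) == 2 and words[0] == "not" and words[1] in FLAGS:
            flags[words[1]] = False
    return flags


def ata_security(device):
    """Run hdparm on a device path such as /dev/sdb (needs root and hdparm)."""
    result = subprocess.run(["hdparm", "-I", device],
                            capture_output=True, text=True, check=True)
    return parse_ata_security(result.stdout)
```

A result containing `"locked": True` would support the security-lock explanation; `"locked": False` points back at the controller configuration.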
Cheers, Arne
On 28.02.22 09:28, 韩光宇 wrote:
Hi Arne,
I didn't find a hardware RAID config option during the initial boot sequence. Ctrl+H is unresponsive on this computer. I only saw "Press Del to enter firmware configuration, press F3 to enter boot menu, and press F12 to enter network boot". I pressed 'Del' to enter the BIOS, but I didn't find a RAID config menu there. Sorry, my knowledge of this is limited.
And now, even when I installed the operating system manually from a USB stick, I still couldn't find the hard drive. Is there anything that ironic-agent did in the cleaning phase that could have caused this problem?
I wonder whether this is the right direction for solving the problem. My idea now is to first find a way to configure RAID manually. Do you agree? And then, will the RAID configuration be cleared again in the cleaning phase, if cleaning does that?
Sorry for all my confusion.
love you, Guangyu
On Mon, Feb 14, 2022 at 15:59, Arne Wiebalck <arne.wiebalck@cern.ch> wrote:
Hi Guangyu,
It seems like Julia had the right idea and the disks are not visible since the RAID controller does not expose anything to the operating system. This seems to be confirmed by you booting into the CentOS7 image.
What I would suggest to try next is to look for the hardware RAID config option during the initial boot sequence to enter the RAID config menu (there should be a message quite early on, and maybe Ctrl-H is needed to enter the menu).
Once there, manually configure the disks as JBODs or create a RAID device. Upon reboot this should be visible and accessible as a device. Maybe check from your CentOS7 image again. If the devices are there, Ironic should also be able to deploy on them (for this you can remove the RAID config you added).
It depends a little on what your goal is, but I would try this first to see if you can make a device visible and if the Ironic deploy bit works, before trying to configure the hardware RAID via Ironic.
Cheers, Arne
On 14.02.22 03:20, 韩光宇 wrote:
Hi Arne and Julia,
You make me feel so warm. Best wishes to you.
I have tried booting the node into CentOS7, but it still could not find the disks. And sorry that I hadn't noticed the RAID card.
# lspci -v
...
23:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS-3 3108 [Invader] (rev 02)
	Subsystem: Broadcom / LSI MegaRAID SAS 9361-8i
	Flags: bus master, fast devsel, latency 0, IRQ -2147483648, NUMA node 1
	I/O ports at 3000 [size=256]
	Memory at e9900000 (64-bit, non-prefetchable) [size=64K]
	Memory at e9700000 (64-bit, non-prefetchable) [size=1M]
	Expansion ROM at e9800000 [disabled] [size=1M]
	Capabilities: [50] Power Management version 3
	Capabilities: [68] Express Endpoint, MSI 00
	Capabilities: [d0] Vital Product Data
	Capabilities: [a8] MSI: Enable- Count=1/1 Maskable+ 64bit+
	Capabilities: [c0] MSI-X: Enable+ Count=97 Masked-
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [1e0] #19
	Capabilities: [1c0] Power Budgeting <?>
	Capabilities: [148] Alternative Routing-ID Interpretation (ARI)
	Kernel driver in use: megaraid_sas
	Kernel modules: megaraid_sas
...
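The `lspci -v` output above already answers part of the driver question: `Kernel driver in use: megaraid_sas` suggests the controller driver is loaded. The same check can be scripted from sysfs. This sketch (hypothetical function name) lists PCI devices of class 0x0104xx, i.e. RAID bus controllers, together with the driver bound to each, or `None` if nothing is bound.

```python
import os


def raid_controllers(pci_root="/sys/bus/pci/devices"):
    """Map PCI address -> bound driver name (or None) for RAID-class devices."""
    result = {}
    if not os.path.isdir(pci_root):
        return result
    for addr in sorted(os.listdir(pci_root)):
        dev_dir = os.path.join(pci_root, addr)
        try:
            with open(os.path.join(dev_dir, "class")) as fh:
                pci_class = int(fh.read().strip(), 16)  # e.g. "0x010400"
        except (OSError, ValueError):
            continue
        if pci_class >> 8 != 0x0104:  # base class 01 (storage), subclass 04 (RAID)
            continue
        driver_link = os.path.join(dev_dir, "driver")
        if os.path.islink(driver_link):
            result[addr] = os.path.basename(os.readlink(driver_link))
        else:
            result[addr] = None
    return result


if __name__ == "__main__":
    print(raid_controllers())
```

A bound driver with still no disks in `/sys/block` points away from a missing driver and towards the controller's own configuration.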
I tried to configure RAID following https://docs.openstack.org/ironic/latest/admin/raid.html with `baremetal node set $NODE_UUID --target-raid-config raid.json`. The server has three identical disks (Western Digital DC HA210 2TB SATA 6Gb/s).
# cat raid.json
{
  "logical_disks": [
    {
      "size_gb": "MAX",
      "raid_level": "0",
      "is_root_volume": true
    }
  ]
}
But Ironic still couldn't see the disks. I still got:
```
## In deploy images
# journalctl -fxeu ironic-python-agent
Feb 14 02:17:22 host-10-12-22-74 ironic-python-agent[2329]: 2022-02-14 02:17:22.863 2329 WARNING root [-] Path /dev/disk/by-path is inaccessible, /dev/disk/by-path/* version of block device name is unavailable Cause: [Errno 2] No such file or directory: '/dev/disk/by-path': FileNotFoundError: [Errno 2] No such file or directory: '/dev/disk/by-path'
Feb 14 02:17:44 host-10-12-22-74 ironic-python-agent[2329]: 2022-02-14 02:17:44.391 2329 ERROR root [-] Unexpected error dispatching get_os_install_device to manager <ironic_python_agent.hardware.GenericHardwareManager object at 0x7efbf4da2208>: Error finding the disk or partition device to deploy the image onto: No suitable device was found for deployment - root device hints were not provided and all found block devices are smaller than 4294967296B.: ironic_python_agent.errors.DeviceNotFound: Error finding the disk or partition device to deploy the image onto: No suitable device was found for deployment - root device hints were not provided and all found block devices are smaller than 4294967296B.
```
I don't know whether it's a missing RAID card driver, a missing disk driver, or a missing RAID configuration. Do you have any idea?
love you, Han Guangyu
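One detail worth checking against the Ironic RAID documentation linked above: setting `target_raid_config` by itself only records the desired layout. As that documentation describes it, the layout is applied by the `raid` interface's `delete_configuration` and `create_configuration` clean steps during manual cleaning (node in the `manageable` state), and only if the node's raid interface can actually drive the controller; whether that holds for this MegaRAID card is an assumption to verify. The sketch below only builds the two JSON documents for a RAID-1 root volume, in the spirit of Arne's RAID-0/RAID-1 suggestion; the function names are hypothetical.

```python
import json


def target_raid_config(raid_level="1"):
    """Target RAID layout in the same format as raid.json above (one root volume)."""
    return {
        "logical_disks": [
            {"size_gb": "MAX", "raid_level": raid_level, "is_root_volume": True}
        ]
    }


def raid_clean_steps():
    """Manual-cleaning steps: remove any existing layout, then build the target one."""
    return [
        {"interface": "raid", "step": "delete_configuration"},
        {"interface": "raid", "step": "create_configuration"},
    ]


print(json.dumps(target_raid_config(), indent=2))  # content for raid.json
# Printed as text only; run by hand with the node in the "manageable" state:
print("baremetal node set $NODE_UUID --target-raid-config raid.json")
print("baremetal node clean $NODE_UUID --clean-steps '%s'" % json.dumps(raid_clean_steps()))
```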
On Thu, Feb 10, 2022 at 23:11, Julia Kreger <juliaashleykreger@gmail.com> wrote:
If the disk controllers *are* enumerated in the kernel log, which is something to also look for, then the disks themselves may be in some weird state like security locked. Generally this shows up as the operating system kind of seeing the disk and the connected SATA port but not really being able to access it. This is also an exceptionally rare state to find oneself in.
More common, especially in enterprise-grade hardware: if the disk controller is actually a RAID controller and there are no RAID volumes configured, then the operating system likely cannot see the underlying disks and turn them into usable block devices. I've seen a couple of drivers over the years which expose hints of disks in the kernel log, but without a RAID configuration on the card, the drivers can't present usable block devices to the operating system.
-Julia
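Julia's suggestion to check the kernel log can be partly automated. The patterns below are an illustrative, hypothetical selection: `megaraid_sas` messages show the controller driver talking to the card, `ataN: SATA link` lines show disks on onboard SATA ports, and `sd H:C:T:L:` lines show SCSI disks the kernel actually registered. Controller messages with no `sd` lines would fit the "no RAID volume configured" explanation.

```python
import re

# Illustrative patterns (a hypothetical selection; adjust as needed).
PATTERNS = {
    # messages from the MegaRAID controller driver
    "controller": re.compile(r"megaraid_sas", re.IGNORECASE),
    # disks on libata (onboard SATA) ports, e.g. "ata1: SATA link up ..."
    "sata_link": re.compile(r"\bata\d+(?:\.\d+)?: SATA link"),
    # SCSI disks the kernel registered, e.g. "sd 0:2:0:0: [sda] ..."
    "scsi_disk": re.compile(r"\bsd \d+:\d+:\d+:\d+: "),
}


def scan_kernel_log(lines):
    """Group kernel log lines by the pattern(s) they match."""
    hits = {key: [] for key in PATTERNS}
    for line in lines:
        for key, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[key].append(line.rstrip())
    return hits


# Usage on the node, for example:
#   import subprocess
#   log = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
#   for key, found in scan_kernel_log(log.splitlines()).items():
#       print(key, len(found))
```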
Hi Julia,
Sorry that my last email didn't reply to some of the questions in yours. When I got into the RAID config menu, I said the disk state was "unconfigured bad". Additionally, Virtual Drive Management displayed "No Configuration Present!". But I could not modify the disk state in the RAID config menu. Even when I moved the disks to another server and used the software in the operating system to modify them, their state still couldn't be changed. For example:
```shell
# MegaCli -PDList -a0

Adapter #0

Enclosure Device ID: 252
Slot Number: 1
Enclosure position: 0
Device Id: 9
WWN: 50014EE2BE72FEBF
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 0 KB [0x0 Sectors]
Non Coerced Size: 0 KB [0x0 Sectors]
Coerced Size: 0 KB [0x0 Sectors]
Firmware state: Unconfigured(bad)
Device Firmware Level: WA09
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x4433221101000000
Connected Port Number: 0(path0)
Inquiry Data: ATA HGST HUS722T2TALWA09WCC6N4HZV9SX
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: Unknown
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive Temperature :0C (32.00 F)
PI Eligibility: No
Drive is formatted for PI information: No
PI: No PI
Drive's write cache : Disabled
Drive's NCQ setting : Enabled
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Drive has flagged a S.M.A.R.T alert : No

Enclosure Device ID: 252
Slot Number: 2
Enclosure position: 0
Device Id: 10
WWN: 50014EE2691D724C
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 0 KB [0x0 Sectors]
Non Coerced Size: 0 KB [0x0 Sectors]
Coerced Size: 0 KB [0x0 Sectors]
Firmware state: Unconfigured(bad)
Device Firmware Level: WA09
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x4433221102000000
Connected Port Number: 1(path0)
Inquiry Data: ATA HGST HUS722T2TALWA09WCC6N4HZVTV5
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: Unknown
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive Temperature :0C (32.00 F)
PI Eligibility: No
Drive is formatted for PI information: No
PI: No PI
Drive's write cache : Disabled
Drive's NCQ setting : Enabled
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Drive has flagged a S.M.A.R.T alert : No

Enclosure Device ID: 252
Slot Number: 3
Enclosure position: 0
Device Id: 11
WWN: 50014EE2BE733A5E
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 1.819 TB [0xe8e088b0 Sectors]
Non Coerced Size: 1.818 TB [0xe8d088b0 Sectors]
Coerced Size: 1.818 TB [0xe8d00000 Sectors]
Firmware state: JBOD
Device Firmware Level: WA09
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x4433221103000000
Connected Port Number: 2(path0)
Inquiry Data: WCC6N0KX0HJD HGST HUS722T2TALA604 RAGNWA09
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive Temperature :34C (93.20 F)
PI Eligibility: No
Drive is formatted for PI information: No
PI: No PI
Drive's write cache : Enabled
Drive's NCQ setting : Enabled
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Drive has flagged a S.M.A.R.T alert : No

Exit Code: 0x00
```
```shell
# MegaCli -PDList -a0 | grep state
Firmware state: Unconfigured(bad)
Firmware state: Unconfigured(bad)
Firmware state: JBOD
# MegaCli -PDMakeGood -PhysDrv[252:2] -a0

Adapter: 0: Failed to change PD state at EnclId-252 SlotId-2.

Exit Code: 0x01
```
And the "security locked state" is an idea I will keep pursuing. I will try to determine whether the disks are in this state and find a way to unlock them.
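The `-PDList` output above is long; a small parser (hypothetical, not part of MegaCli or Ironic) can reduce it to enclosure, slot, firmware state and raw size per drive, which makes the pattern stand out: two drives in `Unconfigured(bad)` reporting a raw size of 0 KB, and one in `JBOD` with its full 1.819 TB. It assumes each drive block starts with `Enclosure Device ID:` and uses `Key: value` lines, as in that output.

```python
def parse_pdlist(text):
    """Summarise `MegaCli -PDList -aN` output as one dict per physical drive."""
    drives = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Enclosure Device ID:"):
            current = {"enclosure": line.split(":", 1)[1].strip()}
            drives.append(current)
        elif current is not None and ":" in line:
            key, value = (part.strip() for part in line.split(":", 1))
            if key == "Slot Number":
                current["slot"] = value
            elif key == "Firmware state":
                current["state"] = value
            elif key == "Raw Size":
                current["raw_size"] = value
    return drives


def problem_drives(drives):
    """Drives the controller reports as bad, or with a raw size of zero."""
    return [d for d in drives
            if "bad" in d.get("state", "").lower()
            or d.get("raw_size", "").startswith("0 ")]


sample = """Enclosure Device ID: 252
Slot Number: 1
Raw Size: 0 KB [0x0 Sectors]
Firmware state: Unconfigured(bad)
Enclosure Device ID: 252
Slot Number: 3
Raw Size: 1.819 TB [0xe8e088b0 Sectors]
Firmware state: JBOD
"""
for d in problem_drives(parse_pdlist(sample)):
    # Printed as [enclosure:slot], the form used by -PhysDrv
    print(f"[{d['enclosure']}:{d['slot']}] {d['state']}, raw size {d['raw_size']}")
# [252:1] Unconfigured(bad), raw size 0 KB [0x0 Sectors]
```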
Thank you very much, Julia.
Cheers,
Han Guangyu
On Tue, Mar 1, 2022 at 22:06, Julia Kreger <juliaashleykreger@gmail.com> wrote:
On Mon, Feb 28, 2022 at 1:12 AM Arne Wiebalck <arne.wiebalck@cern.ch> wrote:
Hi Guangyu,
I am not aware of anything in the Ironic Python Agent that would remove disks from the system in a way that they would not be visible after a reboot (apart from, as mentioned before, the clean up of a hardware RAID in a way the IPA is not able to see any devices after).
How about trying to access and configure the hardware RAID with the corresponding tool from the RAM disk you booted into from the USB stick? Install the tool and see if it detects the controller.
The very first step before doing anything with Ironic is to get the disks back or understand why they are not visible.
Did cleaning fail at any given point with these machines?
If you have physical access, try disconnecting all of the drives, and then powering up the machine and see if you can get into the firmware configuration screen with control-h. If you can, remove all of the prior configuration or disk volumes. They will look like they are in error states most likely. If your unable to get into this screen, I would be worried about your disk controller card. If your able to clear everything out of the controller, power off, try re-inserting drives, and see what happens. See if the controller can view/interact with the drives. If it sees no drives, then my next paragraph is likely the case.
The disks sound like they might be in security locked state which will likely require a desktop SATA disk controller to remedy by attaching and manually removing from a security locked state. Megaraid controllers can't recognize security locked devices (most controllers and especially ones labeled "raid controllers" can't handle it) when in pass-through mode, but I've never heard of security lock commands actually getting through to the device with those controllers in pass-through mode. If the card was in raid mode to begin with, then it likely never did anything involving secure erase as the controller should not be offering that as a feature of provided disks to the OS.
Cheers, Arne
On 28.02.22 09:28, 韩光宇 wrote:
Hi Arne,
I didn't find hardware RAID config option during the initial boot sequence. Ctrl+H is unresponsive in this computer. I just saw "Press Del to enter firmware configuration, press F3 to enter boot menu, and press F12 to enter network boot". And I press 'Del' to enter the BIOS. But I didn't find RAID config menu in BIOS. Sorry that I have poor knowledge about this.
And now, even though I installed the operating system manually using a USB stick, I still couldn't find the hard drive. Is there anything that ironic-agent did in the clean phase that would have caused this problem?
I wonder if this is a thinking pointto solve the problem. Now, my idea is to first find a way to manually configure RAID. Do you agree with this? And than, whether RAID configurations are still cleared in the Clean phase if clean phase will do this?
Sorry that I have so much confuse.
love you, Guangyu
Arne Wiebalck <arne.wiebalck@cern.ch> 于2022年2月14日周一 15:59写道:
Hi Guangyu,
It seems like Julia had the right idea and the disks are not visible since the RAID controller does not expose anything to the operating system. This seems to be confirmed by you booting into the CentOS7 image.
What I would suggest to try next is to look for the hardware RAID config option during the initial boot sequence to enter the RAID config menu (there should be a message quite early on, and maybe Ctrl-H is needed to enter the menu).
Once there, manually configure the disks as JBODs or create a RAID device. Upon reboot this should be visible and accessible as a device. Maybe check from your CentOS7 image again. If the devices are there, Ironic should also be able to deploy on them (for this you can remove the RAID config you added).
It depends a little on what your goal is, but I would try this first to see if you can make a device visible and if the Ironic deploy bit works, before trying to configure the hardware RAID via Ironic.
Cheers, Arne
On 14.02.22 03:20, 韩光宇 wrote:
Hi Arne and Julia,
You make me feel so warm. Best wishes to you.
I have tried to boot the node into a CentOS7, but it still coundnot to find disk. And sorry that I didn't notice the RAID card.
# lspci -v ... 23:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS-3 3108 [Invader] (rev 02) Subsystem: Broadcom / LSI MegaRAID SAS 9361-8i Flags: bus master, fast devsel, latency 0, IRQ -2147483648, NUMA node 1 I/O ports at 3000 [size=256] Memory at e9900000 (64-bit, non-prefetchable) [size=64K] Memory at e9700000 (64-bit, non-prefetchable) [size=1M] Expansion ROM at e9800000 [disabled] [size=1M] Capabilities: [50] Power Management version 3 Capabilities: [68] Express Endpoint, MSI 00 Capabilities: [d0] Vital Product Data Capabilities: [a8] MSI: Enable- Count=1/1 Maskable+ 64bit+ Capabilities: [c0] MSI-X: Enable+ Count=97 Masked- Capabilities: [100] Advanced Error Reporting Capabilities: [1e0] #19 Capabilities: [1c0] Power Budgeting <?> Capabilities: [148] Alternative Routing-ID Interpretation (ARI) Kernel driver in use: megaraid_sas Kernel modules: megaraid_sas ...
I try to config raid fallowing https://docs.openstack.org/ironic/latest/admin/raid.html by `baremetal node set $NODE_UUID --target-raid-config raid.json`. The server have three same disk(Western Digital DC HA210 2TB SATA 6GB/s) # cat raid.json { "logical_disks": [ { "size_gb": "MAX", "raid_level": "0", "is_root_volume": true } ] }
But Ironic still coundn't see disk. I still got ``` ## In deploy images # journalctl -fxeu ironic-python-agent Feb 14 02:17:22 host-10-12-22-74 ironic-python-agent[2329]: 2022-02-14 02:17:22.863 2329 WARNING root [-] Path /dev/disk/by-path is inaccessible, /dev/disk/by-path/* version of block device name is unavailable Cause: [Errno 2] No such file or directory: '/dev/disk/by-path': FileNotFoundError: [Errno 2] No such file or directory: '/dev/disk/by-path' Feb 14 02:17:44 host-10-12-22-74 ironic-python-agent[2329]: 2022-02-14 02:17:44.391 2329 ERROR root [-] Unexpected error dispatching get_os_install_device to manager <ironic_python_agent.hardware.GenericHardwareManager object at 0x7efbf4da2208>: Error finding the disk or partition device to deploy the image onto: No suitable device was found for deployment - root device hints were not provided and all found block devices are smaller than 4294967296B.: ironic_python_agent.errors.DeviceNotFound: Error finding the disk or partition device to deploy the image onto: No suitable device was found for deployment - root device hints were not provided and all found block devices are smaller than 4294967296B. ```
I don't know if it's a lack of a RAID card driver or a lack of a disk driver or a lack of RAID configuration. Could you have some idea about this question?
love you, Han Guangyu
Julia Kreger <juliaashleykreger@gmail.com> 于2022年2月10日周四 23:11写道:
If the disk controllers *are* enumerated in the kernel log, which is something to also look for, then the disks themselves may be in some weird state like security locked. Generally this shows up as the operating system kind of sees the disk and the SATA port connected but can't really access it. This is also an exceptionally rare state to find one's self in.
More common, especially in enterprise grade hardware: If the disk controller is actually a raid controller, and there are no raid volumes configured, then the operating system likely cannot see the underlying disks and turn that into a usable block device. I've seen a couple drivers over the years which expose hints of disks in the kernel log and without raid configuration in the cards, the drivers can't present usable block devices to the operating system system.
-Julia
On Thu, Feb 10, 2022 at 3:17 AM Arne Wiebalck <arne.wiebalck@cern.ch> wrote:
> Hi Guangyu,
>
> No worries about asking questions, this is what the mailing list is for :)
>
> Just to clarify, you do not have to set root device hints; it also works without them (with the algorithm I mentioned). However, hints help to define the exact device and/or make deployment more predictable/repeatable.
>
> If it is really a driver problem, it is an issue with the operating system of the image you use, i.e. CentOS8. Some drivers were removed from 7 to 8, and we have seen issues with specific drive models as well.
>
> You can try to build your own IPA images as described in [1], e.g. to add your ssh key to be able to log into the IPA to debug further, and to eventually include drivers (if you can identify them and they are available for CentOS8).
>
> Another option may be to add another (newer) disk model to the server, just to confirm it is the disk model/driver which is the cause.
>
> You could also try to boot the node into a CentOS7 (and then a CentOS8) live image to confirm it can see the disks at all.
>
> Hope this helps!
> Arne
>
> [1] https://docs.openstack.org/ironic-python-agent-builder/latest/admin/dib.html
>
> On 10.02.22 11:15, 韩光宇 wrote:
>> Hi Arne,
>>
>> Thank you very much for your response. Love you. You take away a lot of my confusion.
>>
>> You are right, I didn't set a 'root device' hint. And Ironic also cannot see the disks: the 'lsblk' file in the deploy logs is empty. I tried to set a root device hint, but because Ironic can't find any disk, the deploy still failed.
>>
>> Feb 10 09:51:55 host-10-12-22-59 ironic-python-agent[2324]: 2022-02-10 09:51:55.045 2324 WARNING root [-] Path /dev/disk/by-path is inaccessible, /dev/disk/by-path/* version of block device name is unavailable Cause: [Errno 2] No such file or directory: '/dev/disk/by-path': FileNotFoundError: [Errno 2] No such file or directory: '/dev/disk/by-path'
>> Feb 10 09:51:55 host-10-12-22-59 ironic-python-agent[2324]: 2022-02-10 09:51:55.056 2324 WARNING ironic_lib.utils [-] No device found that matches the root device hints {'wwn': '0x50014EE2691D724C'}: StopIteration
>>
>> Sorry to bother you, I'm new to Ironic and I didn't find information about this on Google.
>>
>> The bare metal node has three identical disks (Western Digital DC HA210 2TB SATA 6Gb/s). Where can I confirm whether ironic-python-agent supports this disk?
>>
>> And if Ironic cannot find the disks because the corresponding drivers in the IPA image are missing, do you know how to resolve it? I have used the latest deploy images in https://tarballs.opendev.org/openstack/ironic-python-agent/dib/files/ . Do I need to find and manually add the driver in the source code or the ramdisk (that would be difficult for me)?
>>
>> Love you.
>>
>> Cheers,
>> Guangyu
>>
>> Arne Wiebalck <arne.wiebalck@cern.ch> wrote on Thu, Feb 10, 2022 at 15:51:
>>> Hi Guangyu,
>>>
>>> The error indicates that Ironic was not able to find a device where it could deploy the image to.
>>>
>>> To find a device, Ironic will use 'root device' hints [1], usually set by the admin on a node. If that does not yield anything, Ironic will loop over all block devices and pick the smallest which is larger than 4GB (and order them alphabetically).
>>>
>>> If you have disks in your server which are larger than 4GB, one potential explanation is that Ironic cannot see them, e.g. since the corresponding drivers in the IPA image are missing. The logs you posted seem to confirm something along those lines.
>>>
>>> Check the content of the 'lsblk' file in the deploy logs, which you can find in the tar archive in /var/log/ironic/deploy/ on the controller for your deployment attempt, to see what devices Ironic has access to.
>>>
>>> Cheers,
>>> Arne
>>>
>>> [1] https://docs.openstack.org/ironic/latest/install/advanced.html#root-device-h...
>>>
>>> On 10.02.22 02:50, 韩光宇 wrote:
>>>> Dear all,
>>>>
>>>> I have an OpenStack Victoria environment and tried to use Ironic to manage bare metal. But I got "- root device hints were not provided and all found block devices are smaller than 4294967296B." in the deploy stage.
>>>>
>>>> 2022-02-09 17:57:56.492 3908982 ERROR ironic.drivers.modules.agent_base [-] Agent returned error for deploy step {'step': 'write_image', 'priority': 80, 'argsinfo': None, 'interface': 'deploy'} on node cc68c450-ce54-4e1c-be04-8b0a6169ef92 : No suitable device was found for deployment - root device hints were not provided and all found block devices are smaller than 4294967296B..
>>>>
>>>> I used "openstack server create --flavor my-baremetal-flavor --nic net-id=$net_id --image $image testing" to deploy the bare metal node. I downloaded the deploy images (ipa-centos8-master.kernel and ipa-centos8-master.initramfs) from https://tarballs.opendev.org/openstack/ironic-python-agent/dib/files/.
>>>>
>>>> The baremetal node info and flavor info are as follows: https://paste.opendev.org/show/bV7lgO6RkNQY6ZGPbT2e/
>>>> Ironic configuration file: https://paste.opendev.org/show/bTgY9Kpn7KWqwQl73aEa/
>>>> Ironic-conductor log: https://paste.opendev.org/show/bFKZYlXmccxNxU8lEogk/
>>>> The log of ironic-python-agent on the bare metal node: https://paste.opendev.org/show/btAuaMuV2IutV2Pa7YIa/
>>>>
>>>> I see some old discussions about this, such as https://bugzilla.redhat.com/show_bug.cgi?id=1312187. But those discussions took place a long time ago, not for version V, and no solution was seen.
>>>>
>>>> Does anyone know how to solve this problem? I would appreciate any kind of guidance or help.
>>>>
>>>> Thank you,
>>>> Han Guangyu
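Arne's description of the fallback device selection can be sketched in a few lines of shell. This is a simplified approximation, not IPA's code: it assumes the size threshold is inclusive (at least 4 GiB, i.e. the 4294967296 bytes in the error), breaks size ties by device name as Arne describes, and ignores any devices IPA might skip for other reasons. `guess_root_disk` and `MIN_ROOT_BYTES` are hypothetical names:

```shell
# Hypothetical sketch of the fallback described above (not IPA's actual code):
# keep devices of at least 4 GiB, sort by size and then by name, take the smallest.
# Input: "NAME SIZE_IN_BYTES" lines on stdin, e.g. from `lsblk -b -d -n -o NAME,SIZE`.
MIN_ROOT_BYTES=$((4 * 1024 * 1024 * 1024))   # 4294967296, as in the error message

guess_root_disk() {
    local pick
    pick=$(awk -v min="$MIN_ROOT_BYTES" '$2 + 0 >= min + 0 { print $1, $2 }' \
        | sort -k2,2n -k1,1 \
        | head -n 1 \
        | cut -d' ' -f1)
    if [ -z "$pick" ]; then
        echo "No suitable device was found: all found block devices are smaller than ${MIN_ROOT_BYTES}B" >&2
        return 1
    fi
    echo "$pick"
}
```

For example, `lsblk -b -d -n -o NAME,SIZE | guess_root_disk` on a node with visible 2 TB disks prints the first of them by name; with no visible disks it fails much like the deploy did. When root device hints are set (such as the `wwn` hint in this thread), IPA matches the hints instead of using this fallback.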
Seems like your issue is in the RAID controller configuration. And based on the MegaCli output, I doubt the disks are security locked. I would remove the disks, get into the BIOS/firmware menus, and try to delete them, then try to re-import the disks. Alternatively, it may be worthwhile to take the disks and attempt to use them in a desktop or something where a MegaRAID controller is not in the mix, in case it is something with configuration on the disk as well. Examining them outside of a MegaRAID controller would hopefully give you an idea of their operational state as well.
-Julia

On Fri, Mar 4, 2022 at 1:25 AM 韩光宇 <hanguangyu2@gmail.com> wrote:
Hi Julia,
Sorry that my last email didn't reply to some of the questions in your email. When I got into the RAID config menu, I saw that the disk state is "unconfig bad". More info: Virtual Drive Management displayed "No Configuration Present!". But I could not modify the disk state in the RAID config menu. Even when I moved the disk to another server and used the software in the operating system to modify it, it still couldn't be modified. For example:
```shell
# MegaCli -PDList -a0
Adapter #0
Enclosure Device ID: 252
Slot Number: 1
Enclosure position: 0
Device Id: 9
WWN: 50014EE2BE72FEBF
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 0 KB [0x0 Sectors]
Non Coerced Size: 0 KB [0x0 Sectors]
Coerced Size: 0 KB [0x0 Sectors]
Firmware state: Unconfigured(bad)
Device Firmware Level: WA09
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x4433221101000000
Connected Port Number: 0(path0)
Inquiry Data: ATA HGST HUS722T2TALWA09WCC6N4HZV9SX
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: Unknown
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive Temperature :0C (32.00 F)
PI Eligibility: No
Drive is formatted for PI information: No
PI: No PI
Drive's write cache : Disabled
Drive's NCQ setting : Enabled
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Drive has flagged a S.M.A.R.T alert : No

Enclosure Device ID: 252
Slot Number: 2
Enclosure position: 0
Device Id: 10
WWN: 50014EE2691D724C
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 0 KB [0x0 Sectors]
Non Coerced Size: 0 KB [0x0 Sectors]
Coerced Size: 0 KB [0x0 Sectors]
Firmware state: Unconfigured(bad)
Device Firmware Level: WA09
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x4433221102000000
Connected Port Number: 1(path0)
Inquiry Data: ATA HGST HUS722T2TALWA09WCC6N4HZVTV5
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: Unknown
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive Temperature :0C (32.00 F)
PI Eligibility: No
Drive is formatted for PI information: No
PI: No PI
Drive's write cache : Disabled
Drive's NCQ setting : Enabled
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Drive has flagged a S.M.A.R.T alert : No

Enclosure Device ID: 252
Slot Number: 3
Enclosure position: 0
Device Id: 11
WWN: 50014EE2BE733A5E
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 1.819 TB [0xe8e088b0 Sectors]
Non Coerced Size: 1.818 TB [0xe8d088b0 Sectors]
Coerced Size: 1.818 TB [0xe8d00000 Sectors]
Firmware state: JBOD
Device Firmware Level: WA09
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x4433221103000000
Connected Port Number: 2(path0)
Inquiry Data: WCC6N0KX0HJD HGST HUS722T2TALA604 RAGNWA09
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive Temperature :34C (93.20 F)
PI Eligibility: No
Drive is formatted for PI information: No
PI: No PI
Drive's write cache : Enabled
Drive's NCQ setting : Enabled
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Drive has flagged a S.M.A.R.T alert : No

Exit Code: 0x00
```

```shell
# MegaCli -PDList -a0 | grep state
Firmware state: Unconfigured(bad)
Firmware state: Unconfigured(bad)
Firmware state: JBOD
# MegaCli -PDMakeGood -PhysDrv[252:2] -a0
Adapter: 0: Failed to change PD state at EnclId-252 SlotId-2.
Exit Code: 0x01
```
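When reading long `MegaCli -PDList` output, it helps to keep each drive's enclosure and slot next to its state, since those are the `[E:S]` values that `-PhysDrv` expects. A small sketch, assuming the field labels match the output posted above (other MegaCli versions may differ); `pd_states` is a hypothetical name:

```shell
# Hypothetical helper: turn `MegaCli -PDList -aN` output (stdin) into
# "ENCLOSURE:SLOT FIRMWARE_STATE" lines, one per physical drive.
# Assumes the field labels match the output posted in this thread.
pd_states() {
    awk -F': *' '
        /^Enclosure Device ID:/ { encl = $2 }
        /^Slot Number:/         { slot = $2 }
        /^Firmware state:/      { print encl ":" slot, $2 }
    '
}
```

Fed the output above, it would list 252:1 and 252:2 as Unconfigured(bad) and 252:3 as JBOD. The WWN of slot 2 (50014EE2691D724C) is also the value used earlier in the thread as a root device hint, so that hint pointed at a drive the controller was not exposing.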
And the "security locked state" is an idea I will keep pursuing. I will try to determine whether the disks are in this state and find a way to unlock them.
Thank you very much Julia, cheers, Han Guangyu
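To test the security-lock idea, the ATA security state can be read with `hdparm -I` once a drive is attached to a plain SATA/AHCI port, as suggested in this thread; behind a MegaRAID controller hdparm usually cannot query the drive directly. A sketch that extracts the state from `hdparm -I` output, assuming its usual "Security:" section layout where the state line reads "not locked" or "locked"; `ata_lock_state` is a hypothetical name:

```shell
# Hypothetical helper: report the ATA security lock state from `hdparm -I /dev/sdX`
# output on stdin. Assumes hdparm's usual layout: a "Security:" header at column 0,
# followed by indented lines such as "<TAB>not<TAB>locked" or "<TAB><TAB>locked".
ata_lock_state() {
    awk '
        /^Security:/ { in_sec = 1; next }
        in_sec && /^[^ \t]/ { in_sec = 0 }      # next top-level section ends the block
        in_sec && /locked/ {
            print ($1 == "not" ? "unlocked" : "locked")
            found = 1
            exit
        }
        END { if (!found) print "unknown" }
    '
}
```

For example, `hdparm -I /dev/sdb | ata_lock_state`. An "unknown" result means no Security section was found, which can happen when the drive is not directly reachable.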
Julia Kreger <juliaashleykreger@gmail.com> wrote on Tue, Mar 1, 2022 at 22:06:
On Mon, Feb 28, 2022 at 1:12 AM Arne Wiebalck <arne.wiebalck@cern.ch> wrote:
Hi Guangyu,
I am not aware of anything in the Ironic Python Agent that would remove disks from the system in a way that they would not be visible after a reboot (apart from, as mentioned before, the clean up of a hardware RAID in a way the IPA is not able to see any devices after).
How about trying to access and configure the hardware RAID with the corresponding tool from the RAM disk you booted into from the USB stick? Install the tool and see if it detects the controller.
The very first step before doing anything with Ironic is to get the disks back or understand why they are not visible.
Did cleaning fail at any given point with these machines?
If you have physical access, try disconnecting all of the drives, then power up the machine and see if you can get into the firmware configuration screen with Ctrl-H. If you can, remove all of the prior configuration or disk volumes; they will most likely look like they are in error states. If you're unable to get into this screen, I would be worried about your disk controller card. If you're able to clear everything out of the controller, power off, try re-inserting the drives, and see what happens. See if the controller can view/interact with the drives. If it sees no drives, then my next paragraph is likely the case.
The disks sound like they might be in a security-locked state, which will likely require a desktop SATA disk controller to remedy, by attaching them and manually removing the security lock. MegaRAID controllers can't recognize security-locked devices (most controllers, and especially ones labeled "raid controllers", can't handle it) when in pass-through mode, but I've never heard of security lock commands actually getting through to the device with those controllers in pass-through mode. If the card was in RAID mode to begin with, then it likely never did anything involving secure erase, as the controller should not be offering that as a feature of provided disks to the OS.
Cheers, Arne
On 28.02.22 09:28, 韩光宇 wrote:
Hi Arne,
I didn't find a hardware RAID config option during the initial boot sequence. Ctrl+H does nothing on this computer. I just saw "Press Del to enter firmware configuration, press F3 to enter boot menu, and press F12 to enter network boot". I pressed 'Del' to enter the BIOS, but I didn't find a RAID config menu there. Sorry, I have little knowledge about this.
And now, even though I installed the operating system manually using a USB stick, I still couldn't find the hard drive. Is there anything that ironic-agent did in the clean phase that would have caused this problem?
I wonder if this is the right direction for solving the problem. My idea now is to first find a way to configure RAID manually. Do you agree? And then, will the RAID configuration be cleared again in the clean phase, if the clean phase does that?
Sorry for all the confusion.
love you, Guangyu
Arne Wiebalck <arne.wiebalck@cern.ch> wrote on Mon, Feb 14, 2022 at 15:59:
Hi Guangyu,
It seems like Julia had the right idea and the disks are not visible since the RAID controller does not expose anything to the operating system. This seems to be confirmed by you booting into the CentOS7 image.
What I would suggest to try next is to look for the hardware RAID config option during the initial boot sequence to enter the RAID config menu (there should be a message quite early on, and maybe Ctrl-H is needed to enter the menu).
Once there, manually configure the disks as JBODs or create a RAID device. Upon reboot this should be visible and accessible as a device. Maybe check from your CentOS7 image again. If the devices are there, Ironic should also be able to deploy on them (for this you can remove the RAID config you added).
It depends a little on what your goal is, but I would try this first to see if you can make a device visible and if the Ironic deploy bit works, before trying to configure the hardware RAID via Ironic.
Cheers, Arne
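Arne's suggestion of exposing the disks as JBODs can also be attempted from an OS with MegaCli installed when the boot-time configuration menu is not reachable. The sketch only prints a candidate command sequence for one drive so it can be reviewed before anything is run. It assumes adapter 0 and a controller and firmware that support JBOD; the command spellings follow common MegaCli usage and are assumptions to check against `MegaCli -h`. `megaraid_jbod_plan` is a hypothetical name:

```shell
# Hypothetical helper: print (not run) a candidate MegaCli sequence for one drive.
# Assumptions: adapter 0 by default, controller firmware with JBOD support, and
# command spellings from common MegaCli usage; check them against `MegaCli -h`.
# WARNING: -CfgForeign -Clear discards foreign RAID metadata found on the drives.
megaraid_jbod_plan() {
    local encl_slot="$1" adapter="${2:-0}"
    printf '%s\n' \
        "MegaCli -PDMakeGood -PhysDrv[${encl_slot}] -a${adapter}" \
        "MegaCli -CfgForeign -Scan -a${adapter}" \
        "MegaCli -CfgForeign -Clear -a${adapter}" \
        "MegaCli -AdpSetProp -EnableJBOD -1 -a${adapter}" \
        "MegaCli -PDMakeJBOD -PhysDrv[${encl_slot}] -a${adapter}"
}
```

For example, `megaraid_jbod_plan 252:2` prints the sequence for the drive in slot 2. Since `-PDMakeGood` already failed in this thread, the later steps may fail the same way until the controller state itself is cleared.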
On 14.02.22 03:20, 韩光宇 wrote:
Hi Arne and Julia,
You make me feel so warm. Best wishes to you.
I have tried to boot the node into CentOS7, but it still could not find the disks. And sorry that I didn't notice the RAID card.
# lspci -v
...
23:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS-3 3108 [Invader] (rev 02)
    Subsystem: Broadcom / LSI MegaRAID SAS 9361-8i
    Flags: bus master, fast devsel, latency 0, IRQ -2147483648, NUMA node 1
    I/O ports at 3000 [size=256]
    Memory at e9900000 (64-bit, non-prefetchable) [size=64K]
    Memory at e9700000 (64-bit, non-prefetchable) [size=1M]
    Expansion ROM at e9800000 [disabled] [size=1M]
    Capabilities: [50] Power Management version 3
    Capabilities: [68] Express Endpoint, MSI 00
    Capabilities: [d0] Vital Product Data
    Capabilities: [a8] MSI: Enable- Count=1/1 Maskable+ 64bit+
    Capabilities: [c0] MSI-X: Enable+ Count=97 Masked-
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [1e0] #19
    Capabilities: [1c0] Power Budgeting <?>
    Capabilities: [148] Alternative Routing-ID Interpretation (ARI)
    Kernel driver in use: megaraid_sas
    Kernel modules: megaraid_sas
...
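Whether the controller driver is present at all can be read from the same `lspci` output: the "Kernel driver in use" line shows which driver has bound to the device. A small sketch that pairs each RAID controller with its bound driver from `lspci -k` or `lspci -v` output on stdin; `raid_controller_drivers` is a hypothetical name:

```shell
# Hypothetical helper: from `lspci -k` / `lspci -v` output on stdin, print each
# RAID controller with the kernel driver bound to it ("none" if no driver is in use).
raid_controller_drivers() {
    awk '
        /^[0-9a-f]/ {                           # a new device record starts at column 0
            if (dev != "") print dev " -> " (drv != "" ? drv : "none")
            dev = ""; drv = ""
            if ($0 ~ /RAID bus controller/) dev = $0
            next
        }
        dev != "" && /Kernel driver in use:/ {
            sub(/.*Kernel driver in use: */, "")
            drv = $0
        }
        END { if (dev != "") print dev " -> " (drv != "" ? drv : "none") }
    '
}
```

Applied to the output above it would report megaraid_sas, which suggests the driver is loaded and points toward controller configuration rather than a missing driver.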
I tried to configure RAID following https://docs.openstack.org/ironic/latest/admin/raid.html with `baremetal node set $NODE_UUID --target-raid-config raid.json`. The server has three identical disks (Western Digital DC HA210 2TB SATA 6Gb/s).
# cat raid.json
{
    "logical_disks": [
        {
            "size_gb": "MAX",
            "raid_level": "0",
            "is_root_volume": true
        }
    ]
}
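In Ironic, `--target-raid-config` only records the desired layout on the node; it is applied during manual cleaning through the `raid` interface's `delete_configuration` and `create_configuration` clean steps, as described in the RAID documentation linked above. A dry-run sketch of that sequence; it assumes a suitable raid interface is enabled for the node and that the IPA ramdisk has a hardware manager able to configure this controller (IPA's stock generic hardware manager only builds software RAID). The helper names are hypothetical, and with the default `DRY_RUN=1` the commands are printed rather than run:

```shell
# Hypothetical helpers: apply a target RAID config through manual cleaning.
# Assumptions: the node's raid interface is enabled in ironic.conf and the IPA ramdisk
# has a hardware manager that can configure this controller.
# DRY_RUN=1 (the default) prints each command instead of executing it.
maybe_run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

apply_raid_config() {
    local node="$1" raid_json="$2"
    local steps='[{"interface": "raid", "step": "delete_configuration"}, {"interface": "raid", "step": "create_configuration"}]'
    maybe_run baremetal node set "$node" --target-raid-config "$raid_json"
    maybe_run baremetal node manage "$node"      # manual cleaning needs "manageable"
    maybe_run baremetal node clean "$node" --clean-steps "$steps"
    maybe_run baremetal node provide "$node"     # return the node to "available"
}
```

For example, `apply_raid_config "$NODE_UUID" raid.json` shows the four commands; setting `DRY_RUN=0` runs them.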
But Ironic still couldn't see the disks. I still got:
```
## In deploy images
# journalctl -fxeu ironic-python-agent
Feb 14 02:17:22 host-10-12-22-74 ironic-python-agent[2329]: 2022-02-14 02:17:22.863 2329 WARNING root [-] Path /dev/disk/by-path is inaccessible, /dev/disk/by-path/* version of block device name is unavailable Cause: [Errno 2] No such file or directory: '/dev/disk/by-path': FileNotFoundError: [Errno 2] No such file or directory: '/dev/disk/by-path'
Feb 14 02:17:44 host-10-12-22-74 ironic-python-agent[2329]: 2022-02-14 02:17:44.391 2329 ERROR root [-] Unexpected error dispatching get_os_install_device to manager <ironic_python_agent.hardware.GenericHardwareManager object at 0x7efbf4da2208>: Error finding the disk or partition device to deploy the image onto: No suitable device was found for deployment - root device hints were not provided and all found block devices are smaller than 4294967296B.: ironic_python_agent.errors.DeviceNotFound: Error finding the disk or partition device to deploy the image onto: No suitable device was found for deployment - root device hints were not provided and all found block devices are smaller than 4294967296B.
```
I don't know whether it's a missing RAID card driver, a missing disk driver, or missing RAID configuration. Do you have any idea about this?
love you, Han Guangyu
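Arne's earlier suggestion to inspect the `lsblk` file from the deploy logs can be scripted on the conductor. A sketch, assuming the tarballs sit in /var/log/ironic/deploy/ (as described in this thread), that their file names contain the node UUID, and that they are gzip-compressed; `show_deploy_lsblk` is a hypothetical name:

```shell
# Hypothetical helper: print the "lsblk" file from the newest deploy-log tarball of a node.
# Assumptions: tarballs live in /var/log/ironic/deploy (overridable with a 2nd argument),
# their names contain the node UUID, and they are gzip-compressed.
show_deploy_lsblk() {
    local node="$1" dir="${2:-/var/log/ironic/deploy}"
    local tarball member
    tarball=$(ls -t "$dir"/*"$node"*.tar.gz 2>/dev/null | head -n 1)
    if [ -z "$tarball" ]; then
        echo "no deploy log tarball for $node in $dir" >&2
        return 1
    fi
    member=$(tar -tzf "$tarball" | grep -E '(^|/)lsblk$' | head -n 1)
    if [ -z "$member" ]; then
        echo "no lsblk file in $tarball" >&2
        return 1
    fi
    tar -xOzf "$tarball" "$member"              # stream the member to stdout
}
```

For example, `show_deploy_lsblk cc68c450-ce54-4e1c-be04-8b0a6169ef92`. An empty listing, as reported in this thread, means the ramdisk saw no block devices at all, which points below Ironic (controller configuration or drivers) rather than at root device selection.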
Julia Kreger <juliaashleykreger@gmail.com> wrote on Thu, Feb 10, 2022 at 23:11:
> If the disk controllers *are* enumerated in the kernel log, which is something to also look for, then the disks themselves may be in some weird state like security locked. Generally this shows up as the operating system kind of sees the disk and the SATA port connected but can't really access it. This is also an exceptionally rare state to find one's self in.
>
> More common, especially in enterprise-grade hardware: if the disk controller is actually a RAID controller and there are no RAID volumes configured, then the operating system likely cannot see the underlying disks and turn them into usable block devices. I've seen a couple of drivers over the years which expose hints of disks in the kernel log, but without RAID configuration on the card, the drivers can't present usable block devices to the operating system.
>
> -Julia
participants (3)
- Arne Wiebalck
- Julia Kreger
- 韩光宇