Hi Guangyu,
I am not aware of anything in the Ironic Python Agent that would remove disks from the system in a way that they would not be visible after a reboot (apart from, as mentioned before, the cleanup of a hardware RAID in a way that leaves the IPA unable to see any devices afterwards).
How about trying to access and configure the hardware RAID with the corresponding tool from the RAM disk you booted into from the USB stick? Install the tool and see if it detects the controller.
The very first step before doing anything with Ironic is to get the disks back or understand why they are not visible.
Cheers, Arne
On 28.02.22 09:28, 韩光宇 wrote:
Hi Arne,
I couldn't find a hardware RAID config option during the initial boot sequence. Ctrl+H is unresponsive on this machine. I only saw "Press Del to enter firmware configuration, press F3 to enter boot menu, and press F12 to enter network boot". I pressed 'Del' to enter the BIOS, but I didn't find a RAID config menu there. Sorry that I know so little about this.
And now, even though I installed the operating system manually using a USB stick, I still couldn't find the hard drive. Is there anything that ironic-agent did in the clean phase that would have caused this problem?
I wonder if this is a good starting point for solving the problem. My idea now is to first find a way to configure the RAID manually. Do you agree with this? And then, will the RAID configuration be cleared again in the clean phase, if the clean phase is what does this?
Sorry for being so confused.
love you, Guangyu
Arne Wiebalck <arne.wiebalck@cern.ch> wrote on Mon, Feb 14, 2022 at 15:59:
Hi Guangyu,
It seems like Julia had the right idea and the disks are not visible since the RAID controller does not expose anything to the operating system. This seems to be confirmed by you booting into the CentOS7 image.
What I would suggest to try next is to look for the hardware RAID config option during the initial boot sequence to enter the RAID config menu (there should be a message quite early on, and maybe Ctrl-H is needed to enter the menu).
Once there, manually configure the disks as JBODs or create a RAID device. Upon reboot this should be visible and accessible as a device. Maybe check from your CentOS7 image again. If the devices are there, Ironic should also be able to deploy on them (for this you can remove the RAID config you added).
It depends a little on what your goal is, but I would try this first to see if you can make a device visible and if the Ironic deploy bit works, before trying to configure the hardware RAID via Ironic.
Cheers, Arne
On 14.02.22 03:20, 韩光宇 wrote:
Hi Arne and Julia,
You make me feel so warm. Best wishes to you.
I have tried to boot the node into CentOS7, but it still could not find the disks. And sorry that I didn't notice the RAID card.
# lspci -v
...
23:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS-3 3108 [Invader] (rev 02)
        Subsystem: Broadcom / LSI MegaRAID SAS 9361-8i
        Flags: bus master, fast devsel, latency 0, IRQ -2147483648, NUMA node 1
        I/O ports at 3000 [size=256]
        Memory at e9900000 (64-bit, non-prefetchable) [size=64K]
        Memory at e9700000 (64-bit, non-prefetchable) [size=1M]
        Expansion ROM at e9800000 [disabled] [size=1M]
        Capabilities: [50] Power Management version 3
        Capabilities: [68] Express Endpoint, MSI 00
        Capabilities: [d0] Vital Product Data
        Capabilities: [a8] MSI: Enable- Count=1/1 Maskable+ 64bit+
        Capabilities: [c0] MSI-X: Enable+ Count=97 Masked-
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [1e0] #19
        Capabilities: [1c0] Power Budgeting <?>
        Capabilities: [148] Alternative Routing-ID Interpretation (ARI)
        Kernel driver in use: megaraid_sas
        Kernel modules: megaraid_sas
...
I tried to configure RAID following https://docs.openstack.org/ironic/latest/admin/raid.html with `baremetal node set $NODE_UUID --target-raid-config raid.json`. The server has three identical disks (Western Digital DC HA210 2TB SATA 6GB/s).
# cat raid.json
{
  "logical_disks": [
    {
      "size_gb": "MAX",
      "raid_level": "0",
      "is_root_volume": true
    }
  ]
}
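Before passing such a file to `baremetal node set`, it can help to sanity-check it. The following is a minimal Python sketch and not part of Ironic: `check_raid_config` is a hypothetical helper, and it only checks the two properties the Ironic RAID documentation marks as mandatory (`size_gb` and `raid_level`). It knows nothing about what a particular controller supports.

```python
import json
from typing import List


def check_raid_config(text: str) -> List[str]:
    """Return a list of problems found; an empty list means none were spotted.

    Hypothetical helper: only the overall shape and the two mandatory
    properties (size_gb, raid_level) of each logical disk are checked.
    """
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]

    if not isinstance(config, dict):
        return ["top level must be a JSON object"]

    disks = config.get("logical_disks")
    if not isinstance(disks, list) or not disks:
        return ["'logical_disks' must be a non-empty list"]

    problems = []
    for index, disk in enumerate(disks):
        if not isinstance(disk, dict):
            problems.append(f"logical_disks[{index}] must be an object")
            continue
        for key in ("size_gb", "raid_level"):
            if key not in disk:
                problems.append(f"logical_disks[{index}] is missing '{key}'")
    return problems


if __name__ == "__main__":
    # The configuration shown in this message.
    sample = """
    {"logical_disks": [
        {"size_gb": "MAX", "raid_level": "0", "is_root_volume": true}
    ]}
    """
    print(check_raid_config(sample) or "no problems spotted")
```

A clean result only means the file is well-formed; whether the controller accepts the configuration is still decided during cleaning.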
But Ironic still couldn't see the disks. I still got:
```
## In deploy images
# journalctl -fxeu ironic-python-agent
Feb 14 02:17:22 host-10-12-22-74 ironic-python-agent[2329]: 2022-02-14 02:17:22.863 2329 WARNING root [-] Path /dev/disk/by-path is inaccessible, /dev/disk/by-path/* version of block device name is unavailable Cause: [Errno 2] No such file or directory: '/dev/disk/by-path': FileNotFoundError: [Errno 2] No such file or directory: '/dev/disk/by-path'
Feb 14 02:17:44 host-10-12-22-74 ironic-python-agent[2329]: 2022-02-14 02:17:44.391 2329 ERROR root [-] Unexpected error dispatching get_os_install_device to manager <ironic_python_agent.hardware.GenericHardwareManager object at 0x7efbf4da2208>: Error finding the disk or partition device to deploy the image onto: No suitable device was found for deployment - root device hints were not provided and all found block devices are smaller than 4294967296B.: ironic_python_agent.errors.DeviceNotFound: Error finding the disk or partition device to deploy the image onto: No suitable device was found for deployment - root device hints were not provided and all found block devices are smaller than 4294967296B.
```
I don't know whether this is caused by a missing RAID card driver, a missing disk driver, or a missing RAID configuration. Do you have any idea about this?
love you, Han Guangyu
Julia Kreger <juliaashleykreger@gmail.com> wrote on Thu, Feb 10, 2022 at 23:11:
If the disk controllers *are* enumerated in the kernel log, which is something to also look for, then the disks themselves may be in some weird state like security locked. Generally this shows up as the operating system kind of sees the disk and the SATA port connected but can't really access it. This is also an exceptionally rare state to find one's self in.
More common, especially in enterprise grade hardware: If the disk controller is actually a RAID controller, and there are no RAID volumes configured, then the operating system likely cannot see the underlying disks and turn them into usable block devices. I've seen a couple of drivers over the years which expose hints of disks in the kernel log, but without a RAID configuration on the card, the drivers can't present usable block devices to the operating system.
-Julia
On Thu, Feb 10, 2022 at 3:17 AM Arne Wiebalck <arne.wiebalck@cern.ch> wrote:
Hi Guangyu,
No worries about asking questions, this is what the mailing list is for :)
Just to clarify, you do not have to set root device hints, it also works without (with the algorithm I mentioned). However, hints help to define the exact device and/or make deployment more predictable/repeatable.
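As an illustration of the fallback mentioned here (described elsewhere in this thread as: without hints, pick the smallest block device larger than 4GB, ordered alphabetically), a minimal Python sketch follows. This is not IPA's actual code: `BlockDevice`, `pick_root_device` and the sample devices are hypothetical, and only the 4294967296-byte threshold is taken from the error message in this thread.

```python
from dataclasses import dataclass
from typing import List, Optional

# Threshold quoted in the error message: 4294967296 B (4 GiB).
MIN_SIZE_BYTES = 4 * 1024**3


@dataclass
class BlockDevice:
    """Hypothetical stand-in for a block device seen by the agent."""
    name: str
    size: int  # bytes


def pick_root_device(devices: List[BlockDevice]) -> Optional[BlockDevice]:
    """Pick the smallest device that is not below the threshold.

    Ties on size are broken alphabetically by name. Returns None when
    every device is too small or no device is visible at all, which is
    the situation reported in this thread.
    """
    candidates = [d for d in devices if d.size >= MIN_SIZE_BYTES]
    if not candidates:
        return None
    return min(candidates, key=lambda d: (d.size, d.name))


if __name__ == "__main__":
    visible = [
        BlockDevice("/dev/sdb", 2 * 10**12),
        BlockDevice("/dev/sda", 2 * 10**12),
        BlockDevice("/dev/sdc", 10**9),  # too small, skipped
    ]
    print(pick_root_device(visible))
    print(pick_root_device([]))  # nothing visible: no candidate
```

With an empty device list the function has nothing to choose from, which is why the deploy error also appears when the kernel exposes no disks at all.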
If it is really a driver problem, it is an issue with the operating system of the image you use, i.e. CentOS8. Some drivers were removed from 7 to 8, and we have seen issues with specific drive models as well.
You can try to build your own IPA images as described in [1], e.g. to add your ssh key to be able to log into the IPA to debug further, and to eventually include drivers (if you can identify them and they are available for CentOS8).
Another option may be to add another (newer) disk model to the server, just to confirm it is the disk model/driver which is the cause.
You could also try to boot the node into a CentOS7 (and then a CentOS8) live image to confirm it can see the disks at all.
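A quick way to confirm from any booted Linux image whether the kernel exposes block devices at all is to list `/sys/block`. The following is a minimal Python sketch, assuming a Linux sysfs layout; `list_block_devices` is a hypothetical helper name, and sizes are computed from the `size` attribute, which sysfs reports in 512-byte sectors.

```python
from pathlib import Path
from typing import Dict

SECTOR_BYTES = 512  # /sys/block/<dev>/size is given in 512-byte sectors


def list_block_devices(sysfs_root: str = "/sys/block") -> Dict[str, int]:
    """Return {device name: size in bytes} for each entry under sysfs_root.

    Entries without a readable, numeric 'size' attribute are skipped.
    An empty dict means the kernel exposes no block devices there.
    """
    sizes: Dict[str, int] = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return sizes
    for entry in sorted(root.iterdir()):
        try:
            sectors = int((entry / "size").read_text().strip())
        except (OSError, ValueError):
            continue
        sizes[entry.name] = sectors * SECTOR_BYTES
    return sizes


if __name__ == "__main__":
    devices = list_block_devices()
    if not devices:
        print("no block devices visible")
    for name, size in devices.items():
        print(f"{name}: {size / 1024**3:.1f} GiB")
```

Note that loop and RAM devices are listed as well, so an entry in the output is not necessarily a physical disk.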
Hope this helps! Arne
[1] https://docs.openstack.org/ironic-python-agent-builder/latest/admin/dib.html
On 10.02.22 11:15, 韩光宇 wrote:
Hi Arne,
Thank you very much for your response. Love you. You take away a lot of my confusion.
You are right, I didn't set 'root device'. And Ironic also cannot see any disk: the content of the 'lsblk' file in the deploy logs is empty. I tried to set 'root device', but because Ironic can't find any disk, the deploy still failed.
Feb 10 09:51:55 host-10-12-22-59 ironic-python-agent[2324]: 2022-02-10 09:51:55.045 2324 WARNING root [-] Path /dev/disk/by-path is inaccessible, /dev/disk/by-path/* version of block device name is unavailable Cause: [Errno 2] No such file or directory: '/dev/disk/by-path': FileNotFoundError: [Errno 2] No such file or directory: '/dev/disk/by-path'
Feb 10 09:51:55 host-10-12-22-59 ironic-python-agent[2324]: 2022-02-10 09:51:55.056 2324 WARNING ironic_lib.utils [-] No device found that matches the root device hints {'wwn': '0x50014EE2691D724C'}: StopIteration
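The 'lsblk' file from the deploy logs can also be pulled out of the archive with a few lines of Python. This is a minimal sketch under stated assumptions: `read_lsblk_from_archive` is a hypothetical helper, the member is assumed to be named `lsblk` (possibly inside a subdirectory), and the archive path in the usage note is a placeholder, not a real file name.

```python
import tarfile
from typing import Optional


def read_lsblk_from_archive(archive_path: str) -> Optional[str]:
    """Return the text of the member named 'lsblk', or None if absent.

    Assumption: the deploy log archive stores the lsblk output in a
    regular file whose base name is exactly 'lsblk'.
    """
    # "r:*" lets tarfile detect the compression (gzip, bzip2, xz, none).
    with tarfile.open(archive_path, "r:*") as archive:
        for member in archive.getmembers():
            base_name = member.name.rsplit("/", 1)[-1]
            if member.isfile() and base_name == "lsblk":
                extracted = archive.extractfile(member)
                if extracted is not None:
                    return extracted.read().decode("utf-8", errors="replace")
    return None
```

Usage would look like `read_lsblk_from_archive("/var/log/ironic/deploy/<archive>.tar.gz")`, with `<archive>` replaced by the file for the failed deployment. An empty string returned here matches the empty 'lsblk' file described above: the agent ran `lsblk` but saw no devices.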
Sorry to bother you, I'm a newcomer to Ironic and I didn't find information about this on Google.
The bare metal node has three identical disks (Western Digital DC HA210 2TB SATA 6GB/s). Where can I confirm whether ironic-python-agent supports this disk?
And if Ironic cannot find the disks because the corresponding drivers are missing from the IPA image, do you know how to resolve it? I have used the latest deploy images from https://tarballs.opendev.org/openstack/ironic-python-agent/dib/files/ . Do I need to find the driver and add it manually to the source code or the ramdisk? (That would be difficult for me.)
Love you.
Cheers, Guangyu
Arne Wiebalck <arne.wiebalck@cern.ch> wrote on Thu, Feb 10, 2022 at 15:51:
>
> Hi Guangyu,
>
> The error indicates that Ironic was not able to find
> a device where it could deploy the image to.
>
> To find a device, Ironic will use 'root device'
> hints [1], usually set by the admin on a node. If that
> does not yield anything, Ironic will loop over all
> block devices and pick the smallest which is larger
> than 4GB (and order them alphabetically).
>
> If you have disks in your server which are larger than
> 4GB, one potential explanation is that Ironic cannot see them,
> e.g. since the corresponding drivers in the IPA image are missing.
> The logs you posted seem to confirm something along those
> lines.
>
> Check the content of the 'lsblk' file in the deploy logs which
> you can find in the tar archive in /var/log/ironic/deploy/
> on the controller for your deployment attempt to see what
> devices Ironic has access to.
>
> Cheers,
> Arne
>
>
> [1] https://docs.openstack.org/ironic/latest/install/advanced.html#root-device-h...
>
> On 10.02.22 02:50, 韩光宇 wrote:
>> Dear all,
>>
>> I have a OpenStack Victoria environment, and tried to use ironic
>> manage bare metal. But I got "- root device hints were not provided
>> and all found block devices are smaller than 4294967296B." in deploy
>> stage.
>>
>> 2022-02-09 17:57:56.492 3908982 ERROR
>> ironic.drivers.modules.agent_base [-] Agent returned error for deploy
>> step {'step': 'write_image', 'priority': 80, 'argsinfo': None,
>> 'interface': 'deploy'} on node cc68c450-ce54-4e1c-be04-8b0a6169ef92 :
>> No suitable device was found for deployment - root device hints were
>> not provided and all found block devices are smaller than
>> 4294967296B..
>>
>> I used "openstack server create --flavor my-baremetal-flavor --nic
>> net-id=$net_id --image $image testing" to deploy bare metal node. I
>> download deploy images(ipa-centos8-master.kernel and
>> ipa-centos8-master.initramfs) in
>> https://tarballs.opendev.org/openstack/ironic-python-agent/dib/files/.
>>
>> The baremetal node info and flavor info as following:
>> https://paste.opendev.org/show/bV7lgO6RkNQY6ZGPbT2e/
>> Ironic configure file as following:
>> https://paste.opendev.org/show/bTgY9Kpn7KWqwQl73aEa/
>> Ironic-conductor log: https://paste.opendev.org/show/bFKZYlXmccxNxU8lEogk/
>> The log of ironic-python-agent in bare metal node:
>> https://paste.opendev.org/show/btAuaMuV2IutV2Pa7YIa/
>>
>> I see some old discussion about this, such as:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1312187. But those
>> discussions took place a long time ago, not version V, and no solution
>> was seen.
>>
>> Does anyone know how to solve this problem? I would appreciate any
>> kind of guidance or help.
>>
>> Thank you,
>> Han Guangyu
>>