[Ironic] No suitable device was found for deployment

Arne Wiebalck arne.wiebalck at cern.ch
Fri Mar 4 08:37:19 UTC 2022


Hi Guangyu,

I would think cleaning succeeds even if there are no disks:
the loop to clean the disks is simply empty, so nothing to
do, success! :) Deployment then fails since it needs a disk
to deploy on.
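
As a rough illustration of the above (a simplified sketch, not the actual
Ironic Python Agent code; the function names, the exception class and the
device dictionaries are made up for this example): an empty disk list makes
the cleaning loop a no-op, while the device selection for deployment has
nothing to choose from. The 4294967296 B threshold is the one from the error
message quoted further down the thread.

```python
MIN_ROOT_DEVICE_BYTES = 4294967296  # 4 GiB, taken from the error message


class DeviceNotFound(Exception):
    """Hypothetical stand-in for the agent's error, for this sketch only."""


def clean_disks(devices):
    """Pretend to erase every device; return the names that were cleaned."""
    cleaned = []
    for dev in devices:  # with no devices the loop body never runs
        cleaned.append(dev["name"])
    return cleaned


def pick_deploy_device(devices):
    """Pick the smallest device of at least 4 GiB, ties broken by name."""
    candidates = [d for d in devices if d["size"] >= MIN_ROOT_DEVICE_BYTES]
    if not candidates:
        raise DeviceNotFound("No suitable device was found for deployment")
    return min(candidates, key=lambda d: (d["size"], d["name"]))
```

With an empty list, `clean_disks([])` simply returns `[]`, whereas
`pick_deploy_device([])` raises, which matches "cleaning succeeds,
deployment fails".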

For my understanding:
You reconfigured the disks into JBOD state and then retried
to deploy (which failed and the disks fell back into UB state)?

JBOD mode should work, but is usually not the main mode h/w
RAID controllers work in. One thing to try is to actually
configure a RAID-0 or RAID-1 device from your three disks and
retry to deploy.
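
To check whether the RAID device actually shows up after configuring it,
something along these lines could be run from a live image or the IPA
ramdisk. This is only a sketch: it assumes an lsblk (util-linux) recent
enough to support JSON output, and the helper names are made up.

```python
import json
import subprocess


def visible_disks(lsblk_json):
    """Return (name, size_in_bytes) for every entry of type 'disk'."""
    if not lsblk_json.strip():
        return []  # lsblk printed nothing at all
    data = json.loads(lsblk_json)
    return [
        (dev["name"], int(dev["size"]))  # size may be a string or a number
        for dev in data.get("blockdevices", [])
        if dev.get("type") == "disk"
    ]


def run_lsblk():
    """Call lsblk: -J = JSON, -b = sizes in bytes, -d = no partitions."""
    result = subprocess.run(
        ["lsblk", "-J", "-b", "-d", "-o", "NAME,SIZE,TYPE"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

If `visible_disks(run_lsblk())` comes back empty, the kernel does not see
any disk and Ironic will not either.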

I am not totally sure if Ironic would try to remove such a h/w
RAID config during cleaning, but Julia will be able to tell.

Cheers,
  Arne

On 04.03.22 08:54, 韩光宇 wrote:
> Hi Julia and Arne,
> 
>> Did cleaning fail at any given point with these machines?
> Sorry that I didn't describe it clearly. Actually, cleaning always
> succeeds in the Ironic log; it is deployment that fails. I just wonder
> if the cleaning phase did something that caused disk identification
> problems.
> 
> And I have found the RAID config menu. On my machine, I need to press
> "Ctrl + R" when the RAID interface appears during boot.
> Thank you very much! In the RAID config menu, I found that the state
> of all three disks is UB (Unconfigured Bad).
> 
> So, if I use the Ironic service to install an operating system on a
> server that has three hard disks in the 'JBOD' state, is there
> anything I should pay attention to or do? If I don't do anything,
> the deploy stage gives me the error 'No suitable device was
> found for deployment' and 'lsblk' is empty. After cleaning succeeded
> and deployment failed, the disk state is "Unconfigured bad".
> 
> best wishes to you,
> Han Guangyu
> 
> Julia Kreger <juliaashleykreger at gmail.com> wrote on Tue, Mar 1, 2022 at 22:06:
> 
> 
>>
>> On Mon, Feb 28, 2022 at 1:12 AM Arne Wiebalck <arne.wiebalck at cern.ch> wrote:
>>>
>>> Hi Guangyu,
>>>
>>> I am not aware of anything in the Ironic Python Agent that
>>> would remove disks from the system in a way that they would
>>> not be visible after a reboot (apart from, as mentioned before,
>>> the clean up of a hardware RAID in a way the IPA is not able
>>> to see any devices after).
>>>
>>> How about trying to access and configure the hardware RAID with
>>> the corresponding tool from the RAM disk you booted into from the
>>> USB stick? Install the tool and see if it detects the controller.
>>>
>>> The very first step before doing anything with Ironic is to
>>> get the disks back or understand why they are not visible.
>>>
>>
>> Did cleaning fail at any given point with these machines?
>>
>> If you have physical access, try disconnecting all of the drives, and
>> then powering up the machine and see if you can get into the firmware
>> configuration screen with control-h. If you can, remove all of the
>> prior configuration or disk volumes. They will look like they are in
>> error states most likely. If you're unable to get into this screen, I
>> would be worried about your disk controller card. If you're able to
>> clear everything out of the controller, power off, try re-inserting
>> drives, and see what happens. See if the controller can view/interact
>> with the drives. If it sees no drives, then my next paragraph is
>> likely the case.
>>
>> The disks sound like they might be in a security locked state, which will
>> likely require a desktop SATA disk controller to remedy by attaching
>> and manually removing from a security locked state. MegaRAID
>> controllers can't recognize security locked devices (most controllers
>> and especially ones labeled "raid controllers" can't handle it) when
>> in pass-through mode, but I've never heard of security lock commands
>> actually getting through to the device with those controllers in
>> pass-through mode. If the card was in raid mode to begin with, then it
>> likely never did anything involving secure erase as the controller
>> should not be offering that as a feature of provided disks to the OS.
>>
>>> Cheers,
>>>    Arne
>>>
>>> On 28.02.22 09:28, 韩光宇 wrote:
>>>> Hi Arne,
>>>>
>>>> I didn't find a hardware RAID config option during the initial boot
>>>> sequence. Ctrl+H is unresponsive on this computer. I just saw "Press
>>>> Del to enter firmware configuration, press F3 to enter boot menu, and
>>>> press F12 to enter network boot". So I pressed 'Del' to enter the BIOS,
>>>> but I didn't find a RAID config menu in the BIOS. Sorry that I have
>>>> poor knowledge about this.
>>>>
>>>> And now, even though I installed the operating system manually using a
>>>> USB stick, I still couldn't find the hard drive. Is there anything
>>>> that ironic-agent did in the clean phase that would have caused this
>>>> problem?
>>>>
>>>> I wonder if this is a starting point for solving the problem. My idea
>>>> now is to first find a way to configure RAID manually. Do you agree
>>>> with this? And then, if the clean phase clears RAID configurations,
>>>> would my manual configuration be cleared again during cleaning?
>>>>
>>>> Sorry for being so confused.
>>>>
>>>> love you,
>>>> Guangyu
>>>>
>>>> Arne Wiebalck <arne.wiebalck at cern.ch> wrote on Mon, Feb 14, 2022 at 15:59:
>>>>>
>>>>> Hi Guangyu,
>>>>>
>>>>> It seems like Julia had the right idea and the disks
>>>>> are not visible since the RAID controller does not
>>>>> expose anything to the operating system. This seems
>>>>> to be confirmed by you booting into the CentOS7 image.
>>>>>
>>>>> What I would suggest to try next is to look for the
>>>>> hardware RAID config option during the initial boot
>>>>> sequence to enter the RAID config menu (there should be
>>>>> a message quite early on, and maybe Ctrl-H is needed
>>>>> to enter the menu).
>>>>>
>>>>> Once there, manually configure the disks as JBODs or
>>>>> create a RAID device. Upon reboot this should be visible
>>>>> and accessible as a device. Maybe check from your CentOS7
>>>>> image again. If the devices are there, Ironic should
>>>>> also be able to deploy on them (for this you can remove
>>>>> the RAID config you added).
>>>>>
>>>>> It depends a little on what your goal is, but I would
>>>>> try this first to see if you can make a device visible
>>>>> and if the Ironic deploy bit works, before trying to
>>>>> configure the hardware RAID via Ironic.
>>>>>
>>>>> Cheers,
>>>>>     Arne
>>>>>
>>>>> On 14.02.22 03:20, 韩光宇 wrote:
>>>>>> Hi Arne and Julia,
>>>>>>
>>>>>> You make me feel so warm. Best wishes to you.
>>>>>>
>>>>>> I have tried to boot the node into CentOS7, but it still could not
>>>>>> find any disk. And sorry that I didn't notice the RAID card.
>>>>>>
>>>>>> # lspci -v
>>>>>> ...
>>>>>> 23:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS-3 3108
>>>>>> [Invader] (rev 02)
>>>>>>            Subsystem: Broadcom / LSI MegaRAID SAS 9361-8i
>>>>>>            Flags: bus master, fast devsel, latency 0, IRQ -2147483648, NUMA node 1
>>>>>>            I/O ports at 3000 [size=256]
>>>>>>            Memory at e9900000 (64-bit, non-prefetchable) [size=64K]
>>>>>>            Memory at e9700000 (64-bit, non-prefetchable) [size=1M]
>>>>>>            Expansion ROM at e9800000 [disabled] [size=1M]
>>>>>>            Capabilities: [50] Power Management version 3
>>>>>>            Capabilities: [68] Express Endpoint, MSI 00
>>>>>>            Capabilities: [d0] Vital Product Data
>>>>>>            Capabilities: [a8] MSI: Enable- Count=1/1 Maskable+ 64bit+
>>>>>>            Capabilities: [c0] MSI-X: Enable+ Count=97 Masked-
>>>>>>            Capabilities: [100] Advanced Error Reporting
>>>>>>            Capabilities: [1e0] #19
>>>>>>            Capabilities: [1c0] Power Budgeting <?>
>>>>>>            Capabilities: [148] Alternative Routing-ID Interpretation (ARI)
>>>>>>            Kernel driver in use: megaraid_sas
>>>>>>            Kernel modules: megaraid_sas
>>>>>> ...
>>>>>>
>>>>>> I tried to configure RAID following
>>>>>> https://docs.openstack.org/ironic/latest/admin/raid.html
>>>>>> by `baremetal node set $NODE_UUID --target-raid-config raid.json`. The
>>>>>> server has three identical disks (Western Digital DC HA210 2TB SATA 6Gb/s).
>>>>>> # cat raid.json
>>>>>> {
>>>>>>      "logical_disks": [
>>>>>>        {
>>>>>>          "size_gb": "MAX",
>>>>>>          "raid_level": "0",
>>>>>>          "is_root_volume": true
>>>>>>        }
>>>>>>      ]
>>>>>> }
>>>>>>
>>>>>> But Ironic still couldn't see any disk. I still got
>>>>>> ```
>>>>>> ## In deploy images
>>>>>> # journalctl -fxeu ironic-python-agent
>>>>>> Feb 14 02:17:22 host-10-12-22-74 ironic-python-agent[2329]: 2022-02-14
>>>>>> 02:17:22.863 2329 WARNING root [-] Path /dev/disk/by-path is
>>>>>> inaccessible, /dev/disk/by-path/* version of block device name is
>>>>>> unavailable Cause: [Errno 2] No such file or directory:
>>>>>> '/dev/disk/by-path': FileNotFoundError: [Errno 2] No such file or
>>>>>> directory: '/dev/disk/by-path'
>>>>>> Feb 14 02:17:44 host-10-12-22-74 ironic-python-agent[2329]: 2022-02-14
>>>>>> 02:17:44.391 2329 ERROR root [-] Unexpected error dispatching
>>>>>> get_os_install_device to manager
>>>>>> <ironic_python_agent.hardware.GenericHardwareManager object at
>>>>>> 0x7efbf4da2208>: Error finding the disk or partition device to deploy
>>>>>> the image onto: No suitable device was found for deployment - root
>>>>>> device hints were not provided and all found block devices are smaller
>>>>>> than 4294967296B.: ironic_python_agent.errors.DeviceNotFound: Error
>>>>>> finding the disk or partition device to deploy the image onto: No
>>>>>> suitable device was found for deployment - root device hints were not
>>>>>> provided and all found block devices are smaller than 4294967296B.
>>>>>> ```
>>>>>>
>>>>>> I don't know if it's a missing RAID card driver, a missing disk
>>>>>> driver, or a missing RAID configuration. Do you have any idea about
>>>>>> this?
>>>>>>
>>>>>> love you,
>>>>>> Han Guangyu
>>>>>>
>>>>>>
>>>>>> Julia Kreger <juliaashleykreger at gmail.com> wrote on Thu, Feb 10, 2022 at 23:11:
>>>>>>
>>>>>>>
>>>>>>> If the disk controllers *are* enumerated in the kernel log, which is
>>>>>>> something to also look for, then the disks themselves may be in some
>>>>>>> weird state like security locked. Generally this shows up as the
>>>>>>> operating system kind of sees the disk and the SATA port connected but
>>>>>>> can't really access it. This is also an exceptionally rare state to
>>>>>>> find one's self in.
>>>>>>>
>>>>>>> More common, especially in enterprise grade hardware: If the disk
>>>>>>> controller is actually a raid controller, and there are no raid
>>>>>>> volumes configured, then the operating system likely cannot see the
>>>>>>> underlying disks and turn that into a usable block device. I've seen a
>>>>>>> couple drivers over the years which expose hints of disks in the
>>>>>>> kernel log and without raid configuration in the cards, the drivers
>>>>>>> can't present usable block devices to the operating system.
>>>>>>>
>>>>>>> -Julia
>>>>>>>
>>>>>>> On Thu, Feb 10, 2022 at 3:17 AM Arne Wiebalck <arne.wiebalck at cern.ch> wrote:
>>>>>>>>
>>>>>>>> Hi Guangyu,
>>>>>>>>
>>>>>>>> No worries about asking questions, this is what the mailing
>>>>>>>> list is for :)
>>>>>>>>
>>>>>>>> Just to clarify, you do not have to set root device hints,
>>>>>>>> it also works without (with the algorithm I mentioned).
>>>>>>>> However, hints help to define the exact device and/or make
>>>>>>>> deployment more predictable/repeatable.
>>>>>>>>
>>>>>>>> If it is really a driver problem, it is an issue with the
>>>>>>>> operating system of the image you use, i.e. CentOS8. Some
>>>>>>>> drivers were removed from 7 to 8, and we have seen issues
>>>>>>>> with specific drive models as well.
>>>>>>>>
>>>>>>>> You can try to build your own IPA images as described in
>>>>>>>> [1], e.g. to add your ssh key to be able to log into the
>>>>>>>> IPA to debug further, and to eventually include drivers
>>>>>>>> (if you can identify them and they are available for CentOS8).
>>>>>>>>
>>>>>>>> Another option may be to add another (newer) disk model to
>>>>>>>> the server, just to confirm it is the disk model/driver which
>>>>>>>> is the cause.
>>>>>>>>
>>>>>>>> You could also try to boot the node into a CentOS7 (and then
>>>>>>>> a CentOS8) live image to confirm it can see the disks at all.
>>>>>>>>
>>>>>>>> Hope this helps!
>>>>>>>>      Arne
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://docs.openstack.org/ironic-python-agent-builder/latest/admin/dib.html
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10.02.22 11:15, 韩光宇 wrote:
>>>>>>>>> Hi Arne,
>>>>>>>>>
>>>>>>>>> Thank you very much for your response. Love you. You take away a lot
>>>>>>>>> of my confusion.
>>>>>>>>>
>>>>>>>>> You are right, I didn't set 'root device'. And Ironic also cannot see
>>>>>>>>> any disk; the content of the 'lsblk' file in the deploy logs is empty.
>>>>>>>>> I tried to set 'root device', but because Ironic can't find any disk,
>>>>>>>>> the deploy still failed.
>>>>>>>>>
>>>>>>>>> Feb 10 09:51:55 host-10-12-22-59 ironic-python-agent[2324]: 2022-02-10
>>>>>>>>> 09:51:55.045 2324 WARNING root [-] Path /dev/disk/by-path is
>>>>>>>>> inaccessible, /dev/disk/by-path/* version of block device name is
>>>>>>>>> unavailable Cause: [Errno 2] No such file or directory:
>>>>>>>>> '/dev/disk/by-path': FileNotFoundError: [Errno 2] No such file or
>>>>>>>>> directory: '/dev/disk/by-path'
>>>>>>>>> Feb 10 09:51:55 host-10-12-22-59 ironic-python-agent[2324]: 2022-02-10
>>>>>>>>> 09:51:55.056 2324 WARNING ironic_lib.utils [-] No device found that
>>>>>>>>> matches the root device hints {'wwn': '0x50014EE2691D724C'}:
>>>>>>>>> StopIteration
>>>>>>>>>
>>>>>>>>> Sorry to bother you, I'm a newcomer to Ironic and I didn't find
>>>>>>>>> information about it on Google.
>>>>>>>>>
>>>>>>>>> The bare metal node has three identical disks (Western Digital DC HA210
>>>>>>>>> 2TB SATA 6Gb/s). Where can I confirm whether ironic-python-agent
>>>>>>>>> supports this disk?
>>>>>>>>>
>>>>>>>>> And if Ironic cannot find the disks because the corresponding drivers in
>>>>>>>>> the IPA image are missing, do you know how to resolve it? I have used the
>>>>>>>>> latest deploy images in
>>>>>>>>> https://tarballs.opendev.org/openstack/ironic-python-agent/dib/files/
>>>>>>>>> .  Do I need to find and manually add a driver in the source code or
>>>>>>>>> the ramdisk (that would be difficult for me)?
>>>>>>>>>
>>>>>>>>> Love you.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Guangyu
>>>>>>>>>
>>>>>>>>> Arne Wiebalck <arne.wiebalck at cern.ch> wrote on Thu, Feb 10, 2022 at 15:51:
>>>>>>>>>>
>>>>>>>>>> Hi Guangyu,
>>>>>>>>>>
>>>>>>>>>> The error indicates that Ironic was not able to find
>>>>>>>>>> a device where it could deploy the image to.
>>>>>>>>>>
>>>>>>>>>> To find a device, Ironic will use 'root device'
>>>>>>>>>> hints [1], usually set by the admin on a node. If that
>>>>>>>>>> does not yield anything, Ironic will loop over all
>>>>>>>>>> block devices and pick the smallest which is larger
>>>>>>>>>> than 4GB (and order them alphabetically).
>>>>>>>>>>
>>>>>>>>>> If you have disks in your server which are larger than
>>>>>>>>>> 4GB, one potential explanation is that Ironic cannot see them,
>>>>>>>>>> e.g. since the corresponding drivers in the IPA image are missing.
>>>>>>>>>> The logs you posted seem to confirm something along those
>>>>>>>>>> lines.
>>>>>>>>>>
>>>>>>>>>> Check the content of the 'lsblk' file in the deploy logs which
>>>>>>>>>> you can find in the tar archive in /var/log/ironic/deploy/
>>>>>>>>>> on the controller for your deployment attempt to see what
>>>>>>>>>> devices Ironic has access to.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>>       Arne
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [1] https://docs.openstack.org/ironic/latest/install/advanced.html#root-device-hints
>>>>>>>>>>
>>>>>>>>>> On 10.02.22 02:50, 韩光宇 wrote:
>>>>>>>>>>> Dear all,
>>>>>>>>>>>
>>>>>>>>>>> I have an OpenStack Victoria environment and tried to use Ironic
>>>>>>>>>>> to manage bare metal. But I got "- root device hints were not provided
>>>>>>>>>>> and all found block devices are smaller than 4294967296B." in the
>>>>>>>>>>> deploy stage.
>>>>>>>>>>>
>>>>>>>>>>> 2022-02-09 17:57:56.492 3908982 ERROR
>>>>>>>>>>> ironic.drivers.modules.agent_base [-] Agent returned error for deploy
>>>>>>>>>>> step {'step': 'write_image', 'priority': 80, 'argsinfo': None,
>>>>>>>>>>> 'interface': 'deploy'} on node cc68c450-ce54-4e1c-be04-8b0a6169ef92 :
>>>>>>>>>>> No suitable device was found for deployment - root device hints were
>>>>>>>>>>> not provided and all found block devices are smaller than
>>>>>>>>>>> 4294967296B..
>>>>>>>>>>>
>>>>>>>>>>> I used "openstack server create --flavor my-baremetal-flavor --nic
>>>>>>>>>>> net-id=$net_id --image $image testing" to deploy the bare metal node.
>>>>>>>>>>> I downloaded the deploy images (ipa-centos8-master.kernel and
>>>>>>>>>>> ipa-centos8-master.initramfs) from
>>>>>>>>>>> https://tarballs.opendev.org/openstack/ironic-python-agent/dib/files/.
>>>>>>>>>>>
>>>>>>>>>>> The bare metal node info and flavor info are as follows:
>>>>>>>>>>> https://paste.opendev.org/show/bV7lgO6RkNQY6ZGPbT2e/
>>>>>>>>>>> The Ironic configuration file is as follows:
>>>>>>>>>>> https://paste.opendev.org/show/bTgY9Kpn7KWqwQl73aEa/
>>>>>>>>>>> Ironic-conductor log:    https://paste.opendev.org/show/bFKZYlXmccxNxU8lEogk/
>>>>>>>>>>> The log of ironic-python-agent in bare metal node:
>>>>>>>>>>> https://paste.opendev.org/show/btAuaMuV2IutV2Pa7YIa/
>>>>>>>>>>>
>>>>>>>>>>> I have seen some old discussions about this, such as:
>>>>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1312187. But those
>>>>>>>>>>> discussions took place a long time ago, not for the Victoria release,
>>>>>>>>>>> and I saw no solution there.
>>>>>>>>>>>
>>>>>>>>>>> Does anyone know how to solve this problem? I would appreciate any
>>>>>>>>>>> kind of guidance or help.
>>>>>>>>>>>
>>>>>>>>>>> Thank you,
>>>>>>>>>>> Han Guangyu
>>>>>>>>>>>
>>>>>>>>