[Kolla-Ansible][Ironic] Baremetal node Cleaning fails in UEFI mode, but succeeds in Legacy Bios Mode
Hi Team,
I am trying to provision a bare metal node using Ironic with Kolla-Ansible. I have also enabled iPXE support in Kolla-Ansible.
When the bare metal node boots in UEFI mode, it cannot find the file *"ipxe.efi"*, and as a result cleaning of the node fails.
[image: image.png]
When I change the boot mode of the node to Legacy BIOS, it requests the file *"undionly.kpxe"* instead; the request is acknowledged and data packets are transferred. Cleaning of the node then succeeds.
[image: image.png]
Is there any limitation on the Ironic or Kolla-Ansible side that means a bare metal node can only be provisioned in Legacy BIOS mode? Is there any other parameter that needs to be enabled for bare metal provisioning in UEFI mode?
Regards
Anirudh Gupta
Hi Anirudh,
Are you using CentOS 8? The iPXE EFI bootloader file is named ipxe-x86_64.efi there, so a TFTP request for ipxe.efi will fail.
Could you try setting the following in ironic.conf:
[pxe]
uefi_ipxe_bootfile_name = ipxe-x86_64.efi
If this works, we should change it in kolla-ansible. Would you be able to propose the change via Gerrit?
Mark
On Fri, 13 Aug 2021 at 17:18, Anirudh Gupta <anyrude10@gmail.com> wrote:
[trim]
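The effect of the suggested setting can be pictured as a lookup from boot mode to the boot file name handed to the node. The following is a minimal sketch of that idea, not Ironic's own code: the helper `bootfile_for` is hypothetical, and `ipxe_bootfile_name = undionly.kpxe` is included on the assumption that it matches the file the node requested in Legacy BIOS mode.

```python
import configparser

# Fragment mirroring the setting suggested in the thread. The
# ipxe_bootfile_name line is an assumption added for the BIOS case.
IRONIC_CONF = """
[pxe]
ipxe_bootfile_name = undionly.kpxe
uefi_ipxe_bootfile_name = ipxe-x86_64.efi
"""


def bootfile_for(boot_mode, conf_text=IRONIC_CONF):
    """Return the iPXE boot file name configured for the given boot mode."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    if boot_mode == "uefi":
        option = "uefi_ipxe_bootfile_name"
    else:
        option = "ipxe_bootfile_name"
    return parser.get("pxe", option)


if __name__ == "__main__":
    for mode in ("bios", "uefi"):
        print(mode, "->", bootfile_for(mode))
```

The file named for UEFI mode must actually exist in the TFTP root of the deployment, which is why the CentOS 8 file name matters here.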
Hi Mark,
Thanks for your reply. Yes, I am using CentOS 8.
I changed the setting and restarted the Docker container. The cleaning process moved a step further, but it then started showing the error *"Could not select: Exec format not supported"*
[image: image.png]
Regards
Anirudh Gupta
On Mon, Aug 16, 2021 at 1:52 PM Mark Goddard <mark@stackhpc.com> wrote:
[trim]
Hi Mark,
There was a problem with the cleaning image, which caused the error reported in my previous message. That has been resolved.
After setting the following parameter in ironic.conf:
[pxe]
uefi_ipxe_bootfile_name = ipxe-x86_64.efi
the "node provide" command executed successfully and the node reached the "available" state.
*In Legacy:* When I create a server with the "server create" command and pass a user image, the node installs the user image over the network and is then rebooted. After the reboot, it boots from the hard disk into the OS specified in the user image.
*In UEFI:* When I create a server with the "server create" command and pass a user image, installing the user image and rebooting work the same way. But after the reboot, despite the hard disk being set as the first boot priority, the node again starts booting over the network and eventually fails.
I have also tried passing *capabilities='boot_option:local'* on both the bare metal node and the flavor, but the behaviour is the same.
Regards
Anirudh Gupta
On Mon, Aug 16, 2021 at 2:54 PM Anirudh Gupta <anyrude10@gmail.com> wrote:
[trim]
On Fri, Aug 20, 2021 at 7:07 AM Anirudh Gupta <anyrude10@gmail.com> wrote:
> *In UEFI:* When I am trying to create the server using "server create" command and a userimage is passed in the command, the procedure of installing user image and rebooting remains the same. But after the reboot, despite setting the hard disk as the first priority, it again starts booting over the network and eventually fails.
This is very likely an issue with the vendor's firmware. We've seen some instances where the BMC refuses to honor the request to change the boot device, *or* where it is honored for a single boot operation only. In part, some of this may be due to improved support in handling UEFI boot signaling where the wrong thing could occur, at least with IPMI.
In order to create a fix or workaround, we need the following information:
Are you using IPMI or Redfish? If you're using IPMI, you should consider using Redfish.
What is the hardware vendor?
What is the BMC firmware version?
Is the BMC set to always network boot by default?
In UEFI, what does the machine report in the efibootmgr output? Your deployment agent logs already have this output in the journal; they are typically stored in /var/log/ironic/deploy_logs. We've seen some hardware act completely disjointed from the EFI NVRAM, or reset the EFI NVRAM when we request a one-time override.
Most importantly, what are the versions of ironic and ironic-python-agent?
> I have also tried passing the *capabilities='boot_option:local'* both in baremetal node and flavor, but the behaviour is the same.
[trim]
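To make the efibootmgr question concrete, here is a rough sketch of how the boot order in such output could be inspected once the agent logs are found. The sample text is hypothetical and invented for illustration; it is not output from the hardware in this thread, and the helper names are assumptions.

```python
import re

# Hypothetical efibootmgr-style output, invented for illustration only.
SAMPLE = """\
BootCurrent: 0002
Timeout: 0 seconds
BootOrder: 0002,0000,0001
Boot0000* CentOS Linux
Boot0001* Embedded SATA Port 1 HDD
Boot0002* Embedded LOM 1 Port 1 : PXE IPv4
"""


def parse_efibootmgr(text):
    """Return (boot_order, entries) parsed from efibootmgr-style text."""
    order = []
    entries = {}
    for line in text.splitlines():
        if line.startswith("BootOrder:"):
            raw = line.split(":", 1)[1]
            order = [num.strip() for num in raw.split(",") if num.strip()]
            continue
        # Entry lines look like "Boot0000* Label"; the "*" marks active ones.
        match = re.match(r"^Boot([0-9A-Fa-f]{4})(\*?)\s+(.*)$", line)
        if match:
            entries[match.group(1)] = match.group(3)
    return order, entries


def first_boot_target(text):
    """Return the label of the first entry in BootOrder, or None."""
    order, entries = parse_efibootmgr(text)
    if not order:
        return None
    return entries.get(order[0])


if __name__ == "__main__":
    print(first_boot_target(SAMPLE))
```

If the first entry after deployment is still a network device, as in this invented sample, that would be consistent with the firmware ignoring or resetting the boot order that was written.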
Hi Julia,
Thanks for your reply.
One update: with a CentOS 8.4 partition disk image, I am able to provision the bare metal node successfully. With the CentOS 8.4 ISO and a whole-disk image, the behaviour is unchanged: the node does not boot from the hard disk.
Please find my setup details below:
I am using an *HP server DL380 Gen9* with *BIOS P89 v2.76 (10/21/2019)* and the *IPMI* utility.
The hard disk is the first boot priority, followed by the 1GB NIC, which I have set to PXE.
I don't find any logs in /var/log/ironic/deploy_logs. There is a folder */var/log/kolla/ironic/*, but it contains no deploy_logs.
I downloaded the Kolla source image from Docker Hub:
docker pull kolla/centos-source-ironic-conductor:wallaby
Kolla-Ansible downloaded similar images for the other Ironic components.
Regards
Anirudh Gupta
On Fri, Aug 20, 2021 at 9:56 PM Julia Kreger <juliaashleykreger@gmail.com> wrote:
[trim]
Greetings Anirudh,
Could you post your ``openstack baremetal node show <uuid>`` output for a node which is in this state, where it is configured to boot from local storage but is booting from the network? Along with that, it would be helpful to understand whether the machine is configured for UEFI or not.
Realistically, this is where using IPMI on modern hardware becomes a problem, because there is no actual standard for the signaling behavior as it relates to UEFI boot with IPMI. We encourage operators to use Redfish instead, as it is clearly delineated as part of the standard.
One last thing: you may want to check and update the BMC and system firmware on your hardware.
On Mon, Aug 23, 2021 at 12:41 AM Anirudh Gupta <anyrude10@gmail.com> wrote:
[trim]
Hi Julia,
I have also upgraded my firmware to *P89 v2.90 (10/16/2020)*, but the result is still the same.
For your reference, the output of openstack baremetal node show for the whole disk image is as follows:
[ansible@localhost ~]$ openstack baremetal node show baremetal-node -f json
{
  "allocation_uuid": null,
  "automated_clean": null,
  "bios_interface": "no-bios",
  "boot_interface": "ipxe",
  "chassis_uuid": null,
  "clean_step": {},
  "conductor": "controller",
  "conductor_group": "",
  "console_enabled": false,
  "console_interface": "no-console",
  "created_at": "2021-08-25T04:51:32+00:00",
  "deploy_interface": "direct",
  "deploy_step": {},
  "description": null,
  "driver": "ipmi",
  "driver_info": {
    "ipmi_port": 623,
    "ipmi_username": "hsc",
    "ipmi_password": "******",
    "ipmi_address": "10.0.1.207",
    "deploy_kernel": "a34b7e57-f324-40fe-8fe4-04eb7ea49c3a",
    "deploy_ramdisk": "8db38567-4923-4322-b1bf-e12cce5cafc4"
  },
  "driver_internal_info": {
    "clean_steps": null,
    "agent_erase_devices_iterations": 1,
    "agent_erase_devices_zeroize": true,
    "agent_continue_if_secure_erase_failed": false,
    "agent_continue_if_ata_erase_failed": false,
    "agent_enable_nvme_secure_erase": true,
    "agent_enable_ata_secure_erase": true,
    "disk_erasure_concurrency": 1,
    "agent_erase_skip_read_only": false,
    "last_power_state_change": "2021-08-25T05:10:16.671639",
    "agent_version": "7.0.2.dev10",
    "agent_last_heartbeat": "2021-08-25T05:09:34.904605",
    "hardware_manager_version": {
      "MellanoxDeviceHardwareManager": "1",
      "generic_hardware_manager": "1.1"
    },
    "agent_cached_clean_steps_refreshed": "2021-08-25 04:59:28.312524",
    "is_whole_disk_image": true,
    "deploy_steps": null,
    "agent_cached_deploy_steps_refreshed": "2021-08-25 05:08:58.530633",
    "root_uuid_or_disk_id": "0x3f3df0d8"
  },
  "extra": {},
  "fault": null,
  "inspect_interface": "no-inspect",
  "inspection_finished_at": null,
  "inspection_started_at": null,
  "instance_info": {
    "image_source": "da92cd5d-e1d6-458d-a2b2-86e897a982c6",
    "root_gb": "470",
    "swap_mb": "0",
    "display_name": "server1",
    "vcpus": "24",
    "nova_host_id": "controller-ironic",
    "memory_mb": "62700",
    "local_gb": "470",
    "configdrive": "******",
    "image_disk_format": "raw",
    "image_checksum": null,
    "image_os_hash_algo": "sha512",
    "image_os_hash_value": "3b16d3a6734c23fb43fbd6deee16c907ea8e398bfd5163cd08f16ccd07a74399bb35f16a4713c3847058b445bf4150448f22eb11e75debcc548b8eaacf777e70",
    "image_url": "******",
    "image_container_format": "bare",
    "image_tags": [],
    "image_properties": {
      "stores": "file",
      "os_hidden": false,
      "virtual_size": 3511681024,
      "owner_specified.openstack.object": "images/centos-d",
      "owner_specified.openstack.sha256": "",
      "owner_specified.openstack.md5": ""
    },
    "image_type": "whole-disk-image"
  },
  "instance_uuid": "e29c267f-8ddb-4dce-a07c-18c4f7210010",
  "last_error": null,
  "lessee": null,
  "maintenance": false,
  "maintenance_reason": null,
  "management_interface": "ipmitool",
  "name": "baremetal-node",
  "network_data": {},
  "network_interface": "flat",
  "owner": null,
  "power_interface": "ipmitool",
  "power_state": "power on",
  "properties": {
    "cpus": 30,
    "memory_mb": 62700,
    "local_gb": 470,
    "cpu_arch": "x86_64",
    "capabilities": "boot_mode:uefi,boot_option:local",
    "vendor": "hewlett-packard"
  },
  "protected": false,
  "protected_reason": null,
  "provision_state": "active",
  "provision_updated_at": "2021-08-25T05:10:37+00:00",
  "raid_config": {},
  "raid_interface": "no-raid",
  "rescue_interface": "no-rescue",
  "reservation": null,
  "resource_class": "baremetal-resource-class",
  "retired": false,
  "retired_reason": null,
  "storage_interface": "noop",
  "target_power_state": null,
  "target_provision_state": null,
  "target_raid_config": {},
  "traits": [],
  "updated_at": "2021-08-25T05:10:37+00:00",
  "uuid": "3caaffe3-a6be-4b8c-b3dd-d302c4367670",
  "vendor_interface": "ipmitool"
}
I do not understand why this issue is not reproduced with the partition disk image.
Regards
Anirudh Gupta
On Mon, Aug 23, 2021 at 7:11 PM Julia Kreger <juliaashleykreger@gmail.com> wrote:
[trim]
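The fields of the node output that bear on this problem can be pulled out with a few lines of Python. This is only a sketch: the JSON below is a short excerpt of the output posted in the thread, and the helpers `parse_capabilities` and `summarize` are hypothetical names, not part of any OpenStack client.

```python
import json

# Excerpt of the node JSON posted in the thread (most fields omitted).
NODE_JSON = """
{
  "boot_interface": "ipxe",
  "driver": "ipmi",
  "driver_internal_info": {"is_whole_disk_image": true},
  "properties": {"capabilities": "boot_mode:uefi,boot_option:local"},
  "provision_state": "active"
}
"""


def parse_capabilities(raw):
    """Split a 'key:value,key:value' capabilities string into a dict."""
    caps = {}
    for item in (raw or "").split(","):
        key, sep, value = item.partition(":")
        if sep:
            caps[key.strip()] = value.strip()
    return caps


def summarize(node):
    """Collect the boot-related settings of a node into one small dict."""
    caps = parse_capabilities(node.get("properties", {}).get("capabilities"))
    internal = node.get("driver_internal_info", {})
    return {
        "driver": node.get("driver"),
        "boot_mode": caps.get("boot_mode"),
        "boot_option": caps.get("boot_option"),
        "whole_disk_image": internal.get("is_whole_disk_image"),
    }


if __name__ == "__main__":
    print(summarize(json.loads(NODE_JSON)))
```

For this node the summary shows the ipmi driver, UEFI boot mode, local boot, and a whole-disk image, which is the combination that fails in the thread.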
Looking at the output, it looks correct. Except for the resource_class setting, but that has *nothing* to do with bootloading. :)
I think we're going to need a log from the agent. Check the ``[agent]deploy_logs_collect``, ``[agent]deploy_logs_storage_backend``, and ``[agent]deploy_logs_local_path`` settings. Ideally you would be storing the deployment logs locally, and you would be able to identify the instance from this node.
One possibility: ironic could be doing the right thing, but your disk image may not be readable/loadable by the firmware. For example, if a partition image is used, we create GPT partitioning and try to find/extract the bootloader from within the image. If a whole disk image is used, we just write out what was supplied, then attempt to look for the UEFI boot loader in the correct location and configure the system to boot using it. That is a bit of a stretch, but it is something you'll need to check, because it may be that the firmware doesn't understand the partition type.
Another possibility is that the UEFI-specific overrides, which signal the machine to try to UEFI boot, are invalid on iLO hardware. This would be the first case we've heard of this. *Generally*, HPE recommends that operators use their ilo driver for the best experience. An alternative is the redfish driver. ilo is the known tested driver for HPE hardware, and Redfish has UEFI boot settings encoded as part of the standard, whereas IPMI dates back to the late 1990s.
-Julia
On Tue, Aug 24, 2021 at 10:27 PM Anirudh Gupta <anyrude10@gmail.com> wrote:
Hi Julia,
I have also upgraded my firmware to P89 v2.90 (10/16/2020) but still the result is the same. For your reference, the output of is openstack baremetal node show for whole disk image is as follows:
[ansible@localhost ~]$ openstack baremetal node show baremetal-node -f json { "allocation_uuid": null, "automated_clean": null, "bios_interface": "no-bios", "boot_interface": "ipxe", "chassis_uuid": null, "clean_step": {}, "conductor": "controller", "conductor_group": "", "console_enabled": false, "console_interface": "no-console", "created_at": "2021-08-25T04:51:32+00:00", "deploy_interface": "direct", "deploy_step": {}, "description": null, "driver": "ipmi", "driver_info": { "ipmi_port": 623, "ipmi_username": "hsc", "ipmi_password": "******", "ipmi_address": "10.0.1.207", "deploy_kernel": "a34b7e57-f324-40fe-8fe4-04eb7ea49c3a", "deploy_ramdisk": "8db38567-4923-4322-b1bf-e12cce5cafc4" }, "driver_internal_info": { "clean_steps": null, "agent_erase_devices_iterations": 1, "agent_erase_devices_zeroize": true, "agent_continue_if_secure_erase_failed": false, "agent_continue_if_ata_erase_failed": false, "agent_enable_nvme_secure_erase": true, "agent_enable_ata_secure_erase": true, "disk_erasure_concurrency": 1, "agent_erase_skip_read_only": false, "last_power_state_change": "2021-08-25T05:10:16.671639", "agent_version": "7.0.2.dev10", "agent_last_heartbeat": "2021-08-25T05:09:34.904605", "hardware_manager_version": { "MellanoxDeviceHardwareManager": "1", "generic_hardware_manager": "1.1" }, "agent_cached_clean_steps_refreshed": "2021-08-25 04:59:28.312524", "is_whole_disk_image": true, "deploy_steps": null, "agent_cached_deploy_steps_refreshed": "2021-08-25 05:08:58.530633", "root_uuid_or_disk_id": "0x3f3df0d8" }, "extra": {}, "fault": null, "inspect_interface": "no-inspect", "inspection_finished_at": null, "inspection_started_at": null, "instance_info": { "image_source": "da92cd5d-e1d6-458d-a2b2-86e897a982c6", "root_gb": "470", "swap_mb": "0", "display_name": "server1", "vcpus": "24", "nova_host_id": "controller-ironic", "memory_mb": "62700", "local_gb": "470", "configdrive": "******", "image_disk_format": "raw", "image_checksum": null, "image_os_hash_algo": "sha512", 
"image_os_hash_value": "3b16d3a6734c23fb43fbd6deee16c907ea8e398bfd5163cd08f16ccd07a74399bb35f16a4713c3847058b445bf4150448f22eb11e75debcc548b8eaacf777e70", "image_url": "******", "image_container_format": "bare", "image_tags": [], "image_properties": { "stores": "file", "os_hidden": false, "virtual_size": 3511681024, "owner_specified.openstack.object": "images/centos-d", "owner_specified.openstack.sha256": "", "owner_specified.openstack.md5": "" }, "image_type": "whole-disk-image" }, "instance_uuid": "e29c267f-8ddb-4dce-a07c-18c4f7210010", "last_error": null, "lessee": null, "maintenance": false, "maintenance_reason": null, "management_interface": "ipmitool", "name": "baremetal-node", "network_data": {}, "network_interface": "flat", "owner": null, "power_interface": "ipmitool", "power_state": "power on", "properties": { "cpus": 30, "memory_mb": 62700, "local_gb": 470, "cpu_arch": "x86_64", "capabilities": "boot_mode:uefi,boot_option:local", "vendor": "hewlett-packard" }, "protected": false, "protected_reason": null, "provision_state": "active", "provision_updated_at": "2021-08-25T05:10:37+00:00", "raid_config": {}, "raid_interface": "no-raid", "rescue_interface": "no-rescue", "reservation": null, "resource_class": "baremetal-resource-class", "retired": false, "retired_reason": null, "storage_interface": "noop", "target_power_state": null, "target_provision_state": null, "target_raid_config": {}, "traits": [], "updated_at": "2021-08-25T05:10:37+00:00", "uuid": "3caaffe3-a6be-4b8c-b3dd-d302c4367670", "vendor_interface": "ipmitool" }
I am not getting why this issue is not being reproduced with the partition disk image.
Regards Anirudh Gupta
On Mon, Aug 23, 2021 at 7:11 PM Julia Kreger <juliaashleykreger@gmail.com> wrote:
Greetings Anirudh,
If you could post your ``openstack baremetal node show <uuid>`` output for a node which is in this state, where it is configured to boot from local storage, and is booting to network. Along with that, it would be helpful to understand if the machine is configured for UEFI or not. Realistically this is where using IPMI on modern hardware becomes a problem, because there is no actual standard for the signaling behavior as it relates to UEFI boot with IPMI. We encourage operators to use Redfish instead as it is clearly delineated as part of the standard.
One last thing. You may want to check and update BMC and system firmware on your hardware.
On Mon, Aug 23, 2021 at 12:41 AM Anirudh Gupta <anyrude10@gmail.com> wrote:
Hi Julia,
Thanks for your reply.
There is also an update that with Centos 8.4 Partition Disk Image, I am able to successfully provision the baremetal node. With Centos 8.4 ISO and Wholedisk Image the behaviour is the same that it doesn't boot from Hard disk.
Please find below my setup details:
I am using HP server DL380 Gen9 with BIOS P89 v2.76 (10/21/2019) with IPMI utility
Hard disk is the first priority followed by 1GB NIC which I have set to PXE
I don't find any logs in /var/log/ironic/deploy_logs. However there is a folder /var/log/kolla/ironic/, but there are no deploy_logs in that folder
I have downloaded the kolla source image from docker hub
docker pull kolla/centos-source-ironic-conductor:wallaby
Similar images have been downloaded by kolla ansible for other ironic components
Regards Anirudh Gupta
On Fri, Aug 20, 2021 at 9:56 PM Julia Kreger <juliaashleykreger@gmail.com> wrote:
On Fri, Aug 20, 2021 at 7:07 AM Anirudh Gupta <anyrude10@gmail.com> wrote:
Hi Mark,
There was some issue with the cleaning image due to which the issue reported in previous conversation was observed.
This was successfully resolved. By setting the parameter in ironic.conf file [pxe] uefi_ipxe_bootfile_name = ipxe-x86_64.efi
The "node provide" command successfully executed and the node came in "available" state.
In Legacy: When I am trying to create the server using "server create " command and a userimage is passed in the command, the procedure is that the node will install userimage over the network and then will be rebooted After the reboot, it will boot up with the Hard disk and with the OS specified in userimage.
In UEFI: When I am trying to create the server using "server create " command and a userimage is passed in the command, the procedure of installing user image and rebooting remains the same. But After the reboot, despite setting the hard disk as the first priority, it again starts booting over the network and eventually fails.
This is very very likely an issue with the vendor's firmware. We've seen some instances where the bmc refuses to honor the request to change *or* where it is honored for a single boot operation only. In part some of this may be due to improved support in handling UEFI boot signaling where the wrong thing could occur, at least with IPMI.
In order to create a fix or workaround, we need the following information:
Are you using IPMI or Redfish? If you're using IPMI, you should consider using Redfish.
What is the hardware vendor?
What is the BMC firmware version?
Is the BMC set to always network boot by default completely?
In UEFI, what does the machine report in the efibootmgr output? Your deployment agent logs actually have this output already in the journal, typically under /var/log/ironic/deploy_logs. We've seen some hardware act completely disjointed from the EFI NVRAM, or reset the EFI NVRAM when we request a one-time override.
Most importantly, what is the version of ironic and ironic-python-agent?
I have also tried passing capabilities='boot_option:local' in both the baremetal node and the flavor, but the behaviour is the same.
Regards Anirudh Gupta
[trim]
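For the efibootmgr question above, a minimal Python sketch of how the boot order could be checked once the output has been recovered from the deploy logs. The sample output and its entry labels are hypothetical, made up for illustration only, and the helper names are not part of Ironic or ironic-python-agent.

```python
import re

# Hypothetical sample of `efibootmgr` output. The entry labels are invented
# for illustration and do not come from the hardware in this thread.
SAMPLE = """\
BootCurrent: 0003
Timeout: 0 seconds
BootOrder: 0003,0001,0002
Boot0001* CentOS Linux
Boot0002* Embedded RAID : Smart Array Controller
Boot0003* Embedded LOM 1 Port 1 : NIC (PXE IPv4)
"""

# Matches entry lines such as "Boot0001* CentOS Linux".
ENTRY_RE = re.compile(r"^Boot([0-9A-Fa-f]{4})(\*?)\s+(.*)$")


def parse_efibootmgr(text):
    """Return (boot_order, entries) parsed from efibootmgr output."""
    boot_order = []
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("BootOrder:"):
            value = line.split(":", 1)[1]
            boot_order = [n.strip().upper() for n in value.split(",") if n.strip()]
            continue
        match = ENTRY_RE.match(line)
        if match:
            number, active, label = match.groups()
            entries[number.upper()] = {
                "active": active == "*",
                "label": label.strip(),
            }
    return boot_order, entries


def first_boot_label(text):
    """Return the label of the first active entry in BootOrder, or None."""
    boot_order, entries = parse_efibootmgr(text)
    for number in boot_order:
        entry = entries.get(number)
        if entry and entry["active"]:
            return entry["label"]
    return None


if __name__ == "__main__":
    print(first_boot_label(SAMPLE))
```

If the first label after deployment is still a network entry rather than the disk, that would be consistent with the firmware ignoring or resetting the boot order as described above.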
Dear Anirudh,
I remember also seeing some odd behaviour with an HPE ProLiant DL380 Gen10 server when using UEFI boot. I was using Bifrost to deploy the server with Kayobe (before deploying the cloud services).
I got past the issue by adding the capability "boot_mode:uefi" to the node, so my complete capability string was "cpu_vt:true,cpu_aes:true,cpu_hugepages:true,cpu_hugepages_1g:true,cpu_txt:true,boot_option:local,boot_mode:uefi", following the Ironic Advanced Features Guide [1].
Hope this helps.
[1] https://docs.openstack.org/ironic/wallaby/install/advanced.html
On Mon, 23 Aug 2021 at 20:24, Anirudh Gupta <anyrude10@gmail.com> wrote:
Hi Julia,
Thanks for your reply.
There is also an update: with a CentOS 8.4 partition disk image, I am able to successfully provision the baremetal node. With the CentOS 8.4 ISO and whole-disk image, the behaviour is unchanged: the node doesn't boot from the hard disk.
Please find below my setup details:
I am using an *HP DL380 Gen9* server with *BIOS P89 v2.76 (10/21/2019)* with the *IPMI* utility.
Hard disk is the first boot priority, followed by the 1GB NIC, which I have set to PXE.
I can't find any logs in /var/log/ironic/deploy_logs. There is a folder */var/log/kolla/ironic/*, but it contains no deploy_logs.
I have downloaded the kolla source image from Docker Hub:
docker pull kolla/centos-source-ironic-conductor:wallaby
Kolla Ansible downloaded similar images for the other Ironic components.
Regards Anirudh Gupta
On Fri, Aug 20, 2021 at 9:56 PM Julia Kreger <juliaashleykreger@gmail.com> wrote:
On Fri, Aug 20, 2021 at 7:07 AM Anirudh Gupta <anyrude10@gmail.com> wrote:
Hi Mark,
There was some issue with the cleaning image due to which the issue reported in previous conversation was observed.
This was successfully resolved. After setting the parameter [pxe] uefi_ipxe_bootfile_name = ipxe-x86_64.efi in ironic.conf, the "node provide" command executed successfully and the node reached the "available" state.
*In Legacy:* When I create the server using the "server create" command and pass a user image, the node installs the user image over the network and is then rebooted. After the reboot, it boots from the hard disk with the OS specified in the user image.
*In UEFI:* When I create the server using the "server create" command and pass a user image, the install and reboot steps are the same. But after the reboot, despite the hard disk being set as the first boot priority, the node boots over the network again and eventually fails.
This is very likely an issue with the vendor's firmware. We've seen some instances where the BMC refuses to honor the request to change the boot device *or* where it is honored for a single boot operation only. In part some of this may be due to improved support in handling UEFI boot signaling where the wrong thing could occur, at least with IPMI.
In order to create a fix or workaround, we need the following information:
Are you using IPMI or Redfish? If you're using IPMI, you should consider using Redfish.
What is the hardware vendor?
What is the BMC firmware version?
Is the BMC set to always network boot by default completely?
In UEFI, what does the machine report in the efibootmgr output? Your deployment agent logs actually have this output already in the journal, typically under /var/log/ironic/deploy_logs. We've seen some hardware act completely disjointed from the EFI NVRAM, or reset the EFI NVRAM when we request a one-time override.
Most importantly, what is the version of ironic and ironic-python-agent?
I have also tried passing *capabilities='boot_option:local'* in both the baremetal node and the flavor, but the behaviour is the same.
Regards Anirudh Gupta
[trim]
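The capability string mentioned in this reply is a comma-separated list of key:value pairs, so adding "boot_mode:uefi" to an existing "boot_option:local" setting amounts to a merge. A minimal Python sketch of that merge follows; the helper names are hypothetical and are not part of Ironic or its client.

```python
def parse_capabilities(value):
    """Split a 'k1:v1,k2:v2' capability string into a dict (order preserved)."""
    caps = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, val = item.partition(":")
        if not sep:
            raise ValueError(f"capability without a value: {item!r}")
        caps[key.strip()] = val.strip()
    return caps


def merge_capabilities(existing, **updates):
    """Return a capability string with the updates applied on top of existing."""
    caps = parse_capabilities(existing)
    caps.update(updates)
    return ",".join(f"{key}:{val}" for key, val in caps.items())


if __name__ == "__main__":
    # Start from the value already tried in this thread and add the boot mode.
    print(merge_capabilities("boot_option:local", boot_mode="uefi"))
```

The resulting string would then be set as the node's capabilities property, for example with `openstack baremetal node set <node> --property capabilities='...'`, with a matching setting on the flavor if one is used.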
--
බුද්ධික සංජීව ගොඩාකුරු
Buddhika Sanjeewa Godakuru
Systems Analyst/Programmer, Deputy Webmaster
Information and Communication Technology Centre (ICTC), University of Kelaniya, Sri Lanka
participants (4)
- Anirudh Gupta
- Buddhika Godakuru
- Julia Kreger
- Mark Goddard