[Kolla-Ansible][Ironic] Baremetal node Cleaning fails in UEFI mode, but succeeds in Legacy Bios Mode

Julia Kreger juliaashleykreger at gmail.com
Wed Aug 25 13:30:41 UTC 2021


The output looks correct, except for the resource_class setting, but
that has *nothing* to do with bootloading. :)
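A small Python sketch that pulls the bootloading-related fields out of the `openstack baremetal node show <node> -f json` output quoted below, so they can be compared across nodes or runs. `parse_capabilities` and `boot_summary` are hypothetical helpers written for illustration, not part of ironic or the OpenStack client. The field names are the ones that appear in that output.

```python
import json

def parse_capabilities(caps):
    """Hypothetical helper: split an ironic capabilities string such as
    'boot_mode:uefi,boot_option:local' into a dict."""
    result = {}
    if not caps:
        return result
    for item in caps.split(","):
        key, sep, value = item.partition(":")
        if sep:  # skip malformed entries that have no colon
            result[key.strip()] = value.strip()
    return result

def boot_summary(node_json):
    """Hypothetical helper: collect boot-related fields from
    `openstack baremetal node show -f json` output."""
    node = json.loads(node_json)
    props = node.get("properties") or {}
    internal = node.get("driver_internal_info") or {}
    caps = parse_capabilities(props.get("capabilities", ""))
    return {
        "driver": node.get("driver"),
        "boot_interface": node.get("boot_interface"),
        "management_interface": node.get("management_interface"),
        "boot_mode": caps.get("boot_mode", "<unset>"),
        "boot_option": caps.get("boot_option", "<unset>"),
        "whole_disk_image": internal.get("is_whole_disk_image"),
        "root_uuid_or_disk_id": internal.get("root_uuid_or_disk_id"),
    }
```

Saving the output with `-f json > node.json` and passing the file contents to `boot_summary` gives a compact view of the driver, interfaces, requested boot mode and image type in one place.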

I think we're going to need a log from the agent. Check the
``[agent]deploy_logs_collect``,
``[agent]deploy_logs_storage_backend``, and
``[agent]deploy_logs_local_path`` settings. Ideally you would be
storing the deployment logs locally, and you would then be able to
identify the log for this node's instance.
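A minimal Python sketch for checking those three settings and then finding a node's archive in the local path. It assumes (please verify against the ironic configuration reference for your release) that the defaults are `on_failure`, `local` and `/var/log/ironic/deploy`, and that archives are named `<node-uuid>[_<instance-uuid>]_<timestamp>.tar.gz`. `deploy_log_settings` and `find_node_logs` are hypothetical helpers. In a containerized deployment such as Kolla, the path is resolved inside the conductor container, so check whether it is mounted on the host at all.

```python
import configparser
from pathlib import Path

# ASSUMED defaults; confirm against the ironic config reference for your release.
ASSUMED_DEFAULTS = {
    "deploy_logs_collect": "on_failure",
    "deploy_logs_storage_backend": "local",
    "deploy_logs_local_path": "/var/log/ironic/deploy",
}

def deploy_log_settings(conf_path):
    """Hypothetical helper: effective [agent] deploy log settings in ironic.conf."""
    # interpolation=None so values containing '%' (e.g. URLs) are read verbatim.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read(conf_path)
    settings = dict(ASSUMED_DEFAULTS)
    if parser.has_section("agent"):
        for key in settings:
            if parser.has_option("agent", key):
                settings[key] = parser.get("agent", key)
    return settings

def find_node_logs(local_path, node_uuid):
    """Hypothetical helper: archive names starting with the node UUID,
    sorted by file name. Relies on the ASSUMED naming scheme above."""
    return sorted(p.name for p in Path(local_path).glob(f"{node_uuid}_*.tar.gz"))
```

If `deploy_logs_collect` comes back as `never`, or the storage backend is `swift`, nothing will appear in the local path.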

One possibility is that ironic is doing the right thing, but the
firmware cannot read or load your disk image.

For example, if a partition image is used, we create GPT partitioning
and try to find and extract the bootloader from within the image. If a
whole disk image is used, we just write out what was supplied, then
look for the UEFI bootloader in the expected location and configure
the system to boot from it.

That is a bit of a stretch, but it's something you'll need to check,
because the firmware may not understand the partition type.
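One way to check is to look at the partition table of the whole disk image itself. The Python sketch below (standard library only) reports whether a raw image uses MBR or GPT partitioning and whether it contains an EFI System Partition, which is where UEFI firmware looks for a bootloader. It assumes a raw image (this node shows `image_disk_format: raw`) with 512-byte logical sectors, and `inspect_partition_table` is a hypothetical helper, not an ironic function. It only inspects the partition table; it does not verify that a bootloader such as `\EFI\BOOT\BOOTX64.EFI` actually exists on the ESP.

```python
import struct
import uuid

SECTOR = 512  # ASSUMPTION: 512-byte logical sectors (4Kn images differ)
ESP_MBR_TYPE = 0xEF  # MBR partition type for an EFI System Partition
ESP_GPT_TYPE = uuid.UUID("c12a7328-f81f-11d2-ba4b-00a0c93ec93b")

def inspect_partition_table(path):
    """Hypothetical helper: return (table_type, has_esp) for a raw image.

    table_type is "mbr", "gpt", "none" (no boot signature) or "unknown"
    (protective MBR present but no readable GPT header).
    """
    with open(path, "rb") as f:
        mbr = f.read(SECTOR)
        if len(mbr) < SECTOR or mbr[510:512] != b"\x55\xaa":
            return ("none", False)
        # Type byte of each primary MBR entry (table at offset 446, 16 bytes each).
        types = [mbr[446 + 16 * i + 4] for i in range(4)]
        if 0xEE not in types:  # no protective MBR, so plain MBR partitioning
            return ("mbr", ESP_MBR_TYPE in types)
        f.seek(SECTOR)  # the primary GPT header lives in LBA 1
        header = f.read(92)
        if len(header) < 92 or header[:8] != b"EFI PART":
            return ("unknown", False)
        # Starting LBA of the entry array, number of entries, size of each entry.
        entries_lba, count, entry_size = struct.unpack_from("<QII", header, 72)
        f.seek(entries_lba * SECTOR)
        table = f.read(count * entry_size)
    for i in range(count):
        entry = table[i * entry_size:(i + 1) * entry_size]
        if len(entry) < 16:
            break
        # GPT stores GUIDs in mixed-endian form, which bytes_le decodes.
        if uuid.UUID(bytes_le=entry[:16]) == ESP_GPT_TYPE:
            return ("gpt", True)
    return ("gpt", False)
```

A result of `("mbr", False)` or `("gpt", False)` would mean the image has no ESP at all, so there is nothing for UEFI firmware to boot from the disk.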

Another possibility is that the UEFI-specific overrides, which signal
the machine to attempt a UEFI boot, are invalid on iLO hardware. This
would be the first case of this we've heard of. *Generally*, HPE
recommends that operators use their ilo driver for the best
experience. An alternative is the redfish driver.
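For context on what such an override looks like over IPMI: the IPMI 2.0 "Set System Boot Options" command (NetFn 0x00, command 0x08) uses boot option parameter 5 ("boot flags"), where a single bit in the first data byte selects legacy versus EFI boot. The Python sketch below encodes those bytes as `ipmitool raw` arguments, so they can be compared with what the BMC reports back via `ipmitool chassis bootparam get 5`. It is derived from the IPMI specification, not copied from ironic's ipmitool code, and `ipmi_boot_flags_raw` is a hypothetical helper. How a BMC actually acts on the EFI bit is left largely to the vendor.

```python
# Boot device selector values (bits 5:2 of data byte 2), per IPMI 2.0
# boot option parameter 5. Only a subset of the defined values is listed.
BOOT_DEVICE_SELECTOR = {
    "none": 0b0000,
    "pxe": 0b0001,
    "disk": 0b0010,   # "force boot from default hard-drive"
    "cdrom": 0b0101,
    "bios": 0b0110,   # "force boot into BIOS setup"
}

def ipmi_boot_flags_raw(device, uefi, persistent):
    """Hypothetical helper: `ipmitool raw` args for Set System Boot Options,
    parameter 5 (boot flags)."""
    data1 = 0x80              # bit 7: boot flags valid
    if persistent:
        data1 |= 0x40         # bit 6: apply to all future boots, not just the next
    if uefi:
        data1 |= 0x20         # bit 5: BIOS boot type = EFI (0 means legacy)
    data2 = BOOT_DEVICE_SELECTOR[device] << 2
    # NetFn, command, parameter selector, then the five boot-flag data bytes.
    payload = [0x00, 0x08, 0x05, data1, data2, 0x00, 0x00, 0x00]
    return [f"0x{b:02x}" for b in payload]
```

For example, `ipmi_boot_flags_raw("disk", uefi=True, persistent=True)` returns `['0x00', '0x08', '0x05', '0xe0', '0x08', '0x00', '0x00', '0x00']`.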

The ilo driver is the known, tested driver for HPE hardware, and
Redfish has UEFI boot settings encoded as part of the standard,
whereas IPMI's last major revision, v2.0, dates from 2004.
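By contrast, Redfish expresses the same intent with named properties on the ComputerSystem resource's `Boot` object (`BootSourceOverrideTarget`, `BootSourceOverrideEnabled`, `BootSourceOverrideMode`). The Python sketch below only builds the JSON body for a PATCH to a system resource such as `/redfish/v1/Systems/1`; that path is an assumption (it varies by BMC), and `redfish_boot_override` is a hypothetical helper that sends nothing. Whether a particular iLO firmware honours `BootSourceOverrideMode` should be checked against HPE's documentation.

```python
import json

# Subsets of the allowed values from the Redfish ComputerSystem schema.
TARGETS = {"None", "Pxe", "Hdd", "Cd", "BiosSetup"}
ENABLED = {"Disabled", "Once", "Continuous"}
MODES = {"UEFI", "Legacy"}

def redfish_boot_override(target, enabled="Continuous", mode="UEFI"):
    """Hypothetical helper: build a PATCH body requesting a boot override."""
    if target not in TARGETS:
        raise ValueError(f"unsupported target: {target}")
    if enabled not in ENABLED:
        raise ValueError(f"unsupported enabled value: {enabled}")
    if mode not in MODES:
        raise ValueError(f"unsupported mode: {mode}")
    return json.dumps({
        "Boot": {
            "BootSourceOverrideTarget": target,
            "BootSourceOverrideEnabled": enabled,
            "BootSourceOverrideMode": mode,  # UEFI vs legacy is an explicit field
        }
    }, sort_keys=True)
```

Here the UEFI-versus-legacy choice is a named field in the request, rather than one bit whose handling is up to the BMC.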

-Julia

On Tue, Aug 24, 2021 at 10:27 PM Anirudh Gupta <anyrude10 at gmail.com> wrote:
>
> Hi Julia,
>
> I have also upgraded my firmware to P89 v2.90 (10/16/2020), but the result is still the same.
> For your reference, the output of openstack baremetal node show for the whole disk image is as follows:
>
> [ansible at localhost ~]$ openstack baremetal node show baremetal-node -f json
> {
>   "allocation_uuid": null,
>   "automated_clean": null,
>   "bios_interface": "no-bios",
>   "boot_interface": "ipxe",
>   "chassis_uuid": null,
>   "clean_step": {},
>   "conductor": "controller",
>   "conductor_group": "",
>   "console_enabled": false,
>   "console_interface": "no-console",
>   "created_at": "2021-08-25T04:51:32+00:00",
>   "deploy_interface": "direct",
>   "deploy_step": {},
>   "description": null,
>   "driver": "ipmi",
>   "driver_info": {
>     "ipmi_port": 623,
>     "ipmi_username": "hsc",
>     "ipmi_password": "******",
>     "ipmi_address": "10.0.1.207",
>     "deploy_kernel": "a34b7e57-f324-40fe-8fe4-04eb7ea49c3a",
>     "deploy_ramdisk": "8db38567-4923-4322-b1bf-e12cce5cafc4"
>   },
>   "driver_internal_info": {
>     "clean_steps": null,
>     "agent_erase_devices_iterations": 1,
>     "agent_erase_devices_zeroize": true,
>     "agent_continue_if_secure_erase_failed": false,
>     "agent_continue_if_ata_erase_failed": false,
>     "agent_enable_nvme_secure_erase": true,
>     "agent_enable_ata_secure_erase": true,
>     "disk_erasure_concurrency": 1,
>     "agent_erase_skip_read_only": false,
>     "last_power_state_change": "2021-08-25T05:10:16.671639",
>     "agent_version": "7.0.2.dev10",
>     "agent_last_heartbeat": "2021-08-25T05:09:34.904605",
>     "hardware_manager_version": {
>       "MellanoxDeviceHardwareManager": "1",
>       "generic_hardware_manager": "1.1"
>     },
>     "agent_cached_clean_steps_refreshed": "2021-08-25 04:59:28.312524",
>     "is_whole_disk_image": true,
>     "deploy_steps": null,
>     "agent_cached_deploy_steps_refreshed": "2021-08-25 05:08:58.530633",
>     "root_uuid_or_disk_id": "0x3f3df0d8"
>   },
>   "extra": {},
>   "fault": null,
>   "inspect_interface": "no-inspect",
>   "inspection_finished_at": null,
>   "inspection_started_at": null,
>   "instance_info": {
>     "image_source": "da92cd5d-e1d6-458d-a2b2-86e897a982c6",
>     "root_gb": "470",
>     "swap_mb": "0",
>     "display_name": "server1",
>     "vcpus": "24",
>     "nova_host_id": "controller-ironic",
>     "memory_mb": "62700",
>     "local_gb": "470",
>     "configdrive": "******",
>     "image_disk_format": "raw",
>     "image_checksum": null,
>     "image_os_hash_algo": "sha512",
>     "image_os_hash_value": "3b16d3a6734c23fb43fbd6deee16c907ea8e398bfd5163cd08f16ccd07a74399bb35f16a4713c3847058b445bf4150448f22eb11e75debcc548b8eaacf777e70",
>     "image_url": "******",
>     "image_container_format": "bare",
>     "image_tags": [],
>     "image_properties": {
>       "stores": "file",
>       "os_hidden": false,
>       "virtual_size": 3511681024,
>       "owner_specified.openstack.object": "images/centos-d",
>       "owner_specified.openstack.sha256": "",
>       "owner_specified.openstack.md5": ""
>     },
>     "image_type": "whole-disk-image"
>   },
>   "instance_uuid": "e29c267f-8ddb-4dce-a07c-18c4f7210010",
>   "last_error": null,
>   "lessee": null,
>   "maintenance": false,
>   "maintenance_reason": null,
>   "management_interface": "ipmitool",
>   "name": "baremetal-node",
>   "network_data": {},
>   "network_interface": "flat",
>   "owner": null,
>   "power_interface": "ipmitool",
>   "power_state": "power on",
>   "properties": {
>     "cpus": 30,
>     "memory_mb": 62700,
>     "local_gb": 470,
>     "cpu_arch": "x86_64",
>     "capabilities": "boot_mode:uefi,boot_option:local",
>     "vendor": "hewlett-packard"
>   },
>   "protected": false,
>   "protected_reason": null,
>   "provision_state": "active",
>   "provision_updated_at": "2021-08-25T05:10:37+00:00",
>   "raid_config": {},
>   "raid_interface": "no-raid",
>   "rescue_interface": "no-rescue",
>   "reservation": null,
>   "resource_class": "baremetal-resource-class",
>   "retired": false,
>   "retired_reason": null,
>   "storage_interface": "noop",
>   "target_power_state": null,
>   "target_provision_state": null,
>   "target_raid_config": {},
>   "traits": [],
>   "updated_at": "2021-08-25T05:10:37+00:00",
>   "uuid": "3caaffe3-a6be-4b8c-b3dd-d302c4367670",
>   "vendor_interface": "ipmitool"
> }
>
>
> I don't understand why this issue does not reproduce with the partition disk image.
>
> Regards
> Anirudh Gupta
>
>
>
> On Mon, Aug 23, 2021 at 7:11 PM Julia Kreger <juliaashleykreger at gmail.com> wrote:
>>
>> Greetings Anirudh,
>>
>> Could you post your ``openstack baremetal node show <uuid>`` output
>> for a node in this state, i.e. one configured to boot from local
>> storage that is booting from the network instead? Along with that, it
>> would be helpful to know whether the machine is configured for UEFI.
>> Realistically this is where using IPMI on modern hardware becomes a
>> problem, because there is no actual standard for the signaling
>> behavior as it relates to UEFI boot with IPMI. We encourage operators
>> to use Redfish instead as it is clearly delineated as part of the
>> standard.
>>
>> One last thing. You may want to check and update BMC and system
>> firmware on your hardware.
>>
>> On Mon, Aug 23, 2021 at 12:41 AM Anirudh Gupta <anyrude10 at gmail.com> wrote:
>> >
>> > Hi Julia,
>> >
>> > Thanks for your reply.
>> >
>> > There is also an update: with the CentOS 8.4 partition disk image, I am able to successfully provision the baremetal node. With the CentOS 8.4 ISO and the whole disk image, the behaviour is the same: it doesn't boot from the hard disk.
>> >
>> > Please find below my setup details:
>> >
>> > I am using an HP DL380 Gen9 server with BIOS P89 v2.76 (10/21/2019), managed via IPMI.
>> >
>> > The hard disk is the first boot priority, followed by the 1 Gb NIC, which I have set to PXE.
>> >
>> > I can't find any logs in /var/log/ironic/deploy_logs. There is a folder /var/log/kolla/ironic/, but there are no deploy_logs in that folder either.
>> >
>> > I have downloaded the Kolla source image from Docker Hub:
>> >
>> > docker pull kolla/centos-source-ironic-conductor:wallaby
>> >
>> > Kolla Ansible has downloaded similar images for the other Ironic components.
>> >
>> > Regards
>> > Anirudh Gupta
>> >
>> > On Fri, Aug 20, 2021 at 9:56 PM Julia Kreger <juliaashleykreger at gmail.com> wrote:
>> >>
>> >>
>> >>
>> >> On Fri, Aug 20, 2021 at 7:07 AM Anirudh Gupta <anyrude10 at gmail.com> wrote:
>> >>>
>> >>> Hi Mark,
>> >>>
>> >>> There was an issue with the cleaning image, which caused the problem reported earlier in this conversation.
>> >>>
>> >>> This was successfully resolved by setting the following parameter in the ironic.conf file:
>> >>> [pxe]
>> >>> uefi_ipxe_bootfile_name = ipxe-x86_64.efi
>> >>>
>> >>> The "node provide" command then executed successfully, and the node reached the "available" state.
>> >>>
>> >>> In Legacy BIOS mode:
>> >>> When I create the server using the "server create" command and pass a user image, the node installs the user image over the network and is then rebooted.
>> >>> After the reboot, it boots from the hard disk into the OS from the user image.
>> >>>
>> >>> In UEFI mode:
>> >>> When I create the server using the "server create" command and pass a user image, the steps of installing the user image and rebooting are the same.
>> >>> But after the reboot, despite the hard disk being set as the first priority, it again starts booting over the network and eventually fails.
>> >>>
>> >> This is very, very likely an issue with the vendor's firmware. We've seen instances where the BMC refuses to honor the request to change the boot device, *or* honors it for a single boot only. Some of this may be due to improved handling of UEFI boot signaling, where the wrong thing could occur, at least with IPMI.
>> >>
>> >> In order to create a fix or workaround, we need the following information:
>> >>
>> >> Are you using IPMI or Redfish? If you're using IPMI, you should consider using Redfish.
>> >>
>> >> What is the hardware vendor?
>> >>
>> >> What is the BMC firmware version?
>> >>
>> >> Is the BMC set to always network boot by default?
>> >>
>> >> In UEFI mode, what does the machine report in the efibootmgr output? Your deployment agent logs already contain this output in the journal, typically in /var/log/ironic/deploy_logs. We've seen some hardware behave completely independently of the EFI NVRAM, or reset the EFI NVRAM when we request a one-time override.
>> >>
>> >> Most importantly, what is the version of ironic and ironic-python-agent?
>> >>
>> >>>
>> >>> I have also tried setting capabilities='boot_option:local' on both the baremetal node and the flavor, but the behaviour is the same.
>> >>>
>> >>> Regards
>> >>> Anirudh Gupta
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> [trim]



More information about the openstack-discuss mailing list