[ironic] Deploy images crashing Dell server BIOS using UEFI boot

Mike Currin mike at idia.ac.za
Thu May 25 07:16:50 UTC 2023


Hi All,

We have a Xena-based OpenStack deployment. We recently deployed 60+
nodes in our research cluster with Ironic, which worked well.  All of
these were deployed using the standard process I describe below.

We recently took delivery of a new Dell R6625 server with NVMe devices
only, which only supports UEFI boot, so we are trying to get that
working.

The server PXE boots, downloads the RAM disk and then the deploy image,
and as soon as it starts running that it crashes (I assume when running
linuxefi).  We also tested UEFI deploy on an existing Dell R640 server.
That server works with BIOS boot, but when we switched it over to UEFI
it crashed in the same way, so the problem isn't due to the much
bigger/different architecture (AMD vs Intel) of the new server.  We
have a few older servers in a test setup (Dell R630s) which are working
fine and don't show this behaviour.  We haven't tried them on our
production setup, as even if they worked it wouldn't help us move
forward.

I made a video showing this:
https://www.dropbox.com/s/5jbn1qpylxaevqb/uefiboot2.mov?dl=0
In the iDRAC we just get "System BIOS has halted", and somewhere it
said to change hardware that you recently added, which feels unlikely
to be the cause, as these are 2 different servers with totally
different hardware, both working elsewhere.

I've done an iDRAC serial console debug, but it isn't showing me much
that is of any use.

This is our entire process to deploy a node (some is once off of
course, I've not included the network setup):

openstack flavor create --ram 256000 --disk 20 --vcpus 32 --public our-baremetal
openstack flavor set our-baremetal --property capabilities:boot_mode="uefi"

We downloaded the latest (Xena) images from:
https://tarballs.opendev.org/openstack/ironic-python-agent/dib/files/
I also tried the latest CentOS 9 ones just to try them; it made no difference.

We then extract the image and make a small mod to the service start
(we found the NIC didn't come up immediately, so we put a ping delay
in the ExecStart), but that's not part of this problem.

openstack image create --disk-format aki --container-format aki \
  --public --file ironic-agent.kernel deploy-vmlinuz
openstack image create --disk-format ari --container-format ari \
  --public --file ironic-agent.initramfs-ping-patched deploy-initrd

Then to do the deploy:
export HOSTNAME=<hostname>
export MGMTIP=<idracip>

openstack baremetal node create --driver ipmi --name $HOSTNAME \
  --driver-info ipmi_port=623 --driver-info ipmi_username=root \
  --driver-info 'ipmi_password=<ourpassword>' \
  --driver-info ipmi_address=$MGMTIP \
  --resource-class baremetal-resource-class \
  --property cpus=32 --property memory_mb=256000 --property local_gb=20 \
  --property cpu_arch=x86_64 \
  --driver-info deploy_ramdisk=$(openstack image show deploy-initrd -f value -c id) \
  --driver-info deploy_kernel=$(openstack image show deploy-vmlinuz -f value -c id)
NODE=$(openstack baremetal node show -f value -c uuid $HOSTNAME)
openstack baremetal node set $NODE --property capabilities='boot_mode:uefi'
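
The flavor and the node above both declare boot_mode, so it can be
worth confirming that the value stored on the node is what was
intended.  The following is only a minimal sketch: get_capability is a
hypothetical helper (not part of the openstack CLI), and it assumes the
capabilities string uses the comma-separated key:value form shown above.

```shell
# Hypothetical helper: print the value of one key from an Ironic
# capabilities string such as "boot_mode:uefi,secure_boot:false".
# Prints nothing if the key is not present.
get_capability() {
  caps=$1
  key=$2
  # Split on commas into one pair per line, then match the key.
  printf '%s\n' "$caps" | tr ',' '\n' | awk -F: -v k="$key" '$1 == k { print $2 }'
}
```

The string to check would come from the node's properties, for example
as reported by "openstack baremetal node show $NODE -f json -c properties".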

openstack baremetal port create <MACADDRESS> --node $NODE \
  --physical-network physnet3
openstack baremetal node manage $NODE --wait && \
  openstack baremetal node list && \
  openstack baremetal node provide $NODE && \
  openstack baremetal node list
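
Note that "node provide" is run without --wait above, so if automated
cleaning is enabled the node may not yet be "available" when the next
command is issued.  A minimal sketch of polling for that state follows;
wait_for_provision_state is a hypothetical helper name, and the retry
count and delay defaults are arbitrary assumptions.

```shell
# Hypothetical helper: poll a node until it reaches the wanted
# provision state.
# Arguments: node, wanted state, max attempts (default 30),
#            delay between attempts in seconds (default 10).
wait_for_provision_state() {
  node=$1
  want=$2
  tries=${3:-30}
  delay=${4:-10}
  i=0
  state=""
  while [ "$i" -lt "$tries" ]; do
    # Ask Ironic for the node's current provision state.
    state=$(openstack baremetal node show "$node" -f value -c provision_state)
    if [ "$state" = "$want" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "node $node is in state '$state', wanted '$want'" >&2
  return 1
}
```

It could be called as "wait_for_provision_state $NODE available" before
running the server create step.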

openstack server create --use-config-drive --image <ourimage> \
  --flavor our-baremetal --security-group worker \
  --network ironic-network --key-name <ourkeyname> servername

Does anyone have any more info to help, or any suggestions as to
something more I could try?  I'm out of ideas.  I know that UEFI itself
works on both the servers: we have a setup with Ubuntu MAAS and it can
deploy perfectly fine using its process with UEFI, so it's something in
the Ironic deploy image that's causing us this problem.

Regards,
Mike


