[ironic] Deploy images crashing Dell server BIOS using UEFI boot

Iury Gregory iurygregory at gmail.com
Thu May 25 12:25:10 UTC 2023


Hello Mike,

We had this issue in the past and it was tracked in an upstream bug [1]; the
upstream fix was merged during the Xena cycle [2].
I would check the firmware version present on the machine and try
upgrading it to see if that helps.
I've also noticed we just had another change in IPA with a fix for
efibootmgr [3]; I'm not sure whether the image you downloaded from tarballs
already contains it.

[1] https://storyboard.openstack.org/#!/story/2008962
[2] https://review.opendev.org/c/openstack/ironic-python-agent/+/795862
[3] https://review.opendev.org/c/openstack/ironic-python-agent/+/881762
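
Before chasing firmware, it may also be worth confirming which boot mode the
node's capabilities string actually requests. The following is only a sketch:
`boot_mode_from_caps` is a hypothetical helper name (not part of any OpenStack
tool), and it assumes the comma-separated `key:value` form used in the
`node set` command quoted below.

```shell
# Hypothetical helper: print the boot_mode value found in an Ironic
# capabilities string such as "cpu_vt:true,boot_mode:uefi".
# Prints nothing and returns 1 when no boot_mode entry is present.
boot_mode_from_caps() {
    # Wrap in commas so every key is preceded by a comma, including the first.
    caps=",$1,"
    case "$caps" in
        *,boot_mode:*)
            rest=${caps#*,boot_mode:}      # drop everything up to the value
            printf '%s\n' "${rest%%,*}"    # keep the text before the next comma
            return 0
            ;;
    esac
    return 1
}
```

The capabilities string itself can be read from the `properties` field of
`openstack baremetal node show $NODE`; passing `'boot_mode:uefi'` to the
helper prints `uefi`.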

On Thu, 25 May 2023 at 04:19, Mike Currin <mike at idia.ac.za>
wrote:

> Hi All,
>
> We have a Xena-based OpenStack deployment.  We recently deployed 60+
> nodes in our research cluster with Ironic, which worked well.  All of
> these were deployed using a standard process I'll describe below.
>
> We recently took delivery of a new Dell R6625 server with NVMe devices
> only, which only support UEFI boot, so we are trying to get that
> working.
>
> The server PXE boots and downloads the RAM disk and then the deploy image;
> once it starts running that, it immediately crashes (I assume when running
> linuxefi).  We tested a UEFI deploy on an existing Dell R640 server:
> that server works with BIOS, but when we swapped it over to UEFI it did
> the same, so the crash isn't due to the new server's much bigger/different
> architecture (AMD vs Intel).  We have a few older servers in a test setup
> (Dell R630s) which work fine and don't show this
> behaviour.  We haven't tried them on our production setup, as even
> if they worked it wouldn't help us move forward.
>
> I made a video showing this:
> https://www.dropbox.com/s/5jbn1qpylxaevqb/uefiboot2.mov?dl=0
> In the iDRAC we just get "System BIOS has halted", and
> somewhere it said to change hardware that you recently added, which
> feels unlikely as 2 different servers, both working elsewhere with
> totally different hardware, show the same failure.
>
> I've done an iDRAC serial console debug, but it isn't showing me much
> that is of any use.
>
> This is our entire process to deploy a node (some steps are one-off, of
> course, and I've not included the network setup):
>
> openstack flavor create --ram 256000 --disk 20 --vcpus 32 --public \
>     our-baremetal
> openstack flavor set our-baremetal --property capabilities:boot_mode="uefi"
>
> We downloaded the latest (Xena) images from:
> https://tarballs.opendev.org/openstack/ironic-python-agent/dib/files/
> I also tried the latest CentOS 9 ones just to try them; it made no difference.
>
> We then extract the image and make a small mod to the service start
> (we found it didn't bring the NIC up immediately, so we put a ping delay
> in the ExecStart), but that's not part of this problem.
>
> openstack image create --disk-format aki --container-format aki \
>     --public --file ironic-agent.kernel deploy-vmlinuz
> openstack image create --disk-format ari --container-format ari \
>     --public --file ironic-agent.initramfs-ping-patched deploy-initrd
>
> Then to do the deploy:
> export HOSTNAME=<hostname>
> export MGMTIP=<idracip>
>
> openstack baremetal node create --driver ipmi --name $HOSTNAME \
>     --driver-info ipmi_port=623 --driver-info ipmi_username=root \
>     --driver-info 'ipmi_password=<ourpassword>' \
>     --driver-info ipmi_address=$MGMTIP \
>     --resource-class baremetal-resource-class \
>     --property cpus=32 --property memory_mb=256000 --property local_gb=20 \
>     --property cpu_arch=x86_64 \
>     --driver-info deploy_ramdisk=$(openstack image show deploy-initrd -f value -c id) \
>     --driver-info deploy_kernel=$(openstack image show deploy-vmlinuz -f value -c id)
> NODE=$(openstack baremetal node show -f value -c uuid $HOSTNAME)
> openstack baremetal node set $NODE --property capabilities='boot_mode:uefi'
>
> openstack baremetal port create <MACADDRESS> --node $NODE \
>     --physical-network physnet3
> openstack baremetal node manage $NODE --wait && \
>     openstack baremetal node list && \
>     openstack baremetal node provide $NODE && \
>     openstack baremetal node list
>
> openstack server create --use-config-drive --image <ourimage> \
>     --flavor our-baremetal --security-group worker \
>     --network ironic-network --key-name <ourkeyname> servername
>
> Does anyone have any more info to help, or any suggestions as to
> something more I could try?  I'm out of ideas.  I know that UEFI itself
> works on both the servers: we have a setup with Ubuntu MAAS and it can
> deploy perfectly fine using its process with the UEFI setup, so it's
> something in the Ironic deploy image that's causing us this problem.
>
> Regards,
> Mike
>
>

-- 
*Best regards,*

*Iury Gregory Melo Ferreira *
*MSc in Computer Science at UFCG*
*Ironic PTL *
*Senior Software Engineer at Red Hat Brazil*
*Social*: https://www.linkedin.com/in/iurygregory
*E-mail:  iurygregory at gmail.com <iurygregory at gmail.com>*