Hi Julia,

Thanks for the feedback. I was able to run the deployment and attempted to boot my ISO image onto the bare metal node. However, according to ironic-conductor.log, it fails while writing the image to the device file (/dev/sda). Apart from that, I am also getting error messages like "Invalid or missing agent_token received for node" and "An agent token generation request is being refused as one is already present." The PXE boot itself appears to be working, since the node successfully downloads the NBP file.
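
If it helps, I can pull the node's current state and last_error with something along these lines and share the output (the node name is just a placeholder for my enrolled node):

$ openstack baremetal node show <node-name-or-uuid> --fields provision_state last_error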

Best,
James

On Fri, May 31, 2024 at 2:45 PM Julia Kreger <juliaashleykreger@gmail.com> wrote:


On Thu, May 30, 2024 at 12:02 PM James Leong <jamesleong123098@gmail.com> wrote:
Hi Jay,

Thanks for your advice, and sorry for the late reply. To clarify, my setup consists of a controller node and a compute node that supports IPMI, and I am using kolla-ansible as my deployment tool. The controller node runs Ubuntu 20.04, and the IPMI node runs Ubuntu 22.04. I was able to enroll the node via Horizon as a bare metal node. However, when I attempt to boot an ISO image directly onto the bare metal node, I get a timeout error. Do I have to set up an instance before booting? Below is a more precise description of how my installation works.

First, I uploaded three different images (kernel, ramdisk, ISO) to Glance using the following commands.
1. Creating kernel and ramdisk image
$ openstack image create --disk-format aki --container-format aki --public   --file /etc/kolla/config/ironic/ironic-agent.kernel deploy-vmlinuz
$ openstack image create --disk-format ari --container-format ari --public   --file /etc/kolla/config/ironic/ironic-agent.initramfs deploy-initrd
2. Creating ISO image
$ disk-image-create -o output.qcow vm block-device-gpt ubuntu-minimal
$ openstack image create --disk-format=qcow2 --container-format=bare --file=output.qcow.qcow2 --property os_distro='ubuntu' output_test

Sounds like you're trying to deploy a whole disk image, not boot an ISO... I would recommend making sure that whatever image you have created is bootable as a VM directly.
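
As a quick sanity check (assuming qemu is available locally and using your output.qcow.qcow2 filename), something like this should at least get you to a bootloader or a login prompt:

$ qemu-system-x86_64 -enable-kvm -m 2048 -drive file=output.qcow.qcow2,format=qcow2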
 

Here is how I enrolled my node. I provided the following details:
1. Node Info: node name, node driver (ipmi), properties (vendor: supermicro, cpu_arch: x86_64), instance info (image_resource: <ID of the ISO image [output_test]>, root_gb: 10)

image_source is the field you are looking for; however, that writes the image to the remote disk, depending on what the "deploy_interface" is set to on the node. Below, you're noting "direct" deploy, so that is not ISO booting, that is writing the contents directly to disk.
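
For the "direct" deploy path, the rough shape of what Ironic expects is something like this (the UUIDs are placeholders for your node and Glance image):

$ openstack baremetal node set <node-uuid> \
    --instance-info image_source=<glance-image-uuid> \
    --instance-info root_gb=10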
 
2. Driver Details: deploy kernel (<ID of the kernel image [deploy-vmlinuz]>), deploy ramdisk (<ID of the ramdisk image [deploy-initrd]>), ipmi_address (<IP of the ethernet connection plugged into the Supermicro IPMI port>), ipmi_bridging (no), agent_verify_ca (True), deploy_forces_oob_reboot (False), ('Default'), (False), ipmi_password (<password to login to IPMI>), ipmi_priv_level (ADMINISTRATOR), ipmi_protocol_version (2.0), ipmi_terminal_port (9091), ipmi_username (<username to login to IPMI>)

I'm assuming you mean "deploy_kernel" and "deploy_ramdisk". As an FYI, most of those settings are not required out of the box. Hostname, username, and password are generally where you should start out. The protocol version defaults to 2.0 and the privilege level to administrator.
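
In other words, a minimal driver_info along these lines is usually enough to start with (all values are placeholders for your environment):

$ openstack baremetal node set <node-uuid> \
    --driver-info ipmi_address=<bmc-ip> \
    --driver-info ipmi_username=<bmc-user> \
    --driver-info ipmi_password=<bmc-password> \
    --driver-info deploy_kernel=<deploy-vmlinuz-image-uuid> \
    --driver-info deploy_ramdisk=<deploy-initrd-image-uuid>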
 
3. Driver Interfaces: boot (pxe), console (ipmitool-shellinabox), deploy (direct), inspect (inspector), management (ipmitool), network (flat), power (ipmitool), raid (agent), storage (noop), vendor (ipmitool)

After the node is enrolled, I noticed that the driver validation shows that all interfaces, such as boot, deploy, management, network, power, console, inspect, raid, and storage, are validated. However, neither bios nor rescue is validated. I am unsure if that will affect either the inspection or deployment stages.


Neither will impact it. They are both optional in this case.
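
You can recheck the interface validation at any time with:

$ openstack baremetal node validate <node-uuid>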
 
Here is how my network is created and attached. I first created a port on my public network with the following details:
1. Info: Name, Enable Admin State (True), Device ID (<blank>), Device Owner (baremetal:none), Binding Host (<ID of the enrolled baremetal node>), MAC address (<blank>), Binding: VNIC Type (Bare Metal), Port Security (True)

Port security is kind of pointless with Bare Metal, JFYI.

But furthermore, you shouldn't be setting a bunch of neutron port details up in advance. Ironic will update the MAC address of whatever "vif" port you attempt to attach, and it will update the other details as needed.
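
A bare-minimum port is generally all you need; roughly something like this (the network and port names are placeholders):

$ openstack port create --network <public-network> --vnic-type baremetal <port-name>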

 
2. Security Group: Port Security Groups (default)
After creating it, I obtained the MAC address of that port and ran the following command to create the corresponding bare metal port:

Doesn't apply with Bare Metal unless you have an ML2 driver with switch gear which supports using that information.
 

1. $ baremetal port create --node <ID of the enrolled baremetal node> --physical-network physnet1 <MAC address of the port obtained>
After creating the port, I then manually attach it to the bare metal node, since a virtual interface (VIF) is required for deployment.
1. $ baremetal node vif attach <ID of the enrolled baremetal node> <ID of the port created>

Finally, I moved the provision state of the node from manageable to available, and then from available to active. This starts the deployment process, and after 30 minutes the Ironic service hits a timeout error. Let me know if you need any further information about the deployment or have any other concerns.
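
For reference, the CLI equivalent of the state transitions I'm doing is roughly (node name is a placeholder):

$ openstack baremetal node provide <node-name-or-uuid>
$ openstack baremetal node deploy <node-name-or-uuid>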

I guess the question I have is what you actually want to end up with. If you're just trying to boot an ISO, you should be using instance_info/boot_iso and the "ramdisk" deploy_interface. If that image is a bootable whole-disk image, you can just ask Ironic to deploy it. The primary thing you should be looking at is the node's "last_error" field, and the console on the host. Honestly, it sounds like it is not network booting, and the best way to start troubleshooting that is to just watch the console of the node. Each server is different, and if you see it attempting to network boot, I'd go check traffic to see whether the DHCP requests are reaching Neutron and being responded to.
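
For the boot_iso/ramdisk route, that would look roughly like this (untested, identifiers are placeholders):

$ openstack baremetal node set <node-uuid> \
    --deploy-interface ramdisk \
    --instance-info boot_iso=<glance-image-uuid-or-url>
$ openstack baremetal node deploy <node-uuid>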

Hope that helps!

-Julia
 

Best,
James


On Tue, May 28, 2024 at 4:59 AM Jay Faulkner <jay@gr-oss.io> wrote:
Hi James,

Can you give me a little more detail about your deployment? It sounds like you may be trying to deploy via an ISO with Ironic, which isn't really something we directly support.

You do have some choices:
- You can boot an ISO ephemerally using Ironic's ramdisk driver. This is unlikely to be what you want unless you're certain it is.
- You can use a tool like diskimage-builder ( https://docs.openstack.org/diskimage-builder/latest/ ) to build a bare metal image suitable for deployment.
- You can attempt to use a prebuilt cloud image from Ubuntu; but this may not work as such images often are missing required drivers for physical bare metal.

Can you provide more detail about your installation if these aren't on the right track -- thanks!

-
Jay Faulkner

On Mon, May 27, 2024 at 12:41 PM James Leong <jamesleong123098@gmail.com> wrote:
Hi all, 

I am trying to play around with Ironic, and I am having an issue deploying an Ubuntu 20.04 ISO image onto my bare metal node. Since I am quite new to bare metal, I am not entirely sure what went wrong, as I couldn't find any error messages in my log files (ironic-conductor, ironic-api, ironic_neutron_agent, and nova-compute-ironic). My guess is that the Ironic inspection is hitting a timeout, which is causing the deployment to time out. However, I have no idea why the inspection is timing out. Any help would be appreciated.

Best,
James