Hi Harald,

Responding on behalf of Anirudh's email:

Thanks for the response. We now understand that we are getting an IP from the expected DHCP server.

We tried the scenario and here are our findings.

Our admin and internal endpoints are on subnet 30.30.30.x; the public endpoints are on 10.0.1.x.

(overcloud) [stack@undercloud ~]$ openstack endpoint list | grep ironic
| 04c163251e5546769446a4fa4fa20484 | regionOne | ironic           | baremetal               | True | admin    | http://30.30.30.213:6385 |
| 5c8557ae639a4898bdc6121f6e873724 | regionOne | ironic           | baremetal               | True | internal | http://30.30.30.213:6385 |
| 62e07a3b2f3f4158bb27d8603a8f5138 | regionOne | ironic-inspector | baremetal-introspection | True | public   | http://10.0.1.88:5050    |
| af29bd64513546409f44cc5d56ea1082 | regionOne | ironic-inspector | baremetal-introspection | True | internal | http://30.30.30.213:5050 |
| b76cdb5e77c54fc6b10cbfeada0e8bf5 | regionOne | ironic-inspector | baremetal-introspection | True | admin    | http://30.30.30.213:5050 |
| bd2954f41e49419f85669990eb59f51a | regionOne | ironic           | baremetal               | True | public   | http://10.0.1.88:6385    |
(overcloud) [stack@undercloud ~]$

We are following the flat default network approach for ironic provisioning, for which we are creating a flat network on the baremetal physnet. We are still getting an IP from the neutron range (172.23.3.220 - 172.23.3.240): 172.23.3.240.

Further, we found that once the IP (172.23.3.240) is allocated to the baremetal node, it looks for 30.30.30.220 (the IP of one of the three controllers) for PXE booting.

Checking the same controller's logs, we found that the `/var/lib/ironic/tftpboot/pxelinux.cfg/` directory exists, but there is no file matching the MAC address of the baremetal node.
Also, checking the extra_dhcp_opts, we found this:

(overcloud) [stack@undercloud ~]$ openstack port show d7e573bf-1028-437a-8118-a2074c7573b2 | grep "extra_dhcp_opts"
| extra_dhcp_opts | ip_version='4', opt_name='tag:ipxe,67', opt_value='http://30.30.30.220:8088/boot.ipxe'

[image: image.png]

*A few observations:*

1. Although the baremetal network (172.23.3.x) is routable to the admin network (30.30.30.x), the request times out at this window.
2. In tcpdump we only see read requests.
3. `openstack baremetal node list`

(overcloud) [stack@undercloud ~]$ openstack baremetal node list
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
| 7066fbe1-9c29-4702-9cd4-2b55daf19630 | bm1  | None          | power on    | clean wait         | False       |
+--------------------------------------+------+---------------+-------------+--------------------+-------------+

4. `openstack baremetal node show <node-uuid>`

(overcloud) [stack@undercloud ~]$ openstack baremetal node show bm1
+------------------------+----------------------------------------------------------------+
| Field                  | Value                                                          |
+------------------------+----------------------------------------------------------------+
| allocation_uuid        | None |
| automated_clean        | None |
| bios_interface         | no-bios |
| boot_interface         | ipxe |
| chassis_uuid           | None |
| clean_step             | {} |
| conductor              | overcloud-controller-0.localdomain |
| conductor_group        | |
| console_enabled        | False |
| console_interface      | ipmitool-socat |
| created_at             | 2022-02-09T14:21:24+00:00 |
| deploy_interface       | iscsi |
| deploy_step            | {} |
| description            | None |
| driver                 | ipmi |
| driver_info            | {'ipmi_address': '10.0.1.183', 'ipmi_username': 'hsc', 'ipmi_password': '******', 'ipmi_terminal_port': 623, 'deploy_kernel': '9e1365b6-261a-42a2-abfe-40158945de57', 'deploy_ramdisk': 'fe608dd2-ce86-4faf-b4b8-cc5cb143eb56'} |
| driver_internal_info   | {'agent_erase_devices_iterations': 1, 'agent_erase_devices_zeroize': True, 'agent_continue_if_ata_erase_failed': False, 'agent_enable_ata_secure_erase': True, 'disk_erasure_concurrency': 1, 'last_power_state_change': '2022-02-09T14:23:39.525629'} |
| extra                  | {} |
| fault                  | None |
| inspect_interface      | inspector |
| inspection_finished_at | None |
| inspection_started_at  | None |
| instance_info          | {} |
| instance_uuid          | None |
| last_error             | None |
| maintenance            | False |
| maintenance_reason     | None |
| management_interface   | ipmitool |
| name                   | bm1 |
| network_interface      | flat |
| owner                  | None |
| power_interface        | ipmitool |
| power_state            | power on |
| properties             | {'cpus': 20, 'cpu_arch': 'x86_64', 'capabilities': 'boot_option:local,boot_mode:uefi', 'memory_mb': 63700, 'local_gb': 470, 'vendor': 'hewlett-packard'} |
| protected              | False |
| protected_reason       | None |
| provision_state        | clean wait |
| provision_updated_at   | 2022-02-09T14:24:05+00:00 |
| raid_config            | {} |
| raid_interface         | no-raid |
| rescue_interface       | agent |
| reservation            | None |
| resource_class         | bm1 |
| storage_interface      | noop |
| target_power_state     | None |
| target_provision_state | available |
| target_raid_config     | {} |
| traits                 | [] |
| updated_at             | 2022-02-09T14:24:05+00:00 |
| uuid                   | 7066fbe1-9c29-4702-9cd4-2b55daf19630 |
| vendor_interface       | ipmitool |
+------------------------+----------------------------------------------------------------+
(overcloud) [stack@undercloud ~]$

*Queries:*

- What settings do we need in order to PXE-boot the baremetal node and provision it successfully?

On Tue, Feb 8, 2022 at 6:27 PM Harald Jensas <hjensas@redhat.com> wrote:
On 2/7/22 13:47, Anirudh Gupta wrote:
Hi Julia,
Thanks a lot for your responses and support. To update on the ongoing issue: I tried deploying the overcloud with your suggestion, i.e. by passing "DhcpAgentNotification: true" in ironic-overcloud.yaml. The setup came up successfully, but with this configuration the IP allocated to the system comes from the range configured while creating the subnet in OpenStack.
image.png
The system is still getting the IP (172.23.3.212) from neutron. The subnet range was configured as 172.23.3.210-172.23.3.240 while creating the provisioning subnet.
The node is supposed to get an IP address from the neutron subnet on the provisioning network when a) provisioning the node, or b) cleaning the node.
When you do "baremetal node provide", cleaning is most likely initiated automatically. (Cleaning is enabled by default for Ironic in the overcloud, AFAIK.)
The only time you will get an address from the IronicInspectorSubnets (ip_range: 172.23.3.100,172.23.3.150 in your case) is when you start ironic node introspection.
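To make the distinction above concrete, here is a minimal shell sketch that classifies a leased address against the two ranges quoted in this thread (inspector: 172.23.3.100-172.23.3.150; neutron allocation pool: 172.23.3.210-172.23.3.240). The function names are hypothetical helpers, not part of any OpenStack tool, and the ranges would need adjusting to your own deployment.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: classify a DHCP lease by the ranges quoted in this thread.

# Convert a dotted-quad IPv4 address to an integer so ranges can be compared.
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

# Succeed if address $1 lies within the inclusive range $2..$3.
ip_in_range() {
    local ip lo hi
    ip=$(ip_to_int "$1")
    lo=$(ip_to_int "$2")
    hi=$(ip_to_int "$3")
    (( ip >= lo && ip <= hi ))
}

# Print which pool an address falls into (ranges are assumptions from the thread).
classify_lease() {
    if ip_in_range "$1" 172.23.3.100 172.23.3.150; then
        echo "inspector"   # expected only during introspection
    elif ip_in_range "$1" 172.23.3.210 172.23.3.240; then
        echo "neutron"     # expected during cleaning and provisioning
    else
        echo "unknown"
    fi
}
```

For example, `classify_lease 172.23.3.212` reports the neutron pool, which is what one would expect to see during cleaning after "baremetal node provide".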
The system gets stuck here and no action is performed after this.
It seems the system is getting an address from the expected DHCP server, but it does not boot. I would start looking into the pxe properties in the DHCP Reply.
What is the status of the node in ironic at this stage? `openstack baremetal node list` `openstack baremetal node show <node-uuid>`
Check the `extra_dhcp_opts` on the neutron port; it should set the nextserver and bootfile parameters. Does the bootfile exist in /var/lib/ironic/tftpboot? Inspect the `/var/lib/ironic/tftpboot/pxelinux.cfg/` directory; you should see a file matching the MAC address of your system. Does the content make sense?
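The directory check suggested above can be sketched as a small shell helper. It relies on the pxelinux convention of naming per-host config files `01-` followed by the MAC address in lowercase with dashes. As an assumption, it also looks for the same name without the `01-` prefix, since iPXE-based setups may name or locate the file differently (for example under an HTTP boot directory rather than tftpboot). The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for locating a node's per-MAC boot config.

# pxelinux convention: "01-" + MAC, lowercased, with dashes instead of colons.
pxe_cfg_name() {
    local mac
    mac=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr ':' '-')
    echo "01-${mac}"
}

# Print every candidate config path that actually exists in directory $1
# for MAC $2. The un-prefixed name is an assumption for iPXE setups.
find_pxe_cfg() {
    local dir=$1 base candidate
    base=$(pxe_cfg_name "$2")
    for candidate in "$dir/$base" "$dir/${base#01-}"; do
        [ -e "$candidate" ] && echo "$candidate"
    done
    return 0
}

# Example (run on the controller that serves the boot files; MAC is a placeholder):
#   find_pxe_cfg /var/lib/ironic/tftpboot/pxelinux.cfg aa:bb:cc:dd:ee:ff
```

If nothing is printed for the tftpboot directory, it may be worth repeating the check against whatever directory backs the HTTP URL advertised in `extra_dhcp_opts`.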
Can you capture DHCP and TFTP traffic on the provisioning network?
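One possible way to do such a capture is sketched below. The interface name `br-baremetal` is taken from this thread; the optional extra TCP port is an assumption for setups where the boot script is fetched over HTTP, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for capturing network-boot traffic with tcpdump.

# Build a capture filter for DHCP (UDP 67/68) and TFTP requests (UDP 69).
# An optional first argument adds a TCP port, e.g. an HTTP boot port.
boot_capture_filter() {
    local filter="udp port 67 or udp port 68 or udp port 69"
    if [ -n "${1:-}" ]; then
        filter="$filter or tcp port $1"
    fi
    echo "$filter"
}

# Run the capture on interface $1 (needs root privileges).
# -n: no name resolution, -e: show link-level (MAC) headers.
capture_boot_traffic() {
    local iface=$1 http_port=${2:-}
    sudo tcpdump -n -e -i "$iface" "$(boot_capture_filter "$http_port")"
}

# Example: capture_boot_traffic br-baremetal 8088
```

Note that a TFTP server answers from an ephemeral port rather than port 69, so a filter on port 69 shows only the initial read requests; filtering on the node's IP address instead would show the full exchange.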
Is there any way to resolve this and successfully provision the baremetal node in the TripleO Train release? (Since RHOSP 16 was on Train, I thought to go with that version for better stability.)
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16....
I have some queries:
1. Is passing "DhcpAgentNotification: true" enough, or do we have to make some other changes as well?
I believe in Train "DhcpAgentNotification" defaults to True. The change to default to false was added more recently, and it was not backported. (https://review.opendev.org/c/openstack/tripleo-heat-templates/+/801761)
NOTE: the environment for enabling ironic for the overcloud, 'environments/services/ironic-overcloud.yaml', overrides this to 'true' in later releases.
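If one prefers to set the parameter explicitly rather than rely on the release default, a small extra environment file is one option. The sketch below writes such a file; the output path and function name are hypothetical, and only the `DhcpAgentNotification` parameter itself comes from this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a heat environment file that sets
# DhcpAgentNotification explicitly, independent of the release default.
write_dhcp_notification_env() {
    local out=$1
    cat > "$out" <<'EOF'
parameter_defaults:
  DhcpAgentNotification: true
EOF
}

# Example (the path is an assumption):
#   write_dhcp_notification_env /home/stack/templates/dhcp-agent-notification.yaml
# then append "-e /home/stack/templates/dhcp-agent-notification.yaml" to the
# "openstack overcloud deploy" command, after ironic-overcloud.yaml.
```

Keeping the override in its own file leaves the packaged templates under /usr/share untouched, so a package update does not silently revert it.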
2. Although there are some security concerns specified in the document, I am currently focusing on the default flat bare metal approach, which has a dedicated interface for bare metal provisioning. There is a composable network approach as well. Keeping the security concerns aside, which approach is better and functional?
   1. https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16....
Both should work; using the composable network is more secure since baremetal nodes do not have access to the control plane network.
3. Will moving to a newer OpenStack release make this deployment possible?
   1. If yes, which release should I go with? Up to Wallaby, the ironic-overcloud.yaml file does not include "DhcpAgentNotification: true" by default.
      1. https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/envi...
Looking forward to your valuable feedback/response.
Regards Anirudh Gupta
On Fri, Feb 4, 2022 at 8:54 PM Anirudh Gupta <anyrude10@gmail.com> wrote:
Hi,
I'll surely report back on the status once it gets deployed. By the way, is the suspicion because of the Train release, or is it something else?
Regards Anirudh Gupta
On Fri, 4 Feb, 2022, 20:29 Julia Kreger, <juliaashleykreger@gmail.com> wrote:
On Fri, Feb 4, 2022 at 5:50 AM Anirudh Gupta <anyrude10@gmail.com> wrote:
Hi Julia
Thanks for your response.
Earlier I was passing both ironic.yaml and ironic-overcloud.yaml located at path
/usr/share/openstack-tripleo-heat-templates/environments/services/
My current understanding is that since I am using OVN, not OVS, I should pass only ironic-overcloud.yaml in my deployment.
I am currently on the Train release, and my default ironic-overcloud.yaml file has no such entry: DhcpAgentNotification: true
I suspect that should work. Let us know if it does.
I would add this there and redeploy the setup.
Would that be enough to make my deployment successful?
Regards Anirudh Gupta
On Fri, 4 Feb, 2022, 18:40 Julia Kreger, <juliaashleykreger@gmail.com> wrote:
It is not a matter of disabling OVN, but a matter of enabling the dnsmasq service and notifications.
https://github.com/openstack/tripleo-heat-templates/blob/master/environments...
may provide some insight.
I suspect if you're using stable/wallaby based branches and it is not working, there may need to be a patch backported by the TripleO maintainers.
On Thu, Feb 3, 2022 at 8:02 PM Anirudh Gupta <anyrude10@gmail.com> wrote:
Hi Julia,
Thanks for your response. For the overcloud deployment, I am executing the following command:
openstack overcloud deploy --templates \
    -n /home/stack/templates/network_data.yaml \
    -r /home/stack/templates/roles_data.yaml \
    -e /home/stack/templates/node-info.yaml \
    -e /home/stack/templates/environment.yaml \
    -e /home/stack/templates/environments/network-isolation.yaml \
    -e /home/stack/templates/environments/network-environment.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/services/ironic.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-conductor.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-inspector.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-overcloud.yaml \
    -e /home/stack/templates/ironic-config.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/podman.yaml \
    -e /home/stack/containers-prepare-parameter.yaml
I can see some OVN related stuff in my roles_data and environments/network-isolation.yaml
[stack@undercloud ~]$ grep -inr "ovn"
roles_data.yaml:34:    OVNCMSOptions: "enable-chassis-as-gw"
roles_data.yaml:168:    - OS::TripleO::Services::OVNDBs
roles_data.yaml:169:    - OS::TripleO::Services::OVNController
roles_data.yaml:279:    - OS::TripleO::Services::OVNController
roles_data.yaml:280:    - OS::TripleO::Services::OVNMetadataAgent
environments/network-isolation.yaml:16:  OS::TripleO::Network::Ports::OVNDBsVipPort: ../network/ports/vip.yaml

What is your recommendation, and how do I disable OVN? Should I remove it from roles_data.yaml and then render the templates so that it doesn't get generated in environments/network-isolation.yaml? Please suggest some pointers.
Regards Anirudh Gupta
It seems OVN is getting installed in ironic
On Fri, Feb 4, 2022 at 1:36 AM Julia Kreger <juliaashleykreger@gmail.com> wrote:
My guess: You're running OVN. You need neutron-dhcp-agent running as well. OVN disables it by default and OVN's integrated DHCP service does not support options for network booting.
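A quick way to see whether any neutron-dhcp-agent is registered is to list the network agents filtered by type. The sketch below wraps that in shell functions; the function names are hypothetical, and it assumes overcloud credentials are already sourced so that the `openstack` CLI can reach the overcloud.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking whether a neutron DHCP agent is registered.

# Print the host of every registered DHCP agent, one per line.
list_dhcp_agent_hosts() {
    openstack network agent list --agent-type dhcp -f value -c Host
}

# Read host names on stdin and print a one-line summary.
summarize_dhcp_agents() {
    local n
    # grep -c prints the count but exits non-zero on 0 matches, hence "|| true".
    n=$(grep -c . || true)
    if [ "$n" -gt 0 ]; then
        echo "neutron-dhcp-agent registered on $n host(s)"
    else
        echo "no neutron-dhcp-agent registered"
    fi
}

# Example: list_dhcp_agent_hosts | summarize_dhcp_agents
```

An empty result would be consistent with the guess above that OVN has left the DHCP agent disabled.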
-Julia
On Thu, Feb 3, 2022 at 9:06 AM Anirudh Gupta <anyrude10@gmail.com> wrote:
Hi Team
I am trying to provision a bare metal node from my TripleO overcloud. For this, while deploying the overcloud, I have followed the "default flat" network approach specified in the link below:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/15/...
Just to highlight the changes, I have defined the following in *ironic-config.yaml*:

parameter_defaults:
    ...
    ...
    IronicIPXEEnabled: true
    IronicInspectorSubnets:
    - ip_range: 172.23.3.100,172.23.3.150
    IronicInspectorInterface: 'br-baremetal'
I also modified the file *~/templates/network-environment.yaml*:

parameter_defaults:
    NeutronBridgeMappings: datacentre:br-ex,baremetal:br-baremetal
    NeutronFlatNetworks: datacentre,baremetal
With this, I have followed all the steps for creating the br-baremetal bridge on the controller, given in the link below:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/15/...
- type: ovs_bridge
  name: br-baremetal
  use_dhcp: false
  members:
  - type: interface
    name: nic3
Post deployment, I have also created a flat network on the "datacentre" physical network, a subnet with the range 172.23.3.200,172.23.3.240 (as suggested, the subnet is the same as the inspector's and the range is different), and the router.
I also created a baremetal node and ran "openstack baremetal node manage bm1", which succeeded.
Observation:
On executing "openstack baremetal node provide bm1", the machine powers on, and ideally it should take an IP from the ironic inspector range and PXE boot. But nothing of this sort happens, and we see an IP from the neutron range, "172.23.3.239" (screenshot attached).
image.png
I have checked the overcloud ironic inspector podman logs along with tcpdump. In tcpdump, I can only see DHCP discover requests on br-baremetal, and nothing happens after that.
I have tried to explain my issue in detail, but I would be happy to share more details if required. Can someone please help in resolving my issue?
Regards Anirudh Gupta