Hi Team/Julia,

Thank you for your constant help @Julia Kreger.

We decided to install the Wallaby release using online sources.
We followed this guide:
          https://docs.openstack.org/project-deploy-guide/tripleo-docs/wallaby/deployment/install_undercloud.html

The undercloud installation was successful, but we found that all the containers were in a healthy state except ironic_pxe_http, which was in an unhealthy state.
We collected the pcap files during node introspection at this point, and the following is our result:

image.png
As you can see, we are getting a read request from our baremetal node, but our TFTP server is not replying with the acknowledgement message.
In normal cases the data transfer should begin at this point, which is not happening here.
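
To check whether the TFTP server answers a read request over IPv6 independently of the baremetal node, something like the following Python sketch could be run from another machine on the provisioning network. It builds a read request (RRQ) in the layout defined by RFC 1350 and waits for the first reply. The helper names, the port default, and the timeout are our own placeholders, not values taken from the deployment.

```python
import socket
import struct
from typing import Optional

def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Build a TFTP read request: opcode 1, filename, NUL, mode, NUL (RFC 1350)."""
    return (struct.pack("!H", 1)
            + filename.encode("ascii") + b"\x00"
            + mode.encode("ascii") + b"\x00")

def probe_tftp(host: str, filename: str, port: int = 69,
               timeout: float = 3.0) -> Optional[int]:
    """Send one RRQ over IPv6 UDP; return the reply opcode, or None if nothing arrives."""
    with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_rrq(filename), (host, port))
        try:
            data, _addr = sock.recvfrom(2048)
        except socket.timeout:
            return None  # same symptom as in the capture: request sent, no answer
    if len(data) < 2:
        return None
    # 3 = DATA, 5 = ERROR, 6 = OACK (option acknowledgement)
    return struct.unpack("!H", data[:2])[0]
```

Calling `probe_tftp()` with the undercloud's provisioning address and the name of a file under the TFTP root would return `None` if the server stays silent, which would match what the capture shows.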

Apart from this, the container ironic_pxe_http is in an unhealthy state, as mentioned previously. On inspecting the container, we get the following error:

```
"Log": [
                         {
                              "Start": "2023-08-22T13:03:17.113762224+05:30",
                              "End": "2023-08-22T13:03:17.400862661+05:30",
                              "ExitCode": 1,
                              "Output": "/usr/sbin/httpd -DFOREGROUND\ncurl: (22) The requested URL returned error: 404 Not Found\n\n404 ca:ca:ca:9900::43:8088 0.000512 seconds"
                         },
```
We think that ca:ca:ca:9900::43:8088 is not valid syntax. For IPv6 addresses, it should be [ca:ca:ca:9900::43]:8088. Kindly note that this is our assumption.
Could you please help us out with it?
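
To illustrate the assumption, here is a minimal Python sketch of how a host and port are normally joined when the host may be an IPv6 literal. It is not the actual healthcheck code, which we have not inspected; `format_host_port` is a hypothetical helper of our own.

```python
import ipaddress

def format_host_port(host: str, port: int) -> str:
    """Join host and port, wrapping IPv6 literals in square brackets."""
    try:
        is_v6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        is_v6 = False  # not an IP literal, e.g. a hostname
    return f"[{host}]:{port}" if is_v6 else f"{host}:{port}"

print(format_host_port("ca:ca:ca:9900::43", 8088))  # [ca:ca:ca:9900::43]:8088

# Without brackets the string is ambiguous: it parses as a single IPv6
# address whose last group happens to be 8088.
print(ipaddress.ip_address("ca:ca:ca:9900::43:8088").version)  # 6
```

The second print shows why the unbracketed form is a problem: a parser cannot tell whether 8088 is a port or the last group of the address.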

Thanks and Regards,
Kushagra Gupta

On Fri, Aug 11, 2023 at 2:44 AM Julia Kreger <juliaashleykreger@gmail.com> wrote:
Greetings,

I would recommend verifying you can ping addresses, and then inspect firewall rules, since it sounds like the issue is rooted in the state of the undercloud node. I'm unaware of any specific configuration which would cause this, meaning you would realistically need to identify why the packets are not making it through to the service.

-Julia

On Thu, Aug 10, 2023 at 4:21 AM Lokendra Rathour <lokendrarathour@gmail.com> wrote:
Hi Julia/Team,

We are still failing to get IPv6 provisioning working. We need your help with the steps/report shared by Kushagra.

Thanks once again for your help.

-Lokendra


On Tue, Aug 8, 2023 at 6:12 PM Kushagr Gupta <kushagrguptasps.mun@gmail.com> wrote:
Hi Julia, Team,

Thank you for the response @Julia Kreger 

On Thu, Jul 27, 2023 at 6:59 PM Julia Kreger <juliaashleykreger@gmail.com> wrote:

I guess what is weird in this entire thing is it sounds like you're shifting over to what appears to be OPROM boot code in a network interface card, which might not support v6. Then again a port mirrored packet capture would be the needful item to troubleshoot further.

We have set up a local dnsmasq DHCP server and a TFTP server on a VM and tried PXE booting the same set of hardware.
The hardware boots over IPv6, so I think it supports IPv6 PXE booting.

 
Are you able to extract the exact command line which is being passed to the dnsmasq process for that container launch?
 
I guess I'm worried if somehow dnsmasq changed or if an old version is somehow in the container image you're using.


The command line which is getting executed is as follows:
  "command": [
    "/bin/bash",
    "-c",
    "BIND_HOST=ca:ca:ca:9900::171; /usr/sbin/dnsmasq --keep-in-foreground --log-facility=/var/log/ironic/dnsmasq.log --user=root --conf-file=/dev/null --listen-address=$BIND_HOST --port=0 --enable-tftp --tftp-root=/var/lib/ironic/tftpboot"
  ],
We found this command in the following file:
/var/lib/tripleo-config/container-startup-config/step_4/ironic_pxe_tftp.json
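
For anyone wanting to repeat this check, the listen address can be pulled out of that file with a short Python sketch like the one below. It assumes the JSON has a top-level "command" list as in the excerpt above (we have not verified the full file layout), and both helper names are hypothetical.

```python
import ipaddress
import json
import re
from typing import List

def load_command(path: str) -> List[str]:
    """Read the container startup config; assumes a top-level "command" list."""
    with open(path) as fh:
        return json.load(fh)["command"]

def extract_bind_host(command: List[str]) -> str:
    """Return the value assigned to BIND_HOST in the shell snippet."""
    for part in command:
        match = re.search(r"BIND_HOST=([^;\s]+)", part)
        if match:
            return match.group(1)
    raise ValueError("no BIND_HOST assignment found")

# Example with the command shown above (shortened to the relevant part).
command = [
    "/bin/bash",
    "-c",
    "BIND_HOST=ca:ca:ca:9900::171; /usr/sbin/dnsmasq --keep-in-foreground "
    "--listen-address=$BIND_HOST --port=0 --enable-tftp",
]
host = extract_bind_host(command)
print(host, ipaddress.ip_address(host).version)  # ca:ca:ca:9900::171 6
```

Comparing this value with the addresses actually configured on the provisioning interface would show whether dnsmasq is being told to listen on the address we expect.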

Apart from this, we also tried to install the OpenStack Zed release.
In this version, the container ironic_pxe_tftp is up and running, but we were still getting the same error:

image.png

We tried to curl the file which the TFTP container provides from a remote machine (not the undercloud), but we were unable to curl it.

image.png

But when we do the same thing from the undercloud, it works fine:

image.png

We also set up an undercloud machine on IPv4 for comparison.
When we tried to curl the image from a remote machine (not the undercloud) for this server, we were able to curl it.

image.png

On further digging, we found that in the Zed release, "ironic_pxe_tftp" is in a healthy state, while three containers, namely "ironic_api", "ironic_conductor", and "ironic_pxe_http", are in an unhealthy state but are up and running.
We re-installed the undercloud on a fresh machine and re-tried node introspection after performing basic tasks like image upload and node registration.
To our surprise, the introspection was successful and the nodes came into the available state:

nodes_available_4.PNG

At this point we were also able to curl the file from a random machine: 

image.png

But it all stopped working once we restarted the undercloud node, even though all the containers were up and running.
We are further investigating this issue.

Thanks and Regards
Kushagra Gupta


--
~ Lokendra
skype: lokendrarathour