[Kolla][Kolla-Ansible] Ironic Node Cleaning Failed
Anirudh Gupta
anyrude10 at gmail.com
Fri Aug 6 12:12:23 UTC 2021
Hi Dmitry,
I captured a tcpdump while the Baremetal Node was booting up, looked for TFTP
traffic, and found some *"File Not Found"* traces for bootx64.efi.
[screenshot: tcpdump showing TFTP "File not found" for bootx64.efi]
Then, I found a related post on openstack-discuss that suggested enabling iPXE:
http://lists.openstack.org/pipermail/openstack-discuss/2019-October/010329.html
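
For reference, this is roughly what I changed in globals.yml before re-deploying;
the option name is my understanding from that thread, so please correct me if
Wallaby expects something different:

   - enable_ironic_ipxe: "yes"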
After re-deploying the setup with iPXE enabled, I found similar traces, now for
the *ipxe.efi* file.
[screenshot: tcpdump showing TFTP "File not found" for ipxe.efi]
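
To narrow this down on my side, I plan to read the capture back with a TFTP filter
to see which files the node requested, and to list the TFTP root inside the PXE
container. The container name and path are my assumption of the Kolla defaults
(ironic_pxe serving /tftpboot), so please correct me if they differ:

   - tcpdump -nn -r dump_ipxe.pcap port 69
   - docker exec ironic_pxe ls /tftpboot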
Can you please suggest what might be missing in the configuration and the steps
to resolve it?
For your reference, I am attaching the complete tcpdump logs of both scenarios.
Looking forward to hearing from you.
Regards
Anirudh Gupta
On Thu, Aug 5, 2021 at 4:56 PM Anirudh Gupta <anyrude10 at gmail.com> wrote:
> Hi Team,
>
> On further debugging, I found an error in the neutron-server logs:
>
>
> Failed to bind port 476d8175-ffc2-49ba-bb12-0a77c1f07e5f on host
> f4a43fa5-9c41-488e-a34d-714ae5a9d300 for vnic_type baremetal using segments
> [{'id': '1a5bbe96-2488-4971-925f-7c9346ba3ef5', 'network_type': 'flat',
> 'physical_network': 'physnet1', 'segmentation_id': None, 'network_id':
> '5b6cccec-ad86-4ed9-8d3c-72a31ec3a0d4'}]
> 2021-08-05 16:33:06.979 23 INFO neutron.plugins.ml2.plugin
> [req-54d11d51-7319-43ea-b70c-fe39d8aafe8a 21d6a238438e4294912746bcdc895e31
> 3eca725754e1405eb178cc39bd0da3aa - default default] Attempt 9 to bind port
> 476d8175-ffc2-49ba-bb12-0a77c1f07e5f
>
> where 476d8175-ffc2-49ba-bb12-0a77c1f07e5f is the UUID of the port created for my Baremetal Node
>
> However, the port is created in OpenStack, but its state is DOWN:
>
> [ansible@localhost ~]$ openstack port list
>
> +--------------------------------------+------+-------------------+---------------------------------------------------------------------------+--------+
> | ID                                   | Name | MAC Address       | Fixed IP Addresses                                                          | Status |
> +--------------------------------------+------+-------------------+---------------------------------------------------------------------------+--------+
> | 07d6b83d-d83c-498f-8ba8-b4f21bef7249 |      | fa:16:3e:38:05:9d | ip_address='10.0.1.200', subnet_id='7b72c158-2146-4bd6-893b-bd76b4a3e869'  | ACTIVE |
> | 476d8175-ffc2-49ba-bb12-0a77c1f07e5f |      | 98:f2:b3:3f:72:d8 | ip_address='10.0.1.202', subnet_id='7b72c158-2146-4bd6-893b-bd76b4a3e869'  | DOWN   |
> +--------------------------------------+------+-------------------+---------------------------------------------------------------------------+--------+
>
> *98:f2:b3:3f:72:d8* is the MAC address of my Baremetal Node on which PXE is
> enabled.
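>
> In case it helps, these are the checks I intend to run for the binding failure;
> they are standard openstack CLI commands, and the port ID is the one from the
> listing above:
>
>    - openstack network agent list
>    - openstack port show 476d8175-ffc2-49ba-bb12-0a77c1f07e5f -c binding_vif_type -c binding_host_id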
>
> Can someone please help in resolving this issue?
>
> *Issue:*
> *Node goes in clean_failed from clean_wait.*
>
> Regards
> Anirudh Gupta
>
> On Tue, Aug 3, 2021 at 8:32 PM Anirudh Gupta <anyrude10 at gmail.com> wrote:
>
>> Hi Dmitry,
>>
>> I might be wrong, but as per my understanding, if there were an issue in
>> dnsmasq, then the IP 20.20.20.10 would not have been assigned to the machine.
>>
>> TCPDUMP logs are as below:
>>
>> 20:16:58.938089 IP controller.bootps > 255.255.255.255.bootpc:
>> BOOTP/DHCP, Reply, length 312
>> 20:17:02.765291 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP,
>> Request from 98:f2:b3:3f:72:e5 (oui Unknown), length 359
>> 20:17:02.766303 IP controller.bootps > 255.255.255.255.bootpc:
>> BOOTP/DHCP, Reply, length 312
>> 20:17:26.944378 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP,
>> Request from 98:f2:b3:3f:72:e5 (oui Unknown), length 347
>> 20:17:26.944756 IP controller.bootps > 255.255.255.255.bootpc:
>> BOOTP/DHCP, Reply, length 312
>> 20:17:30.763627 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP,
>> Request from 98:f2:b3:3f:72:e5 (oui Unknown), length 359
>> 20:17:30.764620 IP controller.bootps > 255.255.255.255.bootpc:
>> BOOTP/DHCP, Reply, length 312
>> 20:17:54.938791 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP,
>> Request from 98:f2:b3:3f:72:e5 (oui Unknown), length 347
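>>
>> In case the exact capture command matters, this is roughly how the dump was
>> taken on the provisioning interface (interface name is from my setup; adjust
>> as needed). With -vvv, tcpdump also decodes the next-server and boot file
>> options in the DHCP replies:
>>
>>    - tcpdump -i ens2f0 -nn -vvv port 67 or port 68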
>>
>> Also the neutron dnsmasq logs and ironic inspector logs are attached in
>> the mail.
>>
>> Regards
>> Anirudh Gupta
>>
>>
>> On Tue, Aug 3, 2021 at 7:29 PM Dmitry Tantsur <dtantsur at redhat.com>
>> wrote:
>>
>>> Hi,
>>>
>>> You need to check the dnsmasq logs (there are two dnsmasqs: one from neutron
>>> and one from ironic-inspector). tcpdump may also help to determine where the
>>> packets are lost.
>>>
>>> Dmitry
>>>
>>> On Fri, Jul 30, 2021 at 10:29 PM Anirudh Gupta <anyrude10 at gmail.com>
>>> wrote:
>>>
>>>> Hi Dmitry
>>>>
>>>> Thanks for your time.
>>>>
>>>> My system is getting IP 20.20.20.10, which is in the range defined in the
>>>> ironic_dnsmasq_dhcp_range field in the globals.yml file.
>>>>
>>>> ironic_dnsmasq_dhcp_range: "20.20.20.10,20.20.20.100"
>>>>
>>>> And in the cleaning network (public1), the range defined is
>>>> 20.20.20.150-20.20.20.200
>>>>
>>>> As per my understanding, these 2 ranges should be mutually exclusive.
>>>>
>>>> Please suggest if my understanding is not correct.
>>>>
>>>> Any suggestions on what I should do to resolve this issue?
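>>>>
>>>> If useful, the two ranges can also be verified directly; these are standard
>>>> commands, and the globals.yml path assumes the default /etc/kolla location:
>>>>
>>>>    - openstack subnet show subnet1 -f value -c allocation_pools
>>>>    - grep ironic_dnsmasq_dhcp_range /etc/kolla/globals.yml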
>>>>
>>>> Regards
>>>> Anirudh Gupta
>>>>
>>>>
>>>> On Sat, 31 Jul, 2021, 12:06 am Dmitry Tantsur, <dtantsur at redhat.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Thu, Jul 29, 2021 at 6:05 PM Anirudh Gupta <anyrude10 at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Team,
>>>>>>
>>>>>> In addition to the email below, I have some updated information:
>>>>>>
>>>>>> Earlier, the allocation range mentioned in "*ironic_dnsmasq_dhcp_range*"
>>>>>> in globals.yml overlapped with the cleaning network's range, due to which
>>>>>> there was an issue in receiving the DHCP request.
>>>>>>
>>>>>> After creating a cleaning network with a separate allocation range, I am
>>>>>> successfully getting an IP allocated to my Baremetal Node:
>>>>>>
>>>>>> - openstack subnet create subnet1 --network public1 --subnet-range 20.20.20.0/24 --allocation-pool start=20.20.20.150,end=20.20.20.200 --ip-version=4 --gateway=20.20.20.1 --dhcp
>>>>>>
>>>>>>
>>>>>> [image: image.png]
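>>>>>>
>>>>>> To confirm that Ironic is really using public1 for cleaning, I also plan
>>>>>> to check the rendered conductor configuration; the container name and
>>>>>> config path below are my assumption of the Kolla defaults, so please
>>>>>> correct me if they differ:
>>>>>>
>>>>>>    - docker exec ironic_conductor grep cleaning_network /etc/ironic/ironic.conf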
>>>>>>
>>>>>> After getting the IP, there is no further action on the node. From "
>>>>>> *clean_wait*", it goes into "*clean_failed*" state after around half
>>>>>> an hour.
>>>>>>
>>>>>
>>>>> The IP address is not from the cleaning range, it may come from
>>>>> inspection. You probably need to investigate your network topology, maybe
>>>>> use tcpdump.
>>>>>
>>>>> Unfortunately, I'm not fluent in Kolla to say if it can be a bug or
>>>>> not.
>>>>>
>>>>> Dmitry
>>>>>
>>>>>
>>>>>>
>>>>>> On verifying the logs, I could see the below error messages:
>>>>>>
>>>>>>
>>>>>> - In */var/log/kolla/ironic/ironic-conductor.log*, we observed
>>>>>> the following error:
>>>>>>
>>>>>> ERROR ironic.conductor.utils [-] Cleaning for node
>>>>>> 3a56748e-a8ca-4dec-a332-ace18e6d494e failed. *Timeout reached while
>>>>>> cleaning the node. Please check if the ramdisk responsible for the cleaning
>>>>>> is running on the node. Failed on step {}.*
>>>>>>
>>>>>>
>>>>>> Note: For cleaning the node, we have used the below images:
>>>>>>
>>>>>>
>>>>>>
>>>>>> https://tarballs.openstack.org/ironic-python-agent/dib/files/ipa-centos8-master.kernel
>>>>>>
>>>>>>
>>>>>> https://tarballs.openstack.org/ironic-python-agent/dib/files/ipa-centos8-master.initramfs
>>>>>>
>>>>>>
>>>>>> - In /var/log/kolla/nova/nova-compute-ironic.log, we observed the
>>>>>> following error:
>>>>>>
>>>>>> ERROR nova.compute.manager [req-810ffedf-3343-471c-94db-85411984e6cc
>>>>>> - - - - -] No compute node record for host controller-ironic:
>>>>>> nova.exception_Remote.ComputeHostNotFound_Remote: Compute host
>>>>>> controller-ironic could not be found.
>>>>>>
>>>>>>
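>>>>>> For completeness, these are the checks I intend to run for the two errors
>>>>>> above. They are standard commands; only the container name used for
>>>>>> nova-manage is my assumption of the Kolla default, so please correct me if
>>>>>> it differs.
>>>>>>
>>>>>> For the cleaning timeout, confirming the node actually points at the IPA
>>>>>> images:
>>>>>>
>>>>>>    - openstack baremetal node show <node-uuid> -f value -c driver_info
>>>>>>    - openstack baremetal node set <node-uuid> --driver-info deploy_kernel=<image> --driver-info deploy_ramdisk=<image>
>>>>>>
>>>>>> For the ComputeHostNotFound error, confirming the ironic compute service
>>>>>> is registered and the host has been discovered:
>>>>>>
>>>>>>    - openstack compute service list --service nova-compute
>>>>>>    - docker exec nova_api nova-manage cell_v2 discover_hosts --by-service
>>>>>>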
>>>>>> Can someone please help in this regard?
>>>>>>
>>>>>> Regards
>>>>>> Anirudh Gupta
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 27, 2021 at 12:52 PM Anirudh Gupta <anyrude10 at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Team,
>>>>>>>
>>>>>>> We have deployed a 2-node Kolla-Ansible *12.0.0* setup in order to deploy
>>>>>>> the OpenStack *Wallaby* release. We have also enabled Ironic in order to
>>>>>>> provision the bare metal nodes.
>>>>>>>
>>>>>>> On each server we have 3 NICs:
>>>>>>>
>>>>>>> - *eno1* - OAM for external connectivity and endpoint's publicURL
>>>>>>> - *eno2* - Mgmt for internal communication between various openstack services.
>>>>>>> - *ens2f0* - Data Interface
>>>>>>>
>>>>>>>
>>>>>>> Corresponding to this, we have defined the following fields in
>>>>>>> globals.yml:
>>>>>>>
>>>>>>>
>>>>>>> - kolla_base_distro: "centos"
>>>>>>> - kolla_install_type: "source"
>>>>>>> - openstack_release: "wallaby"
>>>>>>> - network_interface: "eno2"                    # MGMT interface
>>>>>>> - kolla_external_vip_interface: "eno1"         # OAM interface
>>>>>>> - kolla_internal_vip_address: "192.168.10.3"   # MGMT subnet free IP
>>>>>>> - kolla_external_vip_address: "10.0.1.136"     # OAM subnet free IP
>>>>>>> - neutron_external_interface: "ens2f0"         # Data interface
>>>>>>> - enable_neutron_provider_networks: "yes"
>>>>>>>
>>>>>>> Note: Only relevant fields are being shown in this query
>>>>>>>
>>>>>>> Also, for Ironic, the following fields have been defined in globals.yml:
>>>>>>>
>>>>>>> - enable_ironic: "yes"
>>>>>>> - enable_ironic_neutron_agent: "{{ enable_neutron | bool and enable_ironic | bool }}"
>>>>>>> - enable_horizon_ironic: "{{ enable_ironic | bool }}"
>>>>>>> - ironic_dnsmasq_interface: "*ens2f0*"         # Data interface
>>>>>>> - ironic_dnsmasq_dhcp_range: "20.20.20.10,20.20.20.100"
>>>>>>> - ironic_dnsmasq_boot_file: "pxelinux.0"
>>>>>>> - ironic_cleaning_network: "public1"
>>>>>>> - ironic_dnsmasq_default_gateway: "20.20.20.1"
>>>>>>>
>>>>>>>
>>>>>>> After successful deployment, a flat provider network with the name
>>>>>>> public1 is created in OpenStack using the below commands:
>>>>>>>
>>>>>>>
>>>>>>> - openstack network create public1 --provider-network-type flat --provider-physical-network physnet1
>>>>>>> - openstack subnet create subnet1 --network public1 --subnet-range 20.20.20.0/24 --allocation-pool start=20.20.20.10,end=20.20.20.100 --ip-version=4 --gateway=20.20.20.1 --dhcp
>>>>>>>
>>>>>>>
>>>>>>> Issue/Queries:
>>>>>>>
>>>>>>>
>>>>>>> - Is the configuration done in globals.yml correct or is there
>>>>>>> anything else that needs to be done in order to separate control and data
>>>>>>> plane traffic?
>>>>>>>
>>>>>>>
>>>>>>> - Also, I have set automated_cleaning to "true" in the ironic-conductor
>>>>>>> container settings. But after creating the baremetal node, we run the
>>>>>>> "node manage" command, which runs successfully. Running the "*openstack
>>>>>>> baremetal node provide <node id>*" command powers on the machine and sets
>>>>>>> the boot mode to Network Boot, but no DHCP request for that particular MAC
>>>>>>> is seen on the controller (see the checks sketched after the note below).
>>>>>>> Is there anything I am missing that needs to be done in order to make
>>>>>>> Ironic work?
>>>>>>>
>>>>>>> Note: I have also verified that the NIC is PXE-enabled in the system
>>>>>>> configuration settings.
>>>>>>>
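>>>>>>> To check whether the DHCP request ever reaches the controller, these are
>>>>>>> the checks I intend to run while issuing the "provide" command; the
>>>>>>> dnsmasq container name is my assumption of the Kolla default, so please
>>>>>>> correct me if it differs:
>>>>>>>
>>>>>>>    - tcpdump -i ens2f0 -nn port 67 or port 68
>>>>>>>    - docker logs ironic_dnsmasq
>>>>>>>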
>>>>>>> Regards
>>>>>>> Anirudh Gupta
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>> --
>>>>> Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
>>>>> Commercial register: Amtsgericht Muenchen, HRB 153243,
>>>>> Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs,
>>>>> Michael O'Neill
>>>>>
>>>>
>>>
>>> --
>>> Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
>>> Commercial register: Amtsgericht Muenchen, HRB 153243,
>>> Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael
>>> O'Neill
>>>
>>
Attachments:
- image.png (38285 bytes): <http://lists.openstack.org/pipermail/openstack-discuss/attachments/20210806/81049c63/attachment-0003.png>
- image.png (185546 bytes): <http://lists.openstack.org/pipermail/openstack-discuss/attachments/20210806/81049c63/attachment-0004.png>
- image.png (200447 bytes): <http://lists.openstack.org/pipermail/openstack-discuss/attachments/20210806/81049c63/attachment-0005.png>
- dump.pcap (732134 bytes): <http://lists.openstack.org/pipermail/openstack-discuss/attachments/20210806/81049c63/attachment-0002.obj>
- dump_ipxe.pcap (1275772 bytes): <http://lists.openstack.org/pipermail/openstack-discuss/attachments/20210806/81049c63/attachment-0003.obj>