[Kolla][Kolla-Ansible] Ironic Node Cleaning Failed

Dmitry Tantsur dtantsur at redhat.com
Tue Aug 3 13:58:52 UTC 2021


Hi,

You need to check the dnsmasq logs (there are two dnsmasqs: from neutron
and from ironic-inspector). tcpdump may also help to determine where the
packets are lost.
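
Before digging through both sets of logs, it can help to work out which of
the two dnsmasq instances an address is likely to have come from. The
following is only a minimal sketch, assuming the two ranges quoted later in
this thread (ironic_dnsmasq_dhcp_range from globals.yml and the allocation
pool of the public1 subnet). The helper names in_range, ranges_overlap and
owners are made up for illustration and are not part of Kolla or Ironic.

```python
import ipaddress

# Ranges copied from the quoted messages in this thread; the labels are
# illustrative only.
RANGES = {
    "ironic_dnsmasq_dhcp_range (inspection)": ("20.20.20.10", "20.20.20.100"),
    "public1 allocation pool (cleaning)": ("20.20.20.150", "20.20.20.200"),
}


def in_range(addr, start, end):
    """Return True if addr lies within the inclusive range start..end."""
    a = ipaddress.ip_address(addr)
    return ipaddress.ip_address(start) <= a <= ipaddress.ip_address(end)


def ranges_overlap(r1, r2):
    """Return True if two inclusive (start, end) ranges share an address."""
    s1, e1 = (ipaddress.ip_address(x) for x in r1)
    s2, e2 = (ipaddress.ip_address(x) for x in r2)
    return s1 <= e2 and s2 <= e1


def owners(addr):
    """List the names of all configured ranges that contain addr."""
    return [name for name, (s, e) in RANGES.items() if in_range(addr, s, e)]


if __name__ == "__main__":
    inspection, cleaning = RANGES.values()
    print("ranges overlap:", ranges_overlap(inspection, cleaning))
    print("20.20.20.10 belongs to:", owners("20.20.20.10"))
```

With the values from this thread, the two ranges do not overlap and
20.20.20.10 falls only in the ironic_dnsmasq_dhcp_range, which would be
consistent with the address being handed out by the inspection dnsmasq
rather than by the neutron one serving the cleaning network.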

Dmitry

On Fri, Jul 30, 2021 at 10:29 PM Anirudh Gupta <anyrude10 at gmail.com> wrote:

> Hi Dmitry
>
> Thanks for your time.
>
> My system is getting IP 20.20.20.10 which is in the range defined in
> ironic_dnsmasq_dhcp_range field under globals.yml file.
>
> ironic_dnsmasq_dhcp_range: "20.20.20.10,20.20.20.100"
>
> And in the cleaning network (public1), the range defined is
> 20.20.20.150-20.20.20.200
>
> As per my understanding, these 2 ranges should be mutually exclusive.
>
> Please suggest if my understanding is not correct.
>
> Any suggestions on what I should do to resolve this issue?
>
> Regards
> Anirudh Gupta
>
>
> On Sat, 31 Jul, 2021, 12:06 am Dmitry Tantsur, <dtantsur at redhat.com>
> wrote:
>
>>
>>
>> On Thu, Jul 29, 2021 at 6:05 PM Anirudh Gupta <anyrude10 at gmail.com>
>> wrote:
>>
>>> Hi Team,
>>>
>>> In addition to the email below, I have some updated information:
>>>
>>> Earlier the allocation range mentioned in "*ironic_dnsmasq_dhcp_range*"
>>> in globals.yml had an overlapping range with the cleaning network, due to
>>> which there was some issue in receiving the DHCP request
>>>
>>> After creating a cleaning network with a separate allocation range, I am
>>> successfully getting IP allocated to my Baremetal Node
>>>
>>>    - openstack subnet create subnet1 --network public1 --subnet-range
>>>    20.20.20.0/24 --allocation-pool start=20.20.20.150,end=20.20.20.200
>>>    --ip-version=4  --gateway=20.20.20.1 --dhcp
>>>
>>>
>>> [image: image.png]
>>>
>>> After getting the IP, there is no further action on the node. From "
>>> *clean_wait*", it goes into "*clean_failed*" state after around half an
>>> hour.
>>>
>>
>> The IP address is not from the cleaning range; it may come from
>> inspection. You probably need to investigate your network topology, maybe
>> using tcpdump.
>>
>> Unfortunately, I'm not fluent enough in Kolla to say whether this could be
>> a bug or not.
>>
>> Dmitry
>>
>>
>>>
>>> On verifying the logs, I could see the below error messages
>>>
>>>
>>>    - In */var/log/kolla/ironic/ironic-conductor.log*, we observed the
>>>    following error:
>>>
>>> ERROR ironic.conductor.utils [-] Cleaning for node
>>> 3a56748e-a8ca-4dec-a332-ace18e6d494e failed. *Timeout reached while
>>> cleaning the node. Please check if the ramdisk responsible for the cleaning
>>> is running on the node. Failed on step {}.*
>>>
>>>
>>> Note : For Cleaning the node, we have used the below images
>>>
>>>
>>>
>>> https://tarballs.openstack.org/ironic-python-agent/dib/files/ipa-centos8-master.kernel
>>>
>>>
>>> https://tarballs.openstack.org/ironic-python-agent/dib/files/ipa-centos8-master.initramfs
>>>
>>>
>>>    - In /var/log/kolla/nova/nova-compute-ironic.log, we observed the
>>>    error
>>>
>>> ERROR nova.compute.manager [req-810ffedf-3343-471c-94db-85411984e6cc - -
>>> - - -] No compute node record for host controller-ironic:
>>> nova.exception_Remote.ComputeHostNotFound_Remote: Compute host
>>> controller-ironic could not be found.
>>>
>>>
>>> Can someone please help in this regard?
>>>
>>> Regards
>>> Anirudh Gupta
>>>
>>>
>>> On Tue, Jul 27, 2021 at 12:52 PM Anirudh Gupta <anyrude10 at gmail.com>
>>> wrote:
>>>
>>>> Hi Team,
>>>>
>>>> We have deployed 2 node kolla ansible *12.0.0* in order to deploy
>>>> openstack *wallaby* release. We have also enabled ironic in order to
>>>> provision the bare metal nodes.
>>>>
>>>> On each server we have 3 nics
>>>>
>>>>    - *eno1* - OAM for external connectivity and endpoint's publicURL
>>>>    - *eno2* - Mgmt for internal communication between various
>>>>    openstack services.
>>>>    - *ens2f0* - Data Interface
>>>>
>>>>
>>>> Corresponding to this we have defined the following fields in
>>>> globals.yml
>>>>
>>>>
>>>>    - kolla_base_distro: "centos"
>>>>    - kolla_install_type: "source"
>>>>    - openstack_release: "wallaby"
>>>>    - network_interface: "eno2"                     # MGMT interface
>>>>    - kolla_external_vip_interface: "eno1"          # OAM interface
>>>>    - kolla_internal_vip_address: "192.168.10.3"    # MGMT subnet free IP
>>>>    - kolla_external_vip_address: "10.0.1.136"      # OAM subnet free IP
>>>>    - neutron_external_interface: "ens2f0"          # Data interface
>>>>    - enable_neutron_provider_networks: "yes"
>>>>
>>>> Note: Only relevant fields are being shown in this query
>>>>
>>>> Also, for ironic following fields have been defined in globals.yml
>>>>
>>>>    - enable_ironic: "yes"
>>>>    - enable_ironic_neutron_agent: "{{ enable_neutron | bool and
>>>>    enable_ironic | bool }}"
>>>>    - enable_horizon_ironic: "{{ enable_ironic | bool }}"
>>>>    - ironic_dnsmasq_interface: "ens2f0"            # Data interface
>>>>    - ironic_dnsmasq_dhcp_range: "20.20.20.10,20.20.20.100"
>>>>    - ironic_dnsmasq_boot_file: "pxelinux.0"
>>>>    - ironic_cleaning_network: "public1"
>>>>    - ironic_dnsmasq_default_gateway: "20.20.20.1"
>>>>
>>>>
>>>> After successful deployment, a flat provider network with the name
>>>> public1 is being created in openstack using the below commands:
>>>>
>>>>
>>>>    - openstack network create public1 --provider-network-type flat
>>>>    --provider-physical-network physnet1
>>>>    - openstack subnet create subnet1 --network public1 --subnet-range
>>>>    20.20.20.0/24 --allocation-pool start=20.20.20.10,end=20.20.20.100
>>>>    --ip-version=4  --gateway=20.20.20.1 --dhcp
>>>>
>>>>
>>>> Issue/Queries:
>>>>
>>>>
>>>>    - Is the configuration done in globals.yml correct or is there
>>>>    anything else that needs to be done in order to separate control and data
>>>>    plane traffic?
>>>>
>>>>
>>>>    - Also, I have set automated_cleaning to "true" in the ironic-conductor
>>>>    container settings. After creating the baremetal node, we run the "node
>>>>    manage" command, which completes successfully. Running the "*openstack
>>>>    baremetal node provide <node id>*" command powers on the machine and
>>>>    sets the boot mode to network boot, but no DHCP request for that
>>>>    particular MAC address is seen on the controller. Is there anything I
>>>>    am missing that needs to be done in order to make ironic work?
>>>>
>>>> Note: I have also verified that the nic is PXE enabled in system
>>>> configuration setting
>>>>
>>>> Regards
>>>> Anirudh Gupta
>>>>
>>>>
>>>>
>>
>> --
>> Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
>> Commercial register: Amtsgericht Muenchen, HRB 153243,
>> Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael
>> O'Neill
>>
>
