Hello,
Thanks, John, for your reply here.
A few more comments inline:

On Mon, Aug 23, 2021 at 6:16 PM John Fulton <johfulto@redhat.com> wrote:
On Mon, Aug 23, 2021 at 10:52 AM wodel youchi <wodel.youchi@gmail.com> wrote:
>
> Hi,
>
> I redid the undercloud deployment with the Train version for now, and I verified the download URLs for the images.
> My overcloud deployment has moved forward but I still get errors.
>
> This is what I got this time :
>>
>>        "TASK [ceph-grafana : wait for grafana to start] ********************************",
>>        "Monday 23 August 2021  14:55:21 +0100 (0:00:00.961)       0:12:59.319 ********* ",
>>        "fatal: [overcloud-controller-0]: FAILED! => {\"changed\": false, \"elapsed\": 300, \"msg\": \"Timeout when waiting for 10.20
>> 0.7.151:3100\"}",
>>        "fatal: [overcloud-controller-1]: FAILED! => {\"changed\": false, \"elapsed\": 300, \"msg\": \"Timeout when waiting for 10.20
>> 0.7.155:3100\"}",
>>        "fatal: [overcloud-controller-2]: FAILED! => {\"changed\": false, \"elapsed\": 300, \"msg\": \"Timeout when waiting for 10.20
>> 0.7.165:3100\"}",

I'm not certain which ceph-ansible version you're using, but with Train it should
be version 4. Judging by this error, ceph-ansible is already installed on your
undercloud, and in the latest version 4 this is the task where it failed:

 https://github.com/ceph/ceph-ansible/blob/v4.0.64/roles/ceph-grafana/tasks/configure_grafana.yml#L112-L115
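
If you want to confirm the exact ceph-ansible build on the undercloud, a quick
check could be (a sketch, assuming the package was installed via RPM, as TripleO
normally does):

    rpm -q ceph-ansible    # should report a 4.0.x version for Train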

You can check the status of this service on your three controllers and
then debug it directly.
As John pointed out, ceph-ansible renders, configures, and starts the associated
systemd units for all the Ceph monitoring stack components (node-exporter, prometheus,
alertmanager and grafana).
You can ssh to your controllers and check the associated systemd units, inspecting the
journal to see why they failed to start (I saw there's a timeout waiting for the
container to start).
A potential plan, in this case, could be:

1. check the systemd unit (I suggest starting with grafana, since that is the service that failed; see the sketch below)
2. look at the journal logs (feel free to attach the relevant part of the output here)
3. double-check the network the service is bound to (can you attach /var/lib/mistral/<stack>/ceph-ansible/group_vars/all.yaml?)
    * The grafana process should run on the storage network, but I see "Timeout when waiting for 10.200.7.165:3100": is that the right network?
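
A minimal sketch of those checks (assuming the usual heat-admin overcloud user and
the grafana-server unit name that appears in the log below; adjust names to your
environment):

    # from the undercloud, log into one of the failed controllers
    ssh heat-admin@overcloud-controller-0

    # 1. status of the systemd unit that ceph-ansible rendered for grafana
    sudo systemctl status grafana-server

    # 2. recent journal entries for that unit
    sudo journalctl -u grafana-server --no-pager | tail -50

    # 3. is anything listening on the grafana port (3100, per the error)?
    sudo ss -tlnp | grep 3100

Back on the undercloud, grepping the generated group_vars should show which address
grafana was told to bind to, and that address should sit on the storage network:

    grep -i grafana /var/lib/mistral/<stack>/ceph-ansible/group_vars/all.yaml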
 

  John

>>        "RUNNING HANDLER [ceph-prometheus : service handler] ****************************",
>>        "Monday 23 August 2021  15:00:22 +0100 (0:05:00.767)       0:18:00.087 ********* ",
>>        "PLAY RECAP *********************************************************************",
>>        "overcloud-computehci-0     : ok=224  changed=23   unreachable=0    failed=0    skipped=415  rescued=0    ignored=0   ",
>>        "overcloud-computehci-1     : ok=199  changed=18   unreachable=0    failed=0    skipped=392  rescued=0    ignored=0   ",
>>        "overcloud-computehci-2     : ok=212  changed=23   unreachable=0    failed=0    skipped=390  rescued=0    ignored=0   ",
>>        "overcloud-controller-0     : ok=370  changed=52   unreachable=0    failed=1    skipped=539  rescued=0    ignored=0   ",
>>        "overcloud-controller-1     : ok=308  changed=43   unreachable=0    failed=1    skipped=495  rescued=0    ignored=0   ",
>>        "overcloud-controller-2     : ok=317  changed=45   unreachable=0    failed=1    skipped=493  rescued=0    ignored=0   ",
>>
>>        "INSTALLER STATUS ***************************************************************",
>>        "Install Ceph Monitor           : Complete (0:00:52)",
>>        "Install Ceph Manager           : Complete (0:05:49)",
>>        "Install Ceph OSD               : Complete (0:02:28)",
>>        "Install Ceph RGW               : Complete (0:00:27)",
>>        "Install Ceph Client            : Complete (0:00:33)",
>>        "Install Ceph Grafana           : In Progress (0:05:54)",
>>        "\tThis phase can be restarted by running: roles/ceph-grafana/tasks/main.yml",
>>        "Install Ceph Node Exporter     : Complete (0:00:28)",
>>        "Monday 23 August 2021  15:00:22 +0100 (0:00:00.006)       0:18:00.094 ********* ",
>>        "=============================================================================== ",
>>        "ceph-grafana : wait for grafana to start ------------------------------ 300.77s",
>>        "ceph-facts : get ceph current status ---------------------------------- 300.27s",
>>        "ceph-container-common : pulling udtrain.ctlplane.umaitek.dz:8787/ceph-ci/daemon:v4.0.19-stable-4.0-nautilus-centos-7-x86_64
>> image -- 19.04s",
>>        "ceph-mon : waiting for the monitor(s) to form the quorum... ------------ 12.83s",
>>        "ceph-osd : use ceph-volume lvm batch to create bluestore osds ---------- 12.13s",
>>        "ceph-osd : wait for all osd to be up ----------------------------------- 11.88s",
>>        "ceph-osd : set pg_autoscale_mode value on pool(s) ---------------------- 11.00s",
>>        "ceph-osd : create openstack pool(s) ------------------------------------ 10.80s",
>>        "ceph-grafana : make sure grafana is down ------------------------------- 10.66s",
>>        "ceph-osd : customize pool crush_rule ----------------------------------- 10.15s",
>>        "ceph-osd : customize pool size ----------------------------------------- 10.15s",
>>        "ceph-osd : customize pool min_size ------------------------------------- 10.14s",
>>        "ceph-osd : assign application to pool(s) ------------------------------- 10.13s",
>>        "ceph-osd : list existing pool(s) ---------------------------------------- 8.59s",
>>
>>        "ceph-mon : fetch ceph initial keys -------------------------------------- 7.01s",
>>        "ceph-container-common : get ceph version -------------------------------- 6.75s",
>>        "ceph-prometheus : start prometheus services ----------------------------- 6.67s",
>>        "ceph-mgr : wait for all mgr to be up ------------------------------------ 6.66s",
>>        "ceph-grafana : start the grafana-server service ------------------------- 6.33s",
>>        "ceph-mgr : create ceph mgr keyring(s) on a mon node --------------------- 6.26s"
>>    ],
>>    "failed_when_result": true
>> }
>> 2021-08-23 15:00:24.427687 | 525400e8-92c8-47b1-e162-00000000597d |     TIMING | tripleo-ceph-run-ansible : print ceph-ansible output in case of failure | undercloud | 0:37:30.226345 | 0.25s
>>
>> PLAY RECAP *********************************************************************
>> overcloud-computehci-0     : ok=213  changed=117  unreachable=0    failed=0    skipped=120  rescued=0    ignored=0
>> overcloud-computehci-1     : ok=207  changed=117  unreachable=0    failed=0    skipped=120  rescued=0    ignored=0
>> overcloud-computehci-2     : ok=207  changed=117  unreachable=0    failed=0    skipped=120  rescued=0    ignored=0
>> overcloud-controller-0     : ok=237  changed=145  unreachable=0    failed=0    skipped=128  rescued=0    ignored=0
>> overcloud-controller-1     : ok=232  changed=145  unreachable=0    failed=0    skipped=128  rescued=0    ignored=0
>> overcloud-controller-2     : ok=232  changed=145  unreachable=0    failed=0    skipped=128  rescued=0    ignored=0
>> undercloud                 : ok=100  changed=18   unreachable=0    failed=1    skipped=37   rescued=0    ignored=0
>>
>> 2021-08-23 15:00:24.559997 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> 2021-08-23 15:00:24.560328 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Total Tasks: 1366       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> 2021-08-23 15:00:24.560419 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Elapsed Time: 0:37:30.359090 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> 2021-08-23 15:00:24.560490 |                                 UUID |       Info |       Host |   Task Name |   Run Time
>> 2021-08-23 15:00:24.560589 | 525400e8-92c8-47b1-e162-00000000597b |    SUMMARY | undercloud | tripleo-ceph-run-ansible : run ceph-ansible | 1082.71s
>> 2021-08-23 15:00:24.560675 | 525400e8-92c8-47b1-e162-000000004d9a |    SUMMARY | overcloud-controller-1 | Wait for container-puppet tasks (generate config) to finish | 356.02s
>> 2021-08-23 15:00:24.560763 | 525400e8-92c8-47b1-e162-000000004d6a |    SUMMARY | overcloud-controller-0 | Wait for container-puppet tasks (generate config) to finish | 355.74s
>> 2021-08-23 15:00:24.560839 | 525400e8-92c8-47b1-e162-000000004dd0 |    SUMMARY | overcloud-controller-2 | Wait for container-puppet tasks (generate config) to finish | 355.68s
>> 2021-08-23 15:00:24.560912 | 525400e8-92c8-47b1-e162-000000003bb1 |    SUMMARY | undercloud | Run tripleo-container-image-prepare logged to: /var/log/tripleo-container-image-prepare.log | 143.03s
>> 2021-08-23 15:00:24.560986 | 525400e8-92c8-47b1-e162-000000004b13 |    SUMMARY | overcloud-controller-0 | Wait for puppet host configuration to finish | 125.36s
>> 2021-08-23 15:00:24.561057 | 525400e8-92c8-47b1-e162-000000004b88 |    SUMMARY | overcloud-controller-2 | Wait for puppet host configuration to finish | 125.33s
>> 2021-08-23 15:00:24.561128 | 525400e8-92c8-47b1-e162-000000004b4b |    SUMMARY | overcloud-controller-1 | Wait for puppet host configuration to finish | 125.25s
>> 2021-08-23 15:00:24.561300 | 525400e8-92c8-47b1-e162-000000001dc4 |    SUMMARY | overcloud-controller-2 | Run puppet on the host to apply IPtables rules | 108.08s
>> 2021-08-23 15:00:24.561374 | 525400e8-92c8-47b1-e162-000000001e4f |    SUMMARY | overcloud-controller-0 | Run puppet on the host to apply IPtables rules | 107.34s
>> 2021-08-23 15:00:24.561444 | 525400e8-92c8-47b1-e162-000000004c8d |    SUMMARY | overcloud-computehci-2 | Wait for container-puppet tasks (generate config) to finish | 96.56s
>> 2021-08-23 15:00:24.561514 | 525400e8-92c8-47b1-e162-000000004c33 |    SUMMARY | overcloud-computehci-0 | Wait for container-puppet tasks (generate config) to finish | 96.38s
>> 2021-08-23 15:00:24.561580 | 525400e8-92c8-47b1-e162-000000004c60 |    SUMMARY | overcloud-computehci-1 | Wait for container-puppet tasks (generate config) to finish | 93.41s
>> 2021-08-23 15:00:24.561645 | 525400e8-92c8-47b1-e162-00000000434d |    SUMMARY | overcloud-computehci-0 | Pre-fetch all the containers | 92.70s
>> 2021-08-23 15:00:24.561712 | 525400e8-92c8-47b1-e162-0000000043ed |    SUMMARY | overcloud-computehci-2 | Pre-fetch all the containers | 91.90s
>> 2021-08-23 15:00:24.561782 | 525400e8-92c8-47b1-e162-000000004385 |    SUMMARY | overcloud-computehci-1 | Pre-fetch all the containers | 91.88s
>> 2021-08-23 15:00:24.561876 | 525400e8-92c8-47b1-e162-00000000491c |    SUMMARY | overcloud-computehci-1 | Wait for puppet host configuration to finish | 90.37s
>> 2021-08-23 15:00:24.561947 | 525400e8-92c8-47b1-e162-000000004951 |    SUMMARY | overcloud-computehci-2 | Wait for puppet host configuration to finish | 90.37s
>> 2021-08-23 15:00:24.562016 | 525400e8-92c8-47b1-e162-0000000048e6 |    SUMMARY | overcloud-computehci-0 | Wait for puppet host configuration to finish | 90.35s
>> 2021-08-23 15:00:24.562080 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ End Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> 2021-08-23 15:00:24.562196 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ State Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> 2021-08-23 15:00:24.562311 | ~~~~~~~~~~~~~~~~~~ Number of nodes which did not deploy successfully: 1 ~~~~~~~~~~~~~~~~~
>> 2021-08-23 15:00:24.562379 |  The following node(s) had failures: undercloud
>> 2021-08-23 15:00:24.562456 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> Host 10.0.2.40 not found in /home/stack/.ssh/known_hosts
>> Ansible failed, check log at /var/lib/mistral/overcloud/ansible.log.
>> Overcloud Endpoint: http://10.0.2.40:5000
>> Overcloud Horizon Dashboard URL: http://10.0.2.40:80/dashboard
>> Overcloud rc file: /home/stack/overcloudrc
>> Overcloud Deployed with error
>> Overcloud configuration failed.
>>
>
>
> Could someone help me debug this? The ansible.log is huge and I can't see the origin of the problem; if someone can point me in the right direction, it would be appreciated.
> Thanks in advance.
>
> Regards.
>
> On Wed, Aug 18, 2021 at 6:02 PM Wesley Hayutin <whayutin@redhat.com> wrote:
>>
>>
>>
>> On Wed, Aug 18, 2021 at 10:10 AM Dmitry Tantsur <dtantsur@redhat.com> wrote:
>>>
>>> Hi,
>>>
>>> On Wed, Aug 18, 2021 at 4:39 PM wodel youchi <wodel.youchi@gmail.com> wrote:
>>>>
>>>> Hi,
>>>> I am trying to deploy OpenStack with TripleO using VMs and nested KVM for the compute node. This is for testing and learning purposes.
>>>>
>>>> I am using the Train version and following some tutorials.
>>>> I prepared my different template files and started the deployment, but I got these errors:
>>>>
>>>> Failed to provision instance fc40457e-4b3c-4402-ae9d-c528f2c2ad30: Asynchronous exception: Node failed to deploy. Exception: Agent API for node 6d3724fc-6f13-4588-bbe5-56bc4f9a4f87 returned HTTP status code 404 with error: Not found: Extension with id iscsi not found. for node
>>>>
>>>
>>> You somehow ended up using a master (Xena release) deploy ramdisk with Train TripleO. You need to make sure to download the Train images. I hope the TripleO people can point you at the right place.
>>>
>>> Dmitry
>>
>>
>> http://images.rdoproject.org/centos8/
>> http://images.rdoproject.org/centos8/train/rdo_trunk/current-tripleo/
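>>
>> For example, on the undercloud something along these lines should fetch and
>> import the Train images (a sketch; the tarball names are the usual RDO ones,
>> double-check them against the directory listing above):
>>
>>     mkdir -p ~/images && cd ~/images
>>     curl -O http://images.rdoproject.org/centos8/train/rdo_trunk/current-tripleo/overcloud-full.tar
>>     curl -O http://images.rdoproject.org/centos8/train/rdo_trunk/current-tripleo/ironic-python-agent.tar
>>     for f in *.tar; do tar -xf "$f"; done
>>     openstack overcloud image upload --image-path /home/stack/images/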
>>
>>>
>>>
>>>>
>>>> and
>>>>
>>>> Got HTTP 409: {"errors": [{"status": 409, "title": "Conflict", "detail": "There was a conflict when trying to complete your request.\n\n Unable to allocate inventory: Unable to create allocation for 'CUSTOM_BAREMETAL' on resource provider '6d3724fc-6f13-4588-bbe5-56bc4f9a4f87'. The requested amount would exceed the capacity. ",
>>>>
>>>> Could you help me understand what these errors mean? I couldn't find anything similar on the net.
>>>>
>>>> Thanks in advance.
>>>>
>>>> Regards.
>>>
>>>
>>>
>>> --
>>> Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
>>> Commercial register: Amtsgericht Muenchen, HRB 153243,
>>> Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill




--
Francesco Pantano
GPG KEY: F41BD75C