Hi,

Thank you all for your help; I've managed to deploy my first overcloud.

But I have another problem. I am using an HCI deployment and I included the Ceph dashboard in my deployment script, but I couldn't find the dashboard afterwards. After reviewing the Red Hat documentation, it seems that I have to use the "ControllerStorageDashboard" role. That is what I did, but I got this:

RESP BODY: {"resources": [{"updated_time": "2021-08-25T20:14:20Z", "creation_time": "2021-08-25T20:14:20Z", "logical_resource_id": "0", "resource_name": "0", "physical_resource_id": "a21b34
98-fbdb-4a19-8e23-9dd71232b473", "resource_status": "CREATE_FAILED", "resource_status_reason": "BadRequest: resources[0].resources.OVNMacAddressPort: Invalid input for operation: 'tripleo_o
vn_mac_port_name=ControllerStorageDashboard-ovn-mac-0' exceeds maximum length of 60.\nNeutron server returns request_ids: ['req-467b58ef-dfd7-42c5-bb07-4f0f99b77332']", "resource_type": "OS::TripleO::OVNMacAddressPort", "links": [{"href": "https://10.200.24.2:13004/v1/4f94deb9a28549c0a78f232756c7599a/stacks/overcloud-ControllerStorageDashboard-vtmxtvxpzggi-1-ue2d2riknvna-Cont
rollerStorageDashboardOVNChassisMacPorts-ui4dsb2tnkbk/ae81eb26-2f4b-4ae0-8826-af32be18ce14/resources/0", "rel": "self"}, {"href": "https://10.200.24.2:13004/v1/4f94deb9a28549c0a78f232756c75
99a/stacks/overcloud-ControllerStorageDashboard-vtmxtvxpzggi-1-ue2d2riknvna-ControllerStorageDashboardOVNChassisMacPorts-ui4dsb2tnkbk/ae81eb26-2f4b-4ae0-8826-af32be18ce14", "rel": "stack"},
 {"href": "https://10.200.24.2:13004/v1/4f94deb9a28549c0a78f232756c7599a/stacks/overcloud-ControllerStorageDashboard-vtmxtvxpzggi-1-ue2d2riknvna-ControllerStorageDashboardOVNChassisMacPorts
-ui4dsb2tnkbk-0-yfuxj4ahxviu/a21b3498-fbdb-4a19-8e23-9dd71232b473", "rel": "nested"}], "required_by": [], "parent_resource": "ControllerStorageDashboardOVNChassisMacPorts"}]}
GET call to orchestration for https://10.200.24.2:13004/v1/4f94deb9a28549c0a78f232756c7599a/stacks/overcloud-ControllerStorageDashboard-vtmxtvxpzggi-1-ue2d2riknvna-ControllerStorageDashboar
dOVNChassisMacPorts-ui4dsb2tnkbk/ae81eb26-2f4b-4ae0-8826-af32be18ce14/resources used request id req-9609844f-f173-4e80-a3bd-bc287e88b00f
REQ: curl -g -i --cacert "/etc/pki/ca-trust/source/anchors/cm-local-ca.pem" -X GET https://10.200.24.2:13004/v1/4f94deb9a28549c0a78f232756c7599a/stacks/a21b3498-fbdb-4a19-8e23-9dd71232b473/
resources -H "Accept: application/json" -H "Content-Type: application/json" -H "User-Agent: python-heatclient" -H "X-Auth-Token: {SHA256}d296097c7cdf0beb50127e0a1d03cb8a702e18d543600f51b16d
ab4987811a6a" -H "X-Region-Name: "
https://10.200.24.2:13004 "GET /v1/4f94deb9a28549c0a78f232756c7599a/stacks/a21b3498-fbdb-4a19-8e23-9dd71232b473/resources HTTP/1.1" 302 649
RESP: [302] Content-Length: 649 Content-Type: application/json Date: Wed, 25 Aug 2021 20:15:11 GMT Location: https://10.200.24.2:13004/v1/4f94deb9a28549c0a78f232756c7599a/stacks/overcloud-C
ontrollerStorageDashboard-vtmxtvxpzggi-1-ue2d2rik
...
...
overcloud.ControllerStorageDashboard.0.ControllerStorageDashboardOVNChassisMacPorts.0.OVNMacAddressPort:
  resource_type: OS::Neutron::Port
  physical_resource_id: 259e39f8-9e7b-4494-bb2d-ff7b2cf0ad40
  status: CREATE_FAILED
  status_reason: |
    BadRequest: resources.OVNMacAddressPort: Invalid input for operation: 'tripleo_ovn_mac_port_name=ControllerStorageDashboard-ovn-mac-0' exceeds maximum length of 60.
    Neutron server returns request_ids: ['req-322ab0aa-0e1c-416f-be81-b48230d3dab1']
overcloud.ControllerStorageDashboard.2.ControllerStorageDashboardOVNChassisMacPorts.0.OVNMacAddressPort:
  resource_type: OS::Neutron::Port
  physical_resource_id: c7daf26b-7f96-43cf-8678-11d456b5cdfe
  status: CREATE_FAILED
  status_reason: |
    BadRequest: resources.OVNMacAddressPort: Invalid input for operation: 'tripleo_ovn_mac_port_name=ControllerStorageDashboard-ovn-mac-0' exceeds maximum length of 60.
    Neutron server returns request_ids: ['req-9e3e19dd-4974-4007-9df0-ee9774369495']
overcloud.ControllerStorageDashboard.1.ControllerStorageDashboardOVNChassisMacPorts.0.OVNMacAddressPort:
  resource_type: OS::Neutron::Port
  physical_resource_id: c902e259-f299-457f-8b0d-c37fb40e0d32
  status: CREATE_FAILED
  status_reason: |
    BadRequest: resources.OVNMacAddressPort: Invalid input for operation: 'tripleo_ovn_mac_port_name=ControllerStorageDashboard-ovn-mac-0' exceeds maximum length of 60.
    Neutron server returns request_ids: ['req-467b58ef-dfd7-42c5-bb07-4f0f99b77332']
clean_up ListStackFailures:
END return value: 0
Instantiating messaging websocket client: wss://10.200.24.2:3000

I couldn't find anything on the web about this error.
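
For what it's worth, the message itself points at the length of the tag rather than at Ceph. Here is a minimal Python sketch that measures it, assuming the tag is always built as `tripleo_ovn_mac_port_name=<role>-ovn-mac-<index>`. That pattern is inferred only from the error text above, not from the TripleO source, and the helper names are hypothetical.

```python
# Limit and tag text are copied from the Neutron error message above.
MAX_TAG_LENGTH = 60
TAG_PREFIX = "tripleo_ovn_mac_port_name="


def ovn_mac_tag(role_name, index=0):
    """Build the tag the way it appears in the error (assumed pattern)."""
    return f"{TAG_PREFIX}{role_name}-ovn-mac-{index}"


def max_role_name_length(index=0):
    """Longest role name that keeps the assumed tag within the limit."""
    return MAX_TAG_LENGTH - len(ovn_mac_tag("", index))


if __name__ == "__main__":
    for role in ("Controller", "ControllerStorageDashboard"):
        tag = ovn_mac_tag(role)
        verdict = "ok" if len(tag) <= MAX_TAG_LENGTH else "too long"
        print(f"{role}: tag length {len(tag)} -> {verdict}")
    print("longest usable role name:", max_role_name_length(), "characters")
```

If the assumption holds, the tag for ControllerStorageDashboard is two characters over the limit, so a shorter custom role name might get past this step. I have not verified that.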


Regards

On Tue, Aug 24, 2021 at 10:25 PM wodel youchi <wodel.youchi@gmail.com> wrote:
Hello,

After digging into grafana, it seems it needed to download something from the internet, and I hadn't really configured a proper gateway on the external network.
So I started by configuring a proper gateway and tested it with the half-deployed nodes, then I redid the deployment, and this time I got this error:

2021-08-24 21:29:29.616805 | 525400e8-92c8-d397-6f7e-000000006133 |      FATAL | Clean up legacy Cinder keystone catalog entries | undercloud | error={"changed": false, "module_stderr": "Fa
iled to discover available identity versions when contacting http://10.0.2.40:5000. Attempting to parse version from URL.\nTraceback (most recent call last):\n  File \"/usr/lib/python3.6/si
te-packages/urllib3/connection.py\", line 162, in _new_conn\n    (self._dns_host, self.port), self.timeout, **extra_kw)\n  File \"/usr/lib/python3.6/site-packages/urllib3/util/connection.py
\", line 80, in create_connection\n    raise err\n  File \"/usr/lib/python3.6/site-packages/urllib3/util/connection.py\", line 70, in create_connection\n    sock.connect(sa)\nTimeoutError:
[Errno 110] Connection timed out\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/usr/lib/python3.6/site-packages/urll
ib3/connectionpool.py\", line 600, in urlopen\n    chunked=chunked)\n  File \"/usr/lib/python3.6/site-packages/urllib3/connectionpool.py\", line 354, in _make_request\n    conn.request(meth
od, url, **httplib_request_kw)\n  File \"/usr/lib64/python3.6/http/client.py\", line 1269, in request\n    self._send_request(method, url, body, headers, encode_chunked)\n  File \"/usr/lib6
4/python3.6/http/client.py\", line 1315, in _send_request\n    self.endheaders(body, encode_chunked=encode_chunked)\n  File \"/usr/lib64/python3.6/http/client.py\", line 1264, in endheaders
\n    self._send_output(message_body, encode_chunked=encode_chunked)\n  File \"/usr/lib64/python3.6/http/client.py\", line 1040, in _send_output\n    self.send(msg)\n  File \"/usr/lib64/pyt
hon3.6/http/client.py\", line 978, in send\n    self.connect()\n  File \"/usr/lib/python3.6/site-packages/urllib3/connection.py\", line 184, in connect\n    conn = self._new_conn()\n  File
\"/usr/lib/python3.6/site-packages/urllib3/connection.py\", line 171, in _new_conn\n    self, \"Failed to establish a new connection: %s\" % e)\nurllib3.exceptions.NewConnectionError: <urll
ib3.connection.HTTPConnection object at 0x7f96f7b10cc0>: Failed to establish a new connection: [Errno 110] Connection timed out\n\nDuring handling of the above exception, another exception
occurred:\n\nTraceback (most recent call last):\n  File \"/usr/lib/python3.6/site-packages/requests/adapters.py\", line 449, in send\n    timeout=timeout\n  File \"/usr/lib/python3.6/site-p
ackages/urllib3/connectionpool.py\", line 638, in urlopen\n    _stacktrace=sys.exc_info()[2])\n  File \"/usr/lib/python3.6/site-packages/urllib3/util/retry.py\", line 399, in increment\n  
 raise MaxRetryError(_pool, url, error or ResponseError(cause))\nurllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='10.0.2.40', port=5000): Max retries exceeded with url: / (Caused
by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f96f7b10cc0>: Failed to establish a new connection: [Errno 110] Connection timed out',))\n\nDuring handling of the abo
ve exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/usr/lib/python3.6/site-packages/keystoneauth1/session.py\", line 997, in _send_request\n    resp =
 self.session.request(method, url, **kwargs)\n  File \"/usr/lib/python3.6/site-packages/requests/sessions.py\", line 533, in request\n    resp = self.send(prep, **send_kwargs)\n  File \"/us
r/lib/python3.6/site-packages/requests/sessions.py\", line 646, in send\n    r = adapter.send(request, **kwargs)\n  File \"/usr/lib/python3.6/site-packages/requests/adapters.py\", line 516,
 in send\n    raise ConnectionError(e, request=request)\nrequests.exceptions.ConnectionError: HTTPConnectionPool(host='10.0.2.40', port=5000): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f96f7b10cc0>: Failed to establish a new connection: [Errno 110] Connection timed out',))\n\nDuring handling of the above ex
ception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/usr/lib/python3.6/site-packages/keystoneauth1/identity/generic/base.py\", line 138, in _do_create_plugi
n\n    authenticated=False)\n  File \"/usr/lib/python3.6/site-packages/keystoneauth1/identity/base.py\", line 610, in get_discovery\n    authenticated=authenticated)\n  File \"/usr/lib/pyth
on3.6/site-packages/keystoneauth1/discover.py\", line 1442, in get_discovery\n    disc = Discover(session, url, authenticated=authenticated)\n  File \"/usr/lib/python3.6/site-packages/keyst
oneauth1/discover.py\", line 526, in __init__\n    authenticated=authenticated)\n  File \"/usr/lib/python3.6/site-packages/keystoneauth1/discover.py\", line 101, in get_version_data\n    re
sp = session.get(url, headers=headers, authenticated=authenticated)\n  File \"/usr/lib/python3.6/site-packages/keystoneauth1/session.py\", line 1116, in get\n    return self.request(url, 'G
ET', **kwargs)\n  File \"/usr/lib/python3.6/site-packages/keystoneauth1/session.py\", line 906, in request\n    resp = send(**kwargs)\n  File \"/usr/lib/python3.6/site-packages/keystoneauth
1/session.py\", line 1013, in _send_request\n    raise exceptions.ConnectFailure(msg)\nkeystoneauth1.exceptions.connection.ConnectFailure: Unable to establish connection to http://10.0.2.40
:5000: HTTPConnectionPool(host='10.0.2.40', port=5000): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f96f7b10cc0>: Failed
to establish a new connection: [Errno 110] Connection timed out',))\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"<s
tdin>\", line 102, in <module>\n  File \"<stdin>\", line 94, in _ansiballz_main\n  File \"<stdin>\", line 40, in invoke_module\n  File \"/usr/lib64/python3.6/runpy.py\", line 205, in run_mo
dule\n    return _run_module_code(code, init_globals, run_name, mod_spec)\n  File \"/usr/lib64/python3.6/runpy.py\", line 96, in _run_module_code\n    mod_name, mod_spec, pkg_name, script_n
ame)\n  File \"/usr/lib64/python3.6/runpy.py\", line 85, in _run_code\n    exec(code, run_globals)\n  File \"/tmp/ansible_os_keystone_service_payload_wcyk6h37/ansible_os_keystone_service_pa
yload.zip/ansible/modules/cloud/openstack/os_keystone_service.py\", line 194, in <module>\n  File \"/tmp/ansible_os_keystone_service_payload_wcyk6h37/ansible_os_keystone_service_payload.zip
/ansible/modules/cloud/openstack/os_keystone_service.py\", line 153, in main\n  File \"/usr/lib/python3.6/site-packages/openstack/cloud/_identity.py\", line 510, in search_services\n    ser
vices = self.list_services()\n  File \"/usr/lib/python3.6/site-packages/openstack/cloud/_identity.py\", line 485, in list_services\n    if self._is_client_version('identity', 2):\n  File \"
/usr/lib/python3.6/site-packages/openstack/cloud/openstackcloud.py\", line 459, in _is_client_version\n    client = getattr(self, client_name)\n  File \"/usr/lib/python3.6/site-packages/ope
nstack/cloud/_identity.py\", line 32, in _identity_client\n    'identity', min_version=2, max_version='3.latest')\n  File \"/usr/lib/python3.6/site-packages/openstack/cloud/openstackcloud.p
y\", line 406, in _get_versioned_client\n    if adapter.get_endpoint():\n  File \"/usr/lib/python3.6/site-packages/keystoneauth1/adapter.py\", line 282, in get_endpoint\n    return self.ses
sion.get_endpoint(auth or self.auth, **kwargs)\n  File \"/usr/lib/python3.6/site-packages/keystoneauth1/session.py\", line 1218, in get_endpoint\n    return auth.get_endpoint(self, **kwargs
)\n  File \"/usr/lib/python3.6/site-packages/keystoneauth1/identity/base.py\", line 380, in get_endpoint\n    allow_version_hack=allow_version_hack, **kwargs)\n  File \"/usr/lib/python3.6/s
ite-packages/keystoneauth1/identity/base.py\", line 271, in get_endpoint_data\n    service_catalog = self.get_access(session).service_catalog\n  File \"/usr/lib/python3.6/site-packages/keys
toneauth1/identity/base.py\", line 134, in get_access\n    self.auth_ref = self.get_auth_ref(session)\n  File \"/usr/lib/python3.6/site-packages/keystoneauth1/identity/generic/base.py\", li
ne 206, in get_auth_ref\n    self._plugin = self._do_create_plugin(session)\n  File \"/usr/lib/python3.6/site-packages/keystoneauth1/identity/generic/base.py\", line 161, in _do_create_plug
in\n    'auth_url is correct. %s' % e)\nkeystoneauth1.exceptions.discovery.DiscoveryFailure: Could not find versioned identity endpoints when attempting to authenticate. Please check that your auth_url is correct. Unable to establish connection to http://10.0.2.40:5000: HTTPConnectionPool(host='10.0.2.40', port=5000): Max retries exceeded with url: / (Caused by NewConnectionE
rror('<urllib3.connection.HTTPConnection object at 0x7f96f7b10cc0>: Failed to establish a new connection: [Errno 110] Connection timed out',))\n", "module_stdout": "", "msg": "MODULE FAILUR
E\nSee stdout/stderr for the exact error", "rc": 1}
                                                                                                                                         
2021-08-24 21:29:29.617697 | 525400e8-92c8-d397-6f7e-000000006133 |     TIMING | Clean up legacy Cinder keystone catalog entries | undercloud | 1:07:40.666419 | 130.85s                    
                                                                                                                                                                                             
PLAY RECAP *********************************************************************                                                                                                            
overcloud-computehci-0     : ok=260  changed=145  unreachable=0    failed=0    skipped=140  rescued=0    ignored=0                                                                          
overcloud-computehci-1     : ok=258  changed=145  unreachable=0    failed=0    skipped=140  rescued=0    ignored=0                                                                          
overcloud-computehci-2     : ok=255  changed=145  unreachable=0    failed=0    skipped=140  rescued=0    ignored=0                                                                          
overcloud-controller-0     : ok=295  changed=181  unreachable=0    failed=0    skipped=151  rescued=0    ignored=0                                                                          
overcloud-controller-1     : ok=289  changed=177  unreachable=0    failed=0    skipped=152  rescued=0    ignored=0                                                                          
overcloud-controller-2     : ok=288  changed=177  unreachable=0    failed=0    skipped=152  rescued=0    ignored=0                                                                          
undercloud                 : ok=105  changed=21   unreachable=0    failed=1    skipped=45   rescued=0    ignored=0                                                                          
                                                                                                                                                                                           
2021-08-24 21:29:29.730778 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                      
2021-08-24 21:29:29.731007 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Total Tasks: 1723       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                      
2021-08-24 21:29:29.731098 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Elapsed Time: 1:07:40.779840 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                      
2021-08-24 21:29:29.731172 |                                 UUID |       Info |       Host |   Task Name |   Run Time                                                                      
2021-08-24 21:29:29.731251 | 525400e8-92c8-d397-6f7e-000000003b9a |    SUMMARY | undercloud | Run tripleo-container-image-prepare logged to: /var/log/tripleo-container-image-prepare.log | 1762.93s                                                                                                                                                                                      
2021-08-24 21:29:29.731349 | 525400e8-92c8-d397-6f7e-0000000057aa |    SUMMARY | undercloud | tripleo-ceph-run-ansible : run ceph-ansible | 990.24s                                          
2021-08-24 21:29:29.731433 | 525400e8-92c8-d397-6f7e-000000005951 |    SUMMARY | overcloud-controller-0 | tripleo_ha_wrapper : Run init bundle puppet on the host for haproxy | 133.22s      
2021-08-24 21:29:29.731503 | 525400e8-92c8-d397-6f7e-000000006133 |    SUMMARY | undercloud | Clean up legacy Cinder keystone catalog entries | 130.85s                                      
2021-08-24 21:29:29.731569 | 525400e8-92c8-d397-6f7e-000000006012 |    SUMMARY | overcloud-controller-0 | Wait for containers to start for step 3 using paunch | 103.45s                    
2021-08-24 21:29:29.731643 | 525400e8-92c8-d397-6f7e-000000004337 |    SUMMARY | overcloud-computehci-0 | Pre-fetch all the containers | 94.00s                                              
2021-08-24 21:29:29.731729 | 525400e8-92c8-d397-6f7e-000000004378 |    SUMMARY | overcloud-computehci-2 | Pre-fetch all the containers | 92.64s                                              
2021-08-24 21:29:29.731795 | 525400e8-92c8-d397-6f7e-000000004337 |    SUMMARY | overcloud-computehci-1 | Pre-fetch all the containers | 86.38s                                              
2021-08-24 21:29:29.731867 | 525400e8-92c8-d397-6f7e-000000004d68 |    SUMMARY | overcloud-controller-0 | Wait for container-puppet tasks (generate config) to finish | 84.13s              
2021-08-24 21:29:29.731946 | 525400e8-92c8-d397-6f7e-000000004d99 |    SUMMARY | overcloud-controller-2 | Wait for container-puppet tasks (generate config) to finish | 80.76s              
2021-08-24 21:29:29.732012 | 525400e8-92c8-d397-6f7e-00000000427c |    SUMMARY | overcloud-controller-1 | Pre-fetch all the containers | 80.21s                                              
2021-08-24 21:29:29.732073 | 525400e8-92c8-d397-6f7e-00000000427c |    SUMMARY | overcloud-controller-0 | Pre-fetch all the containers | 77.03s                                              
2021-08-24 21:29:29.732138 | 525400e8-92c8-d397-6f7e-0000000042f5 |    SUMMARY | overcloud-controller-2 | Pre-fetch all the containers | 76.32s                                              
2021-08-24 21:29:29.732202 | 525400e8-92c8-d397-6f7e-000000004dd3 |    SUMMARY | overcloud-controller-1 | Wait for container-puppet tasks (generate config) to finish | 74.36s              
2021-08-24 21:29:29.732266 | 525400e8-92c8-d397-6f7e-000000005da7 |    SUMMARY | overcloud-controller-0 | tripleo_ha_wrapper : Run init bundle puppet on the host for ovn_dbs | 68.39s      
2021-08-24 21:29:29.732329 | 525400e8-92c8-d397-6f7e-000000005ce2 |    SUMMARY | overcloud-controller-0 | Wait for containers to start for step 2 using paunch | 64.55s                      
2021-08-24 21:29:29.732398 | 525400e8-92c8-d397-6f7e-000000004b97 |    SUMMARY | overcloud-controller-2 | Wait for puppet host configuration to finish | 58.13s                              
2021-08-24 21:29:29.732463 | 525400e8-92c8-d397-6f7e-000000004c1a |    SUMMARY | overcloud-controller-1 | Wait for puppet host configuration to finish | 58.11s                              
2021-08-24 21:29:29.732526 | 525400e8-92c8-d397-6f7e-000000005bd3 |    SUMMARY | overcloud-controller-1 | Wait for containers to start for step 2 using paunch | 58.09s                      
2021-08-24 21:29:29.732589 | 525400e8-92c8-d397-6f7e-000000005b9b |    SUMMARY | overcloud-controller-2 | Wait for containers to start for step 2 using paunch | 58.09s
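
Stripped of the nested tracebacks, the failure is a TCP connection timeout from the undercloud to the identity endpoint at 10.0.2.40:5000. A minimal Python sketch of a reachability check follows. The helper name `can_connect` is hypothetical, and the check only opens a plain TCP connection; it does not speak HTTP or validate Keystone itself.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running `can_connect("10.0.2.40", 5000)` on the undercloud (address and port copied from the traceback) should show whether the endpoint is reachable at all from there before rerunning the deployment.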


Thank you again for your assistance.

Regards.

On Tue, Aug 24, 2021 at 8:59 AM wodel youchi <wodel.youchi@gmail.com> wrote:
Hi, and thanks for your help

As for Ceph, here is my container image prepare configuration:
parameter_defaults:
 ContainerImagePrepare:
 - push_destination: true
   set:
     ceph_alertmanager_image: alertmanager
     ceph_alertmanager_namespace: quay.ceph.io/prometheus
     ceph_alertmanager_tag: v0.16.2
     ceph_grafana_image: grafana
     ceph_grafana_namespace: quay.ceph.io/app-sre
     ceph_grafana_tag: 5.4.3
     ceph_image: daemon
     ceph_namespace: quay.ceph.io/ceph-ci
     ceph_node_exporter_image: node-exporter
     ceph_node_exporter_namespace: quay.ceph.io/prometheus
     ceph_node_exporter_tag: v0.17.0
     ceph_prometheus_image: prometheus
     ceph_prometheus_namespace: quay.ceph.io/prometheus
     ceph_prometheus_tag: v2.7.2
     ceph_tag: v4.0.19-stable-4.0-nautilus-centos-7-x86_64
     name_prefix: centos-binary-
     name_suffix: ''
     namespace: quay.io/tripleotraincentos8
     neutron_driver: ovn
     rhel_containers: false
     tag: current-tripleo
   tag_from_label: rdo_version

And yes, the 10.200.7.0/24 network is my storage network.
Here is a snippet from my network_data.yaml:

- name: Storage
 vip: true
 vlan: 1107
 name_lower: storage
 ip_subnet: '10.200.7.0/24'
 allocation_pools: [{'start': '10.200.7.150', 'end': '10.200.7.169'}]
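
As a quick cross-check, here is a small Python sketch that compares the addresses from the "Timeout when waiting for ..." messages quoted further down with the subnet and allocation pool above. The constant and function names are only illustrative.

```python
import ipaddress

# Values copied from the network_data.yaml snippet above.
STORAGE_SUBNET = ipaddress.ip_network("10.200.7.0/24")
POOL_START = ipaddress.ip_address("10.200.7.150")
POOL_END = ipaddress.ip_address("10.200.7.169")

# Addresses from the grafana timeout messages in the quoted log.
GRAFANA_ADDRESSES = ["10.200.7.151", "10.200.7.155", "10.200.7.165"]


def describe(address):
    """Report whether an address is in the storage subnet and allocation pool."""
    ip = ipaddress.ip_address(address)
    return {
        "address": address,
        "in_subnet": ip in STORAGE_SUBNET,
        "in_pool": POOL_START <= ip <= POOL_END,
    }


for address in GRAFANA_ADDRESSES:
    print(describe(address))
```

All three addresses fall inside both the subnet and the pool, which is consistent with grafana being bound on the storage network.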

I will look into the grafana service to see why it's not booting and get back to you.

Regards.

On Mon, Aug 23, 2021 at 5:28 PM Francesco Pantano <fpantano@redhat.com> wrote:
Hello,
thanks John for your reply here.
A few more comments inline:

On Mon, Aug 23, 2021 at 6:16 PM John Fulton <johfulto@redhat.com> wrote:
On Mon, Aug 23, 2021 at 10:52 AM wodel youchi <wodel.youchi@gmail.com> wrote:
>
> Hi,
>
> I redid the undercloud deployment for the Train version for now. And I verified the download URL for the images.
> My overcloud deployment has moved forward but I still get errors.
>
> This is what I got this time :
>>
>>        "TASK [ceph-grafana : wait for grafana to start] ********************************",
>>        "Monday 23 August 2021  14:55:21 +0100 (0:00:00.961)       0:12:59.319 ********* ",
>>        "fatal: [overcloud-controller-0]: FAILED! => {\"changed\": false, \"elapsed\": 300, \"msg\": \"Timeout when waiting for 10.200.7.151:3100\"}",
>>        "fatal: [overcloud-controller-1]: FAILED! => {\"changed\": false, \"elapsed\": 300, \"msg\": \"Timeout when waiting for 10.200.7.155:3100\"}",
>>        "fatal: [overcloud-controller-2]: FAILED! => {\"changed\": false, \"elapsed\": 300, \"msg\": \"Timeout when waiting for 10.200.7.165:3100\"}",

I'm not certain of the ceph-ansible version you're using but it should
be a version 4 with train. ceph-ansible should already be installed on
your undercloud judging by this error and in the latest version 4 this
task is where it failed:

 https://github.com/ceph/ceph-ansible/blob/v4.0.64/roles/ceph-grafana/tasks/configure_grafana.yml#L112-L115

You can check the status of this service on your three controllers and
then debug it directly.
As John pointed out, ceph-ansible is able to configure, render and start the associated
systemd unit for all the ceph monitoring stack components (node-exporter, prometheus, alertmanager and
grafana).
You can ssh to your controllers and check the associated systemd unit, looking at the journal to see why
it failed to start (I saw there's a timeout waiting for the container to start).
A potential plan, in this case, could be:

1. check the systemd unit (I guess you can start with grafana which is the failed service)
2. look at the journal logs (feel free to attach here the relevant part of the output)
3. double check the network where the service is bound (can you attach the /var/lib/mistral/<stack>/ceph-ansible/group_vars/all.yaml)
    * The grafana process should be run on the storage network, but I see a "Timeout when waiting for 10.200.7.165:3100": is that network the right one?
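
Steps 1 and 2 of the plan could be scripted roughly as below. This is only a sketch: the unit name `grafana-server` is taken from the ceph-ansible task name "start the grafana-server service" in the log and may differ on your controllers, and the helper names are hypothetical. It would need to be run as root (or via sudo) on each controller.

```python
import subprocess

# Unit name assumed from the ceph-ansible task "start the grafana-server service".
GRAFANA_UNIT = "grafana-server"


def debug_commands(unit, journal_lines=100):
    """Commands for steps 1 and 2: unit status, then recent journal entries."""
    return [
        ["systemctl", "status", "--no-pager", unit],
        ["journalctl", "--no-pager", "-n", str(journal_lines), "-u", unit],
    ]


def run_debug(unit=GRAFANA_UNIT):
    """Run each command and print its combined stdout/stderr."""
    for cmd in debug_commands(unit):
        print("$ " + " ".join(cmd))
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,  # text mode; also works on Python 3.6
        )
        print(result.stdout)
```

Calling `run_debug()` prints the unit status followed by the last 100 journal lines, which is the part of the output worth attaching here.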
 

  John

>>        "RUNNING HANDLER [ceph-prometheus : service handler] ****************************",
>>        "Monday 23 August 2021  15:00:22 +0100 (0:05:00.767)       0:18:00.087 ********* ",
>>        "PLAY RECAP *********************************************************************",
>>        "overcloud-computehci-0     : ok=224  changed=23   unreachable=0    failed=0    skipped=415  rescued=0    ignored=0   ",
>>        "overcloud-computehci-1     : ok=199  changed=18   unreachable=0    failed=0    skipped=392  rescued=0    ignored=0   ",
>>        "overcloud-computehci-2     : ok=212  changed=23   unreachable=0    failed=0    skipped=390  rescued=0    ignored=0   ",
>>        "overcloud-controller-0     : ok=370  changed=52   unreachable=0    failed=1    skipped=539  rescued=0    ignored=0   ",
>>        "overcloud-controller-1     : ok=308  changed=43   unreachable=0    failed=1    skipped=495  rescued=0    ignored=0   ",
>>        "overcloud-controller-2     : ok=317  changed=45   unreachable=0    failed=1    skipped=493  rescued=0    ignored=0   ",
>>
>>        "INSTALLER STATUS ***************************************************************",
>>        "Install Ceph Monitor           : Complete (0:00:52)",
>>        "Install Ceph Manager           : Complete (0:05:49)",
>>        "Install Ceph OSD               : Complete (0:02:28)",
>>        "Install Ceph RGW               : Complete (0:00:27)",
>>        "Install Ceph Client            : Complete (0:00:33)",
>>        "Install Ceph Grafana           : In Progress (0:05:54)",
>>        "\tThis phase can be restarted by running: roles/ceph-grafana/tasks/main.yml",
>>        "Install Ceph Node Exporter     : Complete (0:00:28)",
>>        "Monday 23 August 2021  15:00:22 +0100 (0:00:00.006)       0:18:00.094 ********* ",
>>        "=============================================================================== ",
>>        "ceph-grafana : wait for grafana to start ------------------------------ 300.77s",
>>        "ceph-facts : get ceph current status ---------------------------------- 300.27s",
>>        "ceph-container-common : pulling udtrain.ctlplane.umaitek.dz:8787/ceph-ci/daemon:v4.0.19-stable-4.0-nautilus-centos-7-x86_64 image -- 19.04s",
>>        "ceph-mon : waiting for the monitor(s) to form the quorum... ------------ 12.83s",
>>        "ceph-osd : use ceph-volume lvm batch to create bluestore osds ---------- 12.13s",
>>        "ceph-osd : wait for all osd to be up ----------------------------------- 11.88s",
>>        "ceph-osd : set pg_autoscale_mode value on pool(s) ---------------------- 11.00s",
>>        "ceph-osd : create openstack pool(s) ------------------------------------ 10.80s",
>>        "ceph-grafana : make sure grafana is down ------------------------------- 10.66s",
>>        "ceph-osd : customize pool crush_rule ----------------------------------- 10.15s",
>>        "ceph-osd : customize pool size ----------------------------------------- 10.15s",
>>        "ceph-osd : customize pool min_size ------------------------------------- 10.14s",
>>        "ceph-osd : assign application to pool(s) ------------------------------- 10.13s",
>>        "ceph-osd : list existing pool(s) ---------------------------------------- 8.59s",
>>
>>        "ceph-mon : fetch ceph initial keys -------------------------------------- 7.01s",
>>        "ceph-container-common : get ceph version -------------------------------- 6.75s",
>>        "ceph-prometheus : start prometheus services ----------------------------- 6.67s",
>>        "ceph-mgr : wait for all mgr to be up ------------------------------------ 6.66s",
>>        "ceph-grafana : start the grafana-server service ------------------------- 6.33s",
>>        "ceph-mgr : create ceph mgr keyring(s) on a mon node --------------------- 6.26s"
>>    ],
>>    "failed_when_result": true
>> }
>> 2021-08-23 15:00:24.427687 | 525400e8-92c8-47b1-e162-00000000597d |     TIMING | tripleo-ceph-run-ansible : print ceph-ansible output in case of failure | undercloud | 0:37:30.226345 | 0.25s
>>
>> PLAY RECAP *********************************************************************
>> overcloud-computehci-0     : ok=213  changed=117  unreachable=0    failed=0    skipped=120  rescued=0    ignored=0
>> overcloud-computehci-1     : ok=207  changed=117  unreachable=0    failed=0    skipped=120  rescued=0    ignored=0
>> overcloud-computehci-2     : ok=207  changed=117  unreachable=0    failed=0    skipped=120  rescued=0    ignored=0
>> overcloud-controller-0     : ok=237  changed=145  unreachable=0    failed=0    skipped=128  rescued=0    ignored=0
>> overcloud-controller-1     : ok=232  changed=145  unreachable=0    failed=0    skipped=128  rescued=0    ignored=0
>> overcloud-controller-2     : ok=232  changed=145  unreachable=0    failed=0    skipped=128  rescued=0    ignored=0
>> undercloud                 : ok=100  changed=18   unreachable=0    failed=1    skipped=37   rescued=0    ignored=0
>>
>> 2021-08-23 15:00:24.559997 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> 2021-08-23 15:00:24.560328 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Total Tasks: 1366       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> 2021-08-23 15:00:24.560419 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Elapsed Time: 0:37:30.359090 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> 2021-08-23 15:00:24.560490 |                                 UUID |       Info |       Host |   Task Name |   Run Time
>> 2021-08-23 15:00:24.560589 | 525400e8-92c8-47b1-e162-00000000597b |    SUMMARY | undercloud | tripleo-ceph-run-ansible : run ceph-ansible | 1082.71s
>> 2021-08-23 15:00:24.560675 | 525400e8-92c8-47b1-e162-000000004d9a |    SUMMARY | overcloud-controller-1 | Wait for container-puppet tasks (generate config) to finish | 356.02s
>> 2021-08-23 15:00:24.560763 | 525400e8-92c8-47b1-e162-000000004d6a |    SUMMARY | overcloud-controller-0 | Wait for container-puppet tasks (generate config) to finish | 355.74s
>> 2021-08-23 15:00:24.560839 | 525400e8-92c8-47b1-e162-000000004dd0 |    SUMMARY | overcloud-controller-2 | Wait for container-puppet tasks (generate config) to finish | 355.68s
>> 2021-08-23 15:00:24.560912 | 525400e8-92c8-47b1-e162-000000003bb1 |    SUMMARY | undercloud | Run tripleo-container-image-prepare logged to: /var/log/tripleo-container-image-prepare.log | 143.03s
>> 2021-08-23 15:00:24.560986 | 525400e8-92c8-47b1-e162-000000004b13 |    SUMMARY | overcloud-controller-0 | Wait for puppet host configuration to finish | 125.36s
>> 2021-08-23 15:00:24.561057 | 525400e8-92c8-47b1-e162-000000004b88 |    SUMMARY | overcloud-controller-2 | Wait for puppet host configuration to finish | 125.33s
>> 2021-08-23 15:00:24.561128 | 525400e8-92c8-47b1-e162-000000004b4b |    SUMMARY | overcloud-controller-1 | Wait for puppet host configuration to finish | 125.25s
>> 2021-08-23 15:00:24.561300 | 525400e8-92c8-47b1-e162-000000001dc4 |    SUMMARY | overcloud-controller-2 | Run puppet on the host to apply IPtables rules | 108.08s
>> 2021-08-23 15:00:24.561374 | 525400e8-92c8-47b1-e162-000000001e4f |    SUMMARY | overcloud-controller-0 | Run puppet on the host to apply IPtables rules | 107.34s
>> 2021-08-23 15:00:24.561444 | 525400e8-92c8-47b1-e162-000000004c8d |    SUMMARY | overcloud-computehci-2 | Wait for container-puppet tasks (generate config) to finish | 96.56s
>> 2021-08-23 15:00:24.561514 | 525400e8-92c8-47b1-e162-000000004c33 |    SUMMARY | overcloud-computehci-0 | Wait for container-puppet tasks (generate config) to finish | 96.38s
>> 2021-08-23 15:00:24.561580 | 525400e8-92c8-47b1-e162-000000004c60 |    SUMMARY | overcloud-computehci-1 | Wait for container-puppet tasks (generate config) to finish | 93.41s
>> 2021-08-23 15:00:24.561645 | 525400e8-92c8-47b1-e162-00000000434d |    SUMMARY | overcloud-computehci-0 | Pre-fetch all the containers | 92.70s
>> 2021-08-23 15:00:24.561712 | 525400e8-92c8-47b1-e162-0000000043ed |    SUMMARY | overcloud-computehci-2 | Pre-fetch all the containers | 91.90s
>> 2021-08-23 15:00:24.561782 | 525400e8-92c8-47b1-e162-000000004385 |    SUMMARY | overcloud-computehci-1 | Pre-fetch all the containers | 91.88s
>> 2021-08-23 15:00:24.561876 | 525400e8-92c8-47b1-e162-00000000491c |    SUMMARY | overcloud-computehci-1 | Wait for puppet host configuration to finish | 90.37s
>> 2021-08-23 15:00:24.561947 | 525400e8-92c8-47b1-e162-000000004951 |    SUMMARY | overcloud-computehci-2 | Wait for puppet host configuration to finish | 90.37s
>> 2021-08-23 15:00:24.562016 | 525400e8-92c8-47b1-e162-0000000048e6 |    SUMMARY | overcloud-computehci-0 | Wait for puppet host configuration to finish | 90.35s
>> 2021-08-23 15:00:24.562080 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ End Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> 2021-08-23 15:00:24.562196 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ State Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> 2021-08-23 15:00:24.562311 | ~~~~~~~~~~~~~~~~~~ Number of nodes which did not deploy successfully: 1 ~~~~~~~~~~~~~~~~~
>> 2021-08-23 15:00:24.562379 |  The following node(s) had failures: undercloud
>> 2021-08-23 15:00:24.562456 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> Host 10.0.2.40 not found in /home/stack/.ssh/known_hosts
>> Ansible failed, check log at /var/lib/mistral/overcloud/ansible.log.
>> Overcloud Endpoint: http://10.0.2.40:5000
>> Overcloud Horizon Dashboard URL: http://10.0.2.40:80/dashboard
>> Overcloud rc file: /home/stack/overcloudrc
>> Overcloud Deployed with error
>> Overcloud configuration failed.
>>
>
>
> Could someone help me debug this? The ansible.log is huge and I can't see where the problem originates. If someone could point me in the right direction, it would be appreciated.
> Thanks in advance.
>
> Regards.
>
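
One way to narrow down a huge ansible.log is to pull out only the task results that Ansible marked as failed. The following is a minimal sketch, not a TripleO tool: it assumes the log contains Ansible's usual "fatal: [host]: FAILED!" or "UNREACHABLE!" result lines, and the names find_failures and scan_log are made up for this example.

```python
import re
from collections import deque

# Assumption: failed task results appear as "fatal: [host]: FAILED! => {...}"
# (or "UNREACHABLE!"), which is Ansible's default output for a failed task.
FAILURE = re.compile(r"fatal: \[(?P<host>[^\]]+)\]: (?P<kind>FAILED|UNREACHABLE)!")


def find_failures(lines, context=3):
    """Yield (host, kind, preceding_lines, line) for each failed task result."""
    recent = deque(maxlen=context)  # the last few lines seen before a failure
    for raw in lines:
        line = raw.rstrip("\n")
        match = FAILURE.search(line)
        if match:
            yield match["host"], match["kind"], list(recent), line
        recent.append(line)


def scan_log(path, context=3):
    """Print every failure found in the log at `path`, with some context."""
    with open(path, errors="replace") as log:
        for host, kind, before, line in find_failures(log, context):
            print(f"--- {kind} on {host}")
            for prev in before:
                print("    " + prev)
            # Failure lines can embed very long JSON, so truncate for reading.
            print("  > " + line[:500])
```

Calling scan_log("/var/lib/mistral/overcloud/ansible.log") on the undercloud should list each failed task with the few lines before it, which usually include the task name. Since the summary reports the failure on the undercloud host, the entries with host "undercloud" are the ones to read first.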
> On Wed, Aug 18, 2021 at 18:02, Wesley Hayutin <whayutin@redhat.com> wrote:
>>
>>
>>
>> On Wed, Aug 18, 2021 at 10:10 AM Dmitry Tantsur <dtantsur@redhat.com> wrote:
>>>
>>> Hi,
>>>
>>> On Wed, Aug 18, 2021 at 4:39 PM wodel youchi <wodel.youchi@gmail.com> wrote:
>>>>
>>>> Hi,
>>>> I am trying to deploy OpenStack with TripleO using VMs and nested KVM for the compute node. This is for testing and learning purposes.
>>>>
>>>> I am using the Train version and following some tutorials.
>>>> I prepared my template files and started the deployment, but I got these errors:
>>>>
>>>> Failed to provision instance fc40457e-4b3c-4402-ae9d-c528f2c2ad30: Asynchronous exception: Node failed to deploy. Exception: Agent API for node 6d3724fc-6f13-4588-bbe5-56bc4f9a4f87 returned HTTP status code 404 with error: Not found: Extension with id iscsi not found. for node
>>>>
>>>
>>> You somehow ended up using the master (Xena release) deploy ramdisk with Train TripleO. You need to make sure you download the Train images. I hope the TripleO people can point you to the right place.
>>>
>>> Dmitry
>>
>>
>> http://images.rdoproject.org/centos8/
>> http://images.rdoproject.org/centos8/train/rdo_trunk/current-tripleo/
>>
>>>
>>>
>>>>
>>>> and
>>>>
>>>> Got HTTP 409: {"errors": [{"status": 409, "title": "Conflict", "detail": "There was a conflict when trying to complete your request.\n\n Unable to allocate inventory: Unable to create allocation for 'CUSTOM_BAREMETAL' on resource provider '6d3724fc-6f13-4588-bbe5-56bc4f9a4f87'. The requested amount would exceed the capacity. ",
>>>>
>>>> Could you help me understand what those errors mean? I couldn't find anything similar on the net.
>>>>
>>>> Thanks in advance.
>>>>
>>>> Regards.
>>>
>>>
>>>
>>> --
>>> Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
>>> Commercial register: Amtsgericht Muenchen, HRB 153243,
>>> Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill




--
Francesco Pantano
GPG KEY: F41BD75C