Hi, and thank you all for your help. I've managed to deploy my first overcloud, but I have run into another problem. I am using an HCI deployment and I included the Ceph dashboard in my deployment script, but I couldn't find the dashboard afterwards. After reviewing the Red Hat documentation, it seems that I have to use the "ControllerStorageDashboard" role. That is what I did, but I got this:
*RESP BODY: {"resources": [{"updated_time": "2021-08-25T20:14:20Z", "creation_time": "2021-08-25T20:14:20Z", "logical_resource_id": "0", "resource_name": "0", "physical_resource_id": "a21b3498-fbdb-4a19-8e23-9dd71232b473", "resource_status": "CREATE_FAILED", "resource_status_reason": "BadRequest: resources[0].resources.OVNMacAddressPort: Invalid input for operation: 'tripleo_ovn_mac_port_name=ControllerStorageDashboard-ovn-mac-0' exceeds maximum length of 60.\nNeutron server returns request_ids: ['req-467b58ef-dfd7-42c5-bb07-4f0f99b77332']", "resource_type": "OS* ::TripleO::OVNMacAddressPort", "links": [{"href": " https://10.200.24.2:13004/v1/4f94deb9a28549c0a78f232756c7599a/stacks/overclo... rollerStorageDashboardOVNChassisMacPorts-ui4dsb2tnkbk/ae81eb26-2f4b-4ae0-8826-af32be18ce14/resources/0", "rel": "self"}, {"href": " https://10.200.24.2:13004/v1/4f94deb9a28549c0a78f232756c75 99a/stacks/overcloud-ControllerStorageDashboard-vtmxtvxpzggi-1-ue2d2riknvna-ControllerStorageDashboardOVNChassisMacPorts-ui4dsb2tnkbk/ae81eb26-2f4b-4ae0-8826-af32be18ce14", "rel": "stack"}, {"href": " https://10.200.24.2:13004/v1/4f94deb9a28549c0a78f232756c7599a/stacks/overclo... -ui4dsb2tnkbk-0-yfuxj4ahxviu/a21b3498-fbdb-4a19-8e23-9dd71232b473", "rel": "nested"}], "required_by": [], "parent_resource": "ControllerStorageDashboardOVNChassisMacPorts"}]} GET call to orchestration for https://10.200.24.2:13004/v1/4f94deb9a28549c0a78f232756c7599a/stacks/overclo... dOVNChassisMacPorts-ui4dsb2tnkbk/ae81eb26-2f4b-4ae0-8826-af32be18ce14/resources used request id req-9609844f-f173-4e80-a3bd-bc287e88b00f REQ: curl -g -i --cacert "/etc/pki/ca-trust/source/anchors/cm-local-ca.pem" -X GET https://10.200.24.2:13004/v1/4f94deb9a28549c0a78f232756c7599a/stacks/a21b349... 
resources -H "Accept: application/json" -H "Content-Type: application/json" -H "User-Agent: python-heatclient" -H "X-Auth-Token: {SHA256}d296097c7cdf0beb50127e0a1d03cb8a702e18d543600f51b16d ab4987811a6a" -H "X-Region-Name: " https://10.200.24.2:13004 "GET /v1/4f94deb9a28549c0a78f232756c7599a/stacks/a21b3498-fbdb-4a19-8e23-9dd71232b473/resources HTTP/1.1" 302 649 RESP: [302] Content-Length: 649 Content-Type: application/json Date: Wed, 25 Aug 2021 20:15:11 GMT Location: https://10.200.24.2:13004/v1/4f94deb9a28549c0a78f232756c7599a/stacks/overclo... ontrollerStorageDashboard-vtmxtvxpzggi-1-ue2d2rik
...
...
overcloud.ControllerStorageDashboard.0.ControllerStorageDashboardOVNChassisMacPorts.0.OVNMacAddressPort:
  resource_type: OS::Neutron::Port
  physical_resource_id: 259e39f8-9e7b-4494-bb2d-ff7b2cf0ad40
  status: CREATE_FAILED
  status_reason: |
    BadRequest: resources.OVNMacAddressPort: Invalid input for operation: 'tripleo_ovn_mac_port_name=ControllerStorageDashboard-ovn-mac-0' exceeds maximum length of 60.
    Neutron server returns request_ids: ['req-322ab0aa-0e1c-416f-be81-b48230d3dab1']
overcloud.ControllerStorageDashboard.2.ControllerStorageDashboardOVNChassisMacPorts.0.OVNMacAddressPort:
  resource_type: OS::Neutron::Port
  physical_resource_id: c7daf26b-7f96-43cf-8678-11d456b5cdfe
  status: CREATE_FAILED
  status_reason: |
    BadRequest: resources.OVNMacAddressPort: Invalid input for operation: 'tripleo_ovn_mac_port_name=ControllerStorageDashboard-ovn-mac-0' exceeds maximum length of 60.
    Neutron server returns request_ids: ['req-9e3e19dd-4974-4007-9df0-ee9774369495']
overcloud.ControllerStorageDashboard.1.ControllerStorageDashboardOVNChassisMacPorts.0.OVNMacAddressPort:
  resource_type: OS::Neutron::Port
  physical_resource_id: c902e259-f299-457f-8b0d-c37fb40e0d32
  status: CREATE_FAILED
  status_reason: |
    BadRequest: resources.OVNMacAddressPort: Invalid input for operation: 'tripleo_ovn_mac_port_name=ControllerStorageDashboard-ovn-mac-0' exceeds maximum length of 60.
    Neutron server returns request_ids: ['req-467b58ef-dfd7-42c5-bb07-4f0f99b77332']
clean_up ListStackFailures:
END return value: 0
Instantiating messaging websocket client: wss://10.200.24.2:3000
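The length limit in the error can be checked by hand. Here is a minimal Python sketch, assuming the tag is built as `tripleo_ovn_mac_port_name=<role>-ovn-mac-<index>` (a layout inferred only from the failing value in the error message, not from the TripleO templates) and that the limit is the 60 characters Neutron reports. The helper names are hypothetical.

```python
# Limit taken from the Neutron error message above.
MAX_TAG_LENGTH = 60
# Assumed tag layout, inferred from the failing value in the error message.
TAG_TEMPLATE = "tripleo_ovn_mac_port_name={role}-ovn-mac-{index}"


def ovn_mac_tag(role: str, index: int = 0) -> str:
    """Build the tag string in the form seen in the error message."""
    return TAG_TEMPLATE.format(role=role, index=index)


def tag_fits(role: str, index: int = 0) -> bool:
    """Return True if the tag for this role stays within the limit."""
    return len(ovn_mac_tag(role, index)) <= MAX_TAG_LENGTH


if __name__ == "__main__":
    for role in ("Controller", "ControllerStorageDashboard"):
        tag = ovn_mac_tag(role)
        print(f"{role}: {len(tag)} characters, fits={tag_fits(role)}")
```

Under these assumptions the tag for ControllerStorageDashboard is 62 characters long, two over the limit, while a role name of up to 24 characters would fit with a single-digit index. Whether renaming the role is a supported workaround is not something this thread confirms.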
I couldn't find anything on the web about this error.
Regards.
On Tue, 24 Aug 2021 at 22:25, wodel youchi <wodel.youchi@gmail.com> wrote:
Hello,
After digging into grafana, it seems it needed to download something from the internet, and I hadn't really configured a proper gateway on the external network. So I started by configuring a proper gateway and tested it with the half-deployed nodes, then I redid the deployment, and again I got an error:
2021-08-24 21:29:29.616805 | 525400e8-92c8-d397-6f7e-000000006133 |
FATAL | Clean up legacy Cinder keystone catalog entries | undercloud | error={"changed": false, "module_stderr": "Fa iled to discover available identity versions when contacting http://10.0.2.40:5000. Attempting to parse version from URL.\nTraceback (most recent call last):\n File \"/usr/lib/python3.6/si te-packages/urllib3/connection.py\", line 162, in _new_conn\n (self._dns_host, self.port), self.timeout, **extra_kw)\n File \"/usr/lib/python3.6/site-packages/urllib3/util/connection.py \", line 80, in create_connection\n raise err\n File \"/usr/lib/python3.6/site-packages/urllib3/util/connection.py\", line 70, in create_connection\n sock.connect(sa)\nTimeoutError: [Errno 110] Connection timed out\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File \"/usr/lib/python3.6/site-packages/urll ib3/connectionpool.py\", line 600, in urlopen\n chunked=chunked)\n File \"/usr/lib/python3.6/site-packages/urllib3/connectionpool.py\", line 354, in _make_request\n conn.request(meth od, url, **httplib_request_kw)\n File \"/usr/lib64/python3.6/http/client.py\", line 1269, in request\n self._send_request(method, url, body, headers, encode_chunked)\n File \"/usr/lib6 4/python3.6/http/client.py\", line 1315, in _send_request\n self.endheaders(body, encode_chunked=encode_chunked)\n File \"/usr/lib64/python3.6/http/client.py\", line 1264, in endheaders \n self._send_output(message_body, encode_chunked=encode_chunked)\n File \"/usr/lib64/python3.6/http/client.py\", line 1040, in _send_output\n self.send(msg)\n File \"/usr/lib64/pyt hon3.6/http/client.py\", line 978, in send\n self.connect()\n File \"/usr/lib/python3.6/site-packages/urllib3/connection.py\", line 184, in connect\n conn = self._new_conn()\n File \"/usr/lib/python3.6/site-packages/urllib3/connection.py\", line 171, in _new_conn\n self, \"Failed to establish a new connection: %s\" % e)\nurllib3.exceptions.NewConnectionError: <urll ib3.connection.HTTPConnection 
object at 0x7f96f7b10cc0>: Failed to establish a new connection: [Errno 110] Connection timed out\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File \"/usr/lib/python3.6/site-packages/requests/adapters.py\", line 449, in send\n timeout=timeout\n File \"/usr/lib/python3.6/site-p ackages/urllib3/connectionpool.py\", line 638, in urlopen\n _stacktrace=sys.exc_info()[2])\n File \"/usr/lib/python3.6/site-packages/urllib3/util/retry.py\", line 399, in increment\n raise MaxRetryError(_pool, url, error or ResponseError(cause))\nurllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='10.0.2.40', port=5000): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f96f7b10cc0>: Failed to establish a new connection: [Errno 110] Connection timed out',))\n\nDuring handling of the ab$ ve exception, another exception occurred:\n\nTraceback (most recent call last):\n File \"/usr/lib/python3.6/site-packages/keystoneauth1/session.py\", line 997, in _send_request\n resp $ self.session.request(method, url, **kwargs)\n File \"/usr/lib/python3.6/site-packages/requests/sessions.py\", line 533, in request\n resp = self.send(prep, **send_kwargs)\n File \"/u$ r/lib/python3.6/site-packages/requests/sessions.py\", line 646, in send\n r = adapter.send(request, **kwargs)\n File \"/usr/lib/python3.6/site-packages/requests/adapters.py\", line 516$ in send\n raise ConnectionError(e, request=request)\nrequests.exceptions.ConnectionError: HTTPConnectionPool(host='10.0.2.40', port=5000): Max retries exceeded with url: / (Caused by N$wConnectionError('<urllib3.connection.HTTPConnection object at 0x7f96f7b10cc0>: Failed to establish a new connection: [Errno 110] Connection timed out',))\n\nDuring handling of the above e$ ception, another exception occurred:\n\nTraceback (most recent call last):\n File \"/usr/lib/python3.6/site-packages/keystoneauth1/identity/generic/base.py\", line 
138, in _do_create_plug$ n\n authenticated=False)\n File \"/usr/lib/python3.6/site-packages/keystoneauth1/identity/base.py\", line 610, in get_discovery\n authenticated=authenticated)\n File \"/usr/lib/pyt$ on3.6/site-packages/keystoneauth1/discover.py\", line 1442, in get_discovery\n disc = Discover(session, url, authenticated=authenticated)\n File \"/usr/lib/python3.6/site-packages/keys$ oneauth1/discover.py\", line 526, in __init__\n authenticated=authenticated)\n File \"/usr/lib/python3.6/site-packages/keystoneauth1/discover.py\", line 101, in get_version_data\n r$ sp = session.get(url, headers=headers, authenticated=authenticated)\n File \"/usr/lib/python3.6/site-packages/keystoneauth1/session.py\", line 1116, in get\n return self.request(url, '$ ET', **kwargs)\n File \"/usr/lib/python3.6/site-packages/keystoneauth1/session.py\", line 906, in request\n resp = send(**kwargs)\n File \"/usr/lib/python3.6/site-packages/keystoneaut$ 1/session.py\", line 1013, in _send_request\n raise exceptions.ConnectFailure(msg)\nkeystoneauth1.exceptions.connection.ConnectFailure: Unable to establish connection to http://10.0.2.4$ :5000: HTTPConnectionPool(host='10.0.2.40', port=5000): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f96f7b10cc0>: Failed to establish a new connection: [Errno 110] Connection timed out',))\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File \"<$ tdin>\", line 102, in <module>\n File \"<stdin>\", line 94, in _ansiballz_main\n File \"<stdin>\", line 40, in invoke_module\n File \"/usr/lib64/python3.6/runpy.py\", line 205, in run_m$ dule\n return _run_module_code(code, init_globals, run_name, mod_spec)\n File \"/usr/lib64/python3.6/runpy.py\", line 96, in _run_module_code\n mod_name, mod_spec, pkg_name, script_$ ame)\n File \"/usr/lib64/python3.6/runpy.py\", line 85, in _run_code\n exec(code, run_globals)\n File 
\"/tmp/ansible_os_keystone_service_payload_wcyk6h37/ansible_os_keystone_service_p$ yload.zip/ansible/modules/cloud/openstack/os_keystone_service.py\", line 194, in <module>\n File \"/tmp/ansible_os_keystone_service_payload_wcyk6h37/ansible_os_keystone_service_payload.zi$ /ansible/modules/cloud/openstack/os_keystone_service.py\", line 153, in main\n File \"/usr/lib/python3.6/site-packages/openstack/cloud/_identity.py\", line 510, in search_services\n se$ vices = self.list_services()\n File \"/usr/lib/python3.6/site-packages/openstack/cloud/_identity.py\", line 485, in list_services\n if self._is_client_version('identity', 2):\n File \$ /usr/lib/python3.6/site-packages/openstack/cloud/openstackcloud.py\", line 459, in _is_client_version\n client = getattr(self, client_name)\n File \"/usr/lib/python3.6/site-packages/op$ nstack/cloud/_identity.py\", line 32, in _identity_client\n 'identity', min_version=2, max_version='3.latest')\n File \"/usr/lib/python3.6/site-packages/openstack/cloud/openstackcloud.$ y\", line 406, in _get_versioned_client\n if adapter.get_endpoint():\n File \"/usr/lib/python3.6/site-packages/keystoneauth1/adapter.py\", line 282, in get_endpoint\n return self.se$ sion.get_endpoint(auth or self.auth, **kwargs)\n File \"/usr/lib/python3.6/site-packages/keystoneauth1/session.py\", line 1218, in get_endpoint\n return auth.get_endpoint(self, **kwarg$ )\n File \"/usr/lib/python3.6/site-packages/keystoneauth1/identity/base.py\", line 380, in get_endpoint\n allow_version_hack=allow_version_hack, **kwargs)\n File \"/usr/lib/python3.6/$ ite-packages/keystoneauth1/identity/base.py\", line 271, in get_endpoint_data\n service_catalog = self.get_access(session).service_catalog\n File \"/usr/lib/python3.6/site-packages/key$ toneauth1/identity/base.py\", line 134, in get_access\n self.auth_ref = self.get_auth_ref(session)\n File \"/usr/lib/python3.6/site-packages/keystoneauth1/identity/generic/base.py\", l$ ne 206, in get_auth_ref\n self._plugin = 
self._do_create_plugin(session)\n File \"/usr/lib/python3.6/site-packages/keystoneauth1/identity/generic/base.py\", line 161, in _do_create_plu$ in\n 'auth_url is correct. %s' % e)\nkeystoneauth1.exceptions.discovery.DiscoveryFailure: Could not find versioned identity endpoints when attempting to authenticate. Please check that $our auth_url is correct.
Unable to establish connection to http://10.0.2.40:5000: HTTPConnectionPool(host='10.0.2.40', port=5000): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f96f7b10cc0>: Failed to establish a new connection: [Errno 110] Connection timed out',))\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}
2021-08-24 21:29:29.617697 | 525400e8-92c8-d397-6f7e-000000006133 | TIMING | Clean up legacy Cinder keystone catalog entries | undercloud | 1:07:40.666419 | 130.85s
PLAY RECAP *********************************************************************
overcloud-computehci-0 : ok=260 changed=145 unreachable=0 failed=0 skipped=140 rescued=0 ignored=0
overcloud-computehci-1 : ok=258 changed=145 unreachable=0 failed=0 skipped=140 rescued=0 ignored=0
overcloud-computehci-2 : ok=255 changed=145 unreachable=0 failed=0 skipped=140 rescued=0 ignored=0
overcloud-controller-0 : ok=295 changed=181 unreachable=0 failed=0 skipped=151 rescued=0 ignored=0
overcloud-controller-1 : ok=289 changed=177 unreachable=0 failed=0 skipped=152 rescued=0 ignored=0
overcloud-controller-2 : ok=288 changed=177 unreachable=0 failed=0 skipped=152 rescued=0 ignored=0
undercloud : ok=105 changed=21 unreachable=0 failed=1 skipped=45 rescued=0 ignored=0
2021-08-24 21:29:29.730778 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2021-08-24 21:29:29.731007 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Total Tasks: 1723 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2021-08-24 21:29:29.731098 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Elapsed Time: 1:07:40.779840 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2021-08-24 21:29:29.731172 | UUID | Info | Host | Task Name | Run Time
2021-08-24 21:29:29.731251 | 525400e8-92c8-d397-6f7e-000000003b9a | SUMMARY | undercloud | Run tripleo-container-image-prepare logged to: /var/log/tripleo-container-image-prepare.log | 1762.93s
2021-08-24 21:29:29.731349 | 525400e8-92c8-d397-6f7e-0000000057aa | SUMMARY | undercloud | tripleo-ceph-run-ansible : run ceph-ansible | 990.24s
2021-08-24 21:29:29.731433 | 525400e8-92c8-d397-6f7e-000000005951 | SUMMARY | overcloud-controller-0 | tripleo_ha_wrapper : Run init bundle puppet on the host for haproxy | 133.22s
2021-08-24 21:29:29.731503 | 525400e8-92c8-d397-6f7e-000000006133 | SUMMARY | undercloud | Clean up legacy Cinder keystone catalog entries | 130.85s
2021-08-24 21:29:29.731569 | 525400e8-92c8-d397-6f7e-000000006012 | SUMMARY | overcloud-controller-0 | Wait for containers to start for step 3 using paunch | 103.45s
2021-08-24 21:29:29.731643 | 525400e8-92c8-d397-6f7e-000000004337 | SUMMARY | overcloud-computehci-0 | Pre-fetch all the containers | 94.00s
2021-08-24 21:29:29.731729 | 525400e8-92c8-d397-6f7e-000000004378 | SUMMARY | overcloud-computehci-2 | Pre-fetch all the containers | 92.64s
2021-08-24 21:29:29.731795 | 525400e8-92c8-d397-6f7e-000000004337 | SUMMARY | overcloud-computehci-1 | Pre-fetch all the containers | 86.38s
2021-08-24 21:29:29.731867 | 525400e8-92c8-d397-6f7e-000000004d68 | SUMMARY | overcloud-controller-0 | Wait for container-puppet tasks (generate config) to finish | 84.13s
2021-08-24 21:29:29.731946 | 525400e8-92c8-d397-6f7e-000000004d99 | SUMMARY | overcloud-controller-2 | Wait for container-puppet tasks (generate config) to finish | 80.76s
2021-08-24 21:29:29.732012 | 525400e8-92c8-d397-6f7e-00000000427c | SUMMARY | overcloud-controller-1 | Pre-fetch all the containers | 80.21s
2021-08-24 21:29:29.732073 | 525400e8-92c8-d397-6f7e-00000000427c | SUMMARY | overcloud-controller-0 | Pre-fetch all the containers | 77.03s
2021-08-24 21:29:29.732138 | 525400e8-92c8-d397-6f7e-0000000042f5 | SUMMARY | overcloud-controller-2 | Pre-fetch all the containers | 76.32s
2021-08-24 21:29:29.732202 | 525400e8-92c8-d397-6f7e-000000004dd3 | SUMMARY | overcloud-controller-1 | Wait for container-puppet tasks (generate config) to finish | 74.36s
2021-08-24 21:29:29.732266 | 525400e8-92c8-d397-6f7e-000000005da7 | SUMMARY | overcloud-controller-0 | tripleo_ha_wrapper : Run init bundle puppet on the host for ovn_dbs | 68.39s
2021-08-24 21:29:29.732329 | 525400e8-92c8-d397-6f7e-000000005ce2 | SUMMARY | overcloud-controller-0 | Wait for containers to start for step 2 using paunch | 64.55s
2021-08-24 21:29:29.732398 | 525400e8-92c8-d397-6f7e-000000004b97 | SUMMARY | overcloud-controller-2 | Wait for puppet host configuration to finish | 58.13s
2021-08-24 21:29:29.732463 | 525400e8-92c8-d397-6f7e-000000004c1a | SUMMARY | overcloud-controller-1 | Wait for puppet host configuration to finish | 58.11s
2021-08-24 21:29:29.732526 | 525400e8-92c8-d397-6f7e-000000005bd3 | SUMMARY | overcloud-controller-1 | Wait for containers to start for step 2 using paunch | 58.09s
2021-08-24 21:29:29.732589 | 525400e8-92c8-d397-6f7e-000000005b9b | SUMMARY | overcloud-controller-2 | Wait for containers to start for step 2 using paunch | 58.09s
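When the log is this large, the PLAY RECAP block is a quick way to see which host actually failed. Here is a minimal Python sketch, assuming the recap lines keep the `host : key=value ...` layout shown above; the function names are hypothetical and the sample uses two lines copied from this recap.

```python
import re

# One host line of an Ansible PLAY RECAP, e.g.
# "undercloud : ok=105 changed=21 unreachable=0 failed=1 ..."
RECAP_LINE = re.compile(r"^\s*(?P<host>\S+)\s*:\s*(?P<counters>(?:\w+=\d+\s*)+)$")


def parse_recap(text):
    """Map each host found in a PLAY RECAP block to its counters."""
    hosts = {}
    for line in text.splitlines():
        match = RECAP_LINE.match(line)
        if match is None:
            continue  # headers, timestamps and other log lines are skipped
        counters = {}
        for item in match.group("counters").split():
            key, value = item.split("=")
            counters[key] = int(value)
        hosts[match.group("host")] = counters
    return hosts


def failing_hosts(text):
    """Return hosts whose failed or unreachable counter is non-zero."""
    return [
        host
        for host, counters in parse_recap(text).items()
        if counters.get("failed", 0) or counters.get("unreachable", 0)
    ]


SAMPLE = """\
PLAY RECAP *********************************************************************
overcloud-controller-0 : ok=295 changed=181 unreachable=0 failed=0 skipped=151 rescued=0 ignored=0
undercloud : ok=105 changed=21 unreachable=0 failed=1 skipped=45 rescued=0 ignored=0
"""

if __name__ == "__main__":
    print(failing_hosts(SAMPLE))
```

Once the failing host is known, searching the log for the `FATAL` or `fatal:` lines that mention it narrows the search to the task that broke.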
Thank you again for your assistance.
Regards.
On Tue, 24 Aug 2021 at 08:59, wodel youchi <wodel.youchi@gmail.com> wrote:
Hi, and thanks for your help
As for Ceph, here is the container prepare file:
parameter_defaults:
  ContainerImagePrepare:
  - push_destination: true
    set:
      ceph_alertmanager_image: alertmanager
      ceph_alertmanager_namespace: quay.ceph.io/prometheus
      ceph_alertmanager_tag: v0.16.2
      ceph_grafana_image: grafana
      ceph_grafana_namespace: quay.ceph.io/app-sre
      ceph_grafana_tag: 5.4.3
      ceph_image: daemon
      ceph_namespace: quay.ceph.io/ceph-ci
      ceph_node_exporter_image: node-exporter
      ceph_node_exporter_namespace: quay.ceph.io/prometheus
      ceph_node_exporter_tag: v0.17.0
      ceph_prometheus_image: prometheus
      ceph_prometheus_namespace: quay.ceph.io/prometheus
      ceph_prometheus_tag: v2.7.2
      ceph_tag: v4.0.19-stable-4.0-nautilus-centos-7-x86_64
      name_prefix: centos-binary-
      name_suffix: ''
      namespace: quay.io/tripleotraincentos8
      neutron_driver: ovn
      rhel_containers: false
      tag: current-tripleo
    tag_from_label: rdo_version
And yes, the 10.200.7.0/24 network is my storage network. Here is a snippet from my network_data.yaml:
- name: Storage
  vip: true
  vlan: 1107
  name_lower: storage
  ip_subnet: '10.200.7.0/24'
  allocation_pools: [{'start': '10.200.7.150', 'end': '10.200.7.169'}]
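The claim that grafana's addresses sit on the storage network can be checked against this snippet. Here is a minimal Python sketch using the standard `ipaddress` module; the subnet and pool bounds are copied from the snippet above, the three addresses come from the grafana timeout messages quoted further down in this thread, and the function name is hypothetical.

```python
import ipaddress

# Values copied from the network_data.yaml snippet above.
STORAGE_SUBNET = ipaddress.ip_network("10.200.7.0/24")
POOL_START = ipaddress.ip_address("10.200.7.150")
POOL_END = ipaddress.ip_address("10.200.7.169")

# Addresses from the "Timeout when waiting for ...:3100" messages in this thread.
GRAFANA_ADDRESSES = ["10.200.7.151", "10.200.7.155", "10.200.7.165"]


def classify(address):
    """Return (in_subnet, in_pool) for one IPv4 address string."""
    ip = ipaddress.ip_address(address)
    in_subnet = ip in STORAGE_SUBNET
    in_pool = POOL_START <= ip <= POOL_END
    return in_subnet, in_pool


if __name__ == "__main__":
    for address in GRAFANA_ADDRESSES:
        in_subnet, in_pool = classify(address)
        print(f"{address}: in storage subnet={in_subnet}, in allocation pool={in_pool}")
```

All three addresses fall inside both the subnet and the allocation pool, which matches the statement that grafana was bound to the storage network.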
I will look into the grafana service to see why it's not booting and get back to you.
Regards.
On Mon, 23 Aug 2021 at 17:28, Francesco Pantano <fpantano@redhat.com> wrote:
Hello, thanks John for your reply here. A few more comments inline:
On Mon, Aug 23, 2021 at 6:16 PM John Fulton <johfulto@redhat.com> wrote:
On Mon, Aug 23, 2021 at 10:52 AM wodel youchi <wodel.youchi@gmail.com> wrote:
Hi,
I redid the undercloud deployment for the Train version for now. And
I verified the download URL for the images.
My overcloud deployment has moved forward but I still get errors.
This is what I got this time :
"TASK [ceph-grafana : wait for grafana to start]
********************************",
"Monday 23 August 2021 14:55:21 +0100 (0:00:00.961)
0:12:59.319 ********* ",
"fatal: [overcloud-controller-0]: FAILED! => {\"changed\":
false, \"elapsed\": 300, \"msg\": \"Timeout when waiting for 10.200.7.151:3100\"}",
"fatal: [overcloud-controller-1]: FAILED! => {\"changed\": false, \"elapsed\": 300, \"msg\": \"Timeout when waiting for 10.200.7.155:3100\"}",
"fatal: [overcloud-controller-2]: FAILED! => {\"changed\": false, \"elapsed\": 300, \"msg\": \"Timeout when waiting for 10.200.7.165:3100\"}",
I'm not certain of the ceph-ansible version you're using but it should be a version 4 with train. ceph-ansible should already be installed on your undercloud judging by this error and in the latest version 4 this task is where it failed:
https://github.com/ceph/ceph-ansible/blob/v4.0.64/roles/ceph-grafana/tasks/c...
You can check the status of this service on your three controllers and then debug it directly.
As John pointed out, ceph-ansible is able to configure, render and start the associated systemd unit for all the Ceph monitoring stack components (node-exporter, prometheus, alertmanager and grafana). You can ssh to your controllers, check the associated systemd unit, and look at the journal to see why it failed to start (I saw there's a timeout waiting for the container to start). A potential plan, in this case, could be:
1. check the systemd unit (I guess you can start with grafana, which is the failed service)
2. look at the journal logs (feel free to attach the relevant part of the output here)
3. double check the network where the service is bound (can you attach /var/lib/mistral/<stack>/ceph-ansible/group_vars/all.yaml?)
   * The grafana process should run on the storage network, but I see a "Timeout when waiting for 10.200.7.165:3100": is that network the right one?
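The plan above can be sketched in Python. This is only a sketch under assumptions: the unit name `grafana-server` is taken from the ceph-ansible task name in the log ("start the grafana-server service") and should be verified on the controllers, port 3100 comes from the timeout message, and both function names are hypothetical. The first helper only builds the `systemctl` and `journalctl` command lines to run on a controller; the second repeats the TCP check that timed out.

```python
import socket

# Assumed unit name, taken from the ceph-ansible task name in the log.
GRAFANA_UNIT = "grafana-server"
# Port from the "Timeout when waiting for ...:3100" message.
GRAFANA_PORT = 3100


def debug_commands(unit=GRAFANA_UNIT, journal_lines=100):
    """Return the commands for steps 1 and 2, to be run on a controller."""
    return [
        ["systemctl", "status", unit],
        ["journalctl", "-u", unit, "--no-pager", "-n", str(journal_lines)],
    ]


def port_is_open(host, port=GRAFANA_PORT, timeout=3.0):
    """Step 3: return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


if __name__ == "__main__":
    for command in debug_commands():
        print(" ".join(command))
```

For example, `port_is_open("10.200.7.165")` run from another node on the storage network shows whether grafana is listening on the address ceph-ansible was waiting for.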
John
"RUNNING HANDLER [ceph-prometheus : service handler]
****************************",
"Monday 23 August 2021 15:00:22 +0100 (0:05:00.767)
0:18:00.087 ********* ",
"PLAY RECAP
*********************************************************************",
"overcloud-computehci-0 : ok=224 changed=23
unreachable=0 failed=0 skipped=415 rescued=0 ignored=0 ",
"overcloud-computehci-1 : ok=199 changed=18
unreachable=0 failed=0 skipped=392 rescued=0 ignored=0 ",
"overcloud-computehci-2 : ok=212 changed=23
unreachable=0 failed=0 skipped=390 rescued=0 ignored=0 ",
"overcloud-controller-0 : ok=370 changed=52
unreachable=0 failed=1 skipped=539 rescued=0 ignored=0 ",
"overcloud-controller-1 : ok=308 changed=43
unreachable=0 failed=1 skipped=495 rescued=0 ignored=0 ",
"overcloud-controller-2 : ok=317 changed=45
unreachable=0 failed=1 skipped=493 rescued=0 ignored=0 ",
"INSTALLER STATUS
***************************************************************",
"Install Ceph Monitor : Complete (0:00:52)",
"Install Ceph Manager : Complete (0:05:49)",
"Install Ceph OSD : Complete (0:02:28)",
"Install Ceph RGW : Complete (0:00:27)",
"Install Ceph Client : Complete (0:00:33)",
"Install Ceph Grafana : In Progress (0:05:54)",
"\tThis phase can be restarted by running: roles/ceph-grafana/tasks/main.yml",
"Install Ceph Node Exporter : Complete (0:00:28)",
"Monday 23 August 2021 15:00:22 +0100 (0:00:00.006) 0:18:00.094 ********* ",
"=============================================================================== ",
"ceph-grafana : wait for grafana to start ------------------------------ 300.77s",
"ceph-facts : get ceph current status ---------------------------------- 300.27s",
"ceph-container-common : pulling udtrain.ctlplane.umaitek.dz:8787/ceph-ci/daemon:v4.0.19-stable-4.0-nautilus-centos-7-x86_64 image -- 19.04s",
"ceph-mon : waiting for the monitor(s) to form the quorum... ------------ 12.83s",
"ceph-osd : use ceph-volume lvm batch to create bluestore osds ---------- 12.13s",
"ceph-osd : wait for all osd to be up ----------------------------------- 11.88s",
"ceph-osd : set pg_autoscale_mode value on pool(s) ---------------------- 11.00s",
"ceph-osd : create openstack pool(s) ------------------------------------ 10.80s",
"ceph-grafana : make sure grafana is down ------------------------------- 10.66s",
"ceph-osd : customize pool crush_rule ----------------------------------- 10.15s",
"ceph-osd : customize pool size ----------------------------------------- 10.15s",
"ceph-osd : customize pool min_size ------------------------------------- 10.14s",
"ceph-osd : assign application to pool(s) ------------------------------- 10.13s",
"ceph-osd : list existing pool(s) ---------------------------------------- 8.59s",
"ceph-mon : fetch ceph initial keys -------------------------------------- 7.01s",
"ceph-container-common : get ceph version -------------------------------- 6.75s",
"ceph-prometheus : start prometheus services ----------------------------- 6.67s",
"ceph-mgr : wait for all mgr to be up ------------------------------------ 6.66s",
"ceph-grafana : start the grafana-server service ------------------------- 6.33s",
"ceph-mgr : create ceph mgr keyring(s) on a mon node --------------------- 6.26s"
],
"failed_when_result": true
}
2021-08-23 15:00:24.427687 | 525400e8-92c8-47b1-e162-00000000597d | TIMING | tripleo-ceph-run-ansible : print ceph-ansible output in case of failure | undercloud | 0:37:30.226345 | 0.25s
PLAY RECAP
overcloud-computehci-0 : ok=213 changed=117 unreachable=0 failed=0 skipped=120 rescued=0 ignored=0
overcloud-computehci-1 : ok=207 changed=117 unreachable=0 failed=0 skipped=120 rescued=0 ignored=0
overcloud-computehci-2 : ok=207 changed=117 unreachable=0 failed=0 skipped=120 rescued=0 ignored=0
overcloud-controller-0 : ok=237 changed=145 unreachable=0 failed=0 skipped=128 rescued=0 ignored=0
overcloud-controller-1 : ok=232 changed=145 unreachable=0 failed=0 skipped=128 rescued=0 ignored=0
overcloud-controller-2 : ok=232 changed=145 unreachable=0 failed=0 skipped=128 rescued=0 ignored=0
undercloud : ok=100 changed=18 unreachable=0 failed=1 skipped=37 rescued=0 ignored=0
2021-08-23 15:00:24.559997 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2021-08-23 15:00:24.560328 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Total Tasks: 1366 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2021-08-23 15:00:24.560419 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Elapsed Time: 0:37:30.359090 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2021-08-23 15:00:24.560490 | UUID | Info | Host | Task Name | Run Time
2021-08-23 15:00:24.560589 | 525400e8-92c8-47b1-e162-00000000597b | SUMMARY | undercloud | tripleo-ceph-run-ansible : run ceph-ansible | 1082.71s
2021-08-23 15:00:24.560675 | 525400e8-92c8-47b1-e162-000000004d9a | SUMMARY | overcloud-controller-1 | Wait for container-puppet tasks (generate config) to finish | 356.02s
2021-08-23 15:00:24.560763 | 525400e8-92c8-47b1-e162-000000004d6a | SUMMARY | overcloud-controller-0 | Wait for container-puppet tasks (generate config) to finish | 355.74s
2021-08-23 15:00:24.560839 | 525400e8-92c8-47b1-e162-000000004dd0 | SUMMARY | overcloud-controller-2 | Wait for container-puppet tasks (generate config) to finish | 355.68s
2021-08-23 15:00:24.560912 | 525400e8-92c8-47b1-e162-000000003bb1 | SUMMARY | undercloud | Run tripleo-container-image-prepare logged to: /var/log/tripleo-container-image-prepare.log | 143.03s
2021-08-23 15:00:24.560986 | 525400e8-92c8-47b1-e162-000000004b13 | SUMMARY | overcloud-controller-0 | Wait for puppet host configuration to finish | 125.36s
2021-08-23 15:00:24.561057 | 525400e8-92c8-47b1-e162-000000004b88 | SUMMARY | overcloud-controller-2 | Wait for puppet host configuration to finish | 125.33s
2021-08-23 15:00:24.561128 | 525400e8-92c8-47b1-e162-000000004b4b | SUMMARY | overcloud-controller-1 | Wait for puppet host configuration to finish | 125.25s
2021-08-23 15:00:24.561300 | 525400e8-92c8-47b1-e162-000000001dc4 | SUMMARY | overcloud-controller-2 | Run puppet on the host to apply IPtables rules | 108.08s
2021-08-23 15:00:24.561374 | 525400e8-92c8-47b1-e162-000000001e4f | SUMMARY | overcloud-controller-0 | Run puppet on the host to apply IPtables rules | 107.34s
2021-08-23 15:00:24.561444 | 525400e8-92c8-47b1-e162-000000004c8d | SUMMARY | overcloud-computehci-2 | Wait for container-puppet tasks (generate config) to finish | 96.56s
2021-08-23 15:00:24.561514 | 525400e8-92c8-47b1-e162-000000004c33 | SUMMARY | overcloud-computehci-0 | Wait for container-puppet tasks (generate config) to finish | 96.38s
2021-08-23 15:00:24.561580 | 525400e8-92c8-47b1-e162-000000004c60 | SUMMARY | overcloud-computehci-1 | Wait for container-puppet tasks (generate config) to finish | 93.41s
2021-08-23 15:00:24.561645 | 525400e8-92c8-47b1-e162-00000000434d | SUMMARY | overcloud-computehci-0 | Pre-fetch all the containers | 92.70s
2021-08-23 15:00:24.561712 | 525400e8-92c8-47b1-e162-0000000043ed | SUMMARY | overcloud-computehci-2 | Pre-fetch all the containers | 91.90s
2021-08-23 15:00:24.561782 | 525400e8-92c8-47b1-e162-000000004385 | SUMMARY | overcloud-computehci-1 | Pre-fetch all the containers | 91.88s
2021-08-23 15:00:24.561876 | 525400e8-92c8-47b1-e162-00000000491c | SUMMARY | overcloud-computehci-1 | Wait for puppet host configuration to finish | 90.37s
2021-08-23 15:00:24.561947 | 525400e8-92c8-47b1-e162-000000004951 | SUMMARY | overcloud-computehci-2 | Wait for puppet host configuration to finish | 90.37s
2021-08-23 15:00:24.562016 | 525400e8-92c8-47b1-e162-0000000048e6 | SUMMARY | overcloud-computehci-0 | Wait for puppet host configuration to finish | 90.35s
2021-08-23 15:00:24.562080 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ End Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2021-08-23 15:00:24.562196 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ State Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2021-08-23 15:00:24.562311 | ~~~~~~~~~~~~~~~~~~ Number of nodes which did not deploy successfully: 1 ~~~~~~~~~~~~~~~~~
2021-08-23 15:00:24.562379 | The following node(s) had failures: undercloud
2021-08-23 15:00:24.562456 |
>> Host 10.0.2.40 not found in /home/stack/.ssh/known_hosts
>> Ansible failed, check log at /var/lib/mistral/overcloud/ansible.log.
>> Overcloud Endpoint: http://10.0.2.40:5000
>> Overcloud Horizon Dashboard URL: http://10.0.2.40:80/dashboard
>> Overcloud rc file: /home/stack/overcloudrc
>> Overcloud Deployed with error
>> Overcloud configuration failed.
>>
>
> Could someone help debug this? The ansible.log is huge and I can't see the origin of the problem. If someone can point me in the right direction, it will be appreciated.
> Thanks in advance.
>
> Regards.
>
> On Wed, 18 Aug 2021 at 18:02, Wesley Hayutin <whayutin@redhat.com> wrote:
>>
>> On Wed, Aug 18, 2021 at 10:10 AM Dmitry Tantsur <dtantsur@redhat.com> wrote:
>>>
>>> Hi,
>>>
>>> On Wed, Aug 18, 2021 at 4:39 PM wodel youchi <wodel.youchi@gmail.com> wrote:
>>>>
>>>> Hi,
>>>> I am trying to deploy openstack with tripleO using VMs and nested-KVM for the compute node. This is for test and learning purposes.
>>>>
>>>> I am using the Train version and following some tutorials.
>>>> I prepared my different template files and started the deployment, but I got these errors:
>>>>
>>>> Failed to provision instance fc40457e-4b3c-4402-ae9d-c528f2c2ad30: Asynchronous exception: Node failed to deploy. Exception: Agent API for node 6d3724fc-6f13-4588-bbe5-56bc4f9a4f87 returned HTTP status code 404 with error: Not found: Extension with id iscsi not found. for node
>>>>
>>>
>>> You somehow ended up using master (Xena release) deploy ramdisk with Train TripleO. You need to make sure to download Train images. I hope TripleO people can point you at the right place.
>>>
>>> Dmitry
>>
>> http://images.rdoproject.org/centos8/
>> http://images.rdoproject.org/centos8/train/rdo_trunk/current-tripleo/
>>
>>>>
>>>> and
>>>>
>>>> Got HTTP 409: {"errors": [{"status": 409, "title": "Conflict", "detail": "There was a conflict when trying to complete your request.\n\n Unable to allocate inventory: Unable to create allocation for 'CUSTOM_BAREMETAL' on resource provider '6d3724fc-6f13-4588-bbe5-56bc4f9a4f87'. The requested amount would exceed the capacity. ",
>>>>
>>>> Could you help me understand what those errors mean? I couldn't find anything similar on the net.
>>>>
>>>> Thanks in advance.
>>>>
>>>> Regards.
>>>
>>> --
>>> Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
>>> Commercial register: Amtsgericht Muenchen, HRB 153243,
>>> Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill
-- Francesco Pantano GPG KEY: F41BD75C