[Triple0] [Wallaby] External Ceph Integration getting failed

Lokendra Rathour

4 Aug 2022

4:37 a.m.

Hi Team,

I was trying to integrate external Ceph with TripleO Wallaby, and at the end of the deployment, in step 4, I am getting the error below:

2022-08-03 18:37:21,158 p=507732 u=stack n=ansible | 2022-08-03 18:37:21.157962 | 525400fe-86b8-65d9-d100-0000000080d2 | TASK | Create containers from /var/lib/tripleo-config/container-startup-config/step_4
2022-08-03 18:37:21,239 p=507732 u=stack n=ansible | 2022-08-03 18:37:21.238718 | 69e98219-f748-4af7-a6d0-f8f73680ce9b | INCLUDED | /usr/share/ansible/roles/tripleo_container_manage/tasks/create.yml | overcloud-controller-2
2022-08-03 18:37:21,273 p=507732 u=stack n=ansible | 2022-08-03 18:37:21.272340 | 525400fe-86b8-65d9-d100-0000000086d9 | TASK | Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_4
2022-08-03 18:37:24,532 p=507732 u=stack n=ansible | 2022-08-03 18:37:24.530812 | | WARNING | ERROR: Can't run container nova_libvirt_init_secret stderr:
2022-08-03 18:37:24,533 p=507732 u=stack n=ansible | 2022-08-03 18:37:24.532811 | 525400fe-86b8-65d9-d100-0000000082ec | FATAL | Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_4 | overcloud-novacompute-0 | error={"changed": false, "msg": "Failed containers: nova_libvirt_init_secret"}
2022-08-03 18:37:44,282 p=507732 u

external-ceph.conf:

parameter_defaults:
  # Enable use of RBD backend in nova-compute
  NovaEnableRbdBackend: True
  # Enable use of RBD backend in cinder-volume
  CinderEnableRbdBackend: True
  # Backend to use for cinder-backup
  CinderBackupBackend: ceph
  # Backend to use for glance
  GlanceBackend: rbd
  # Name of the Ceph pool hosting Nova ephemeral images
  NovaRbdPoolName: vms
  # Name of the Ceph pool hosting Cinder volumes
  CinderRbdPoolName: volumes
  # Name of the Ceph pool hosting Cinder backups
  CinderBackupRbdPoolName: backups
  # Name of the Ceph pool hosting Glance images
  GlanceRbdPoolName: images
  # Name of the user to authenticate with the external Ceph cluster
  CephClientUserName: admin
  # The cluster FSID
  CephClusterFSID: 'ca3080-aaaa-4d1a-b1fd-4aaaa9a9ea4c'
  # The CephX user auth key
  CephClientKey: 'AQDgRjhiuLMnAxAAnYwgERERFy0lzH6ufSl70A=='
  # The list of Ceph monitors
  CephExternalMonHost: 'abcd:abcd:abcd::11,abcd:abcd:abcd::12,abcd:abcd:abcd::13'

I have checked and validated the Ceph client details and they seem to be correct. Digging further into the container log, I could see the following:

[root@overcloud-novacompute-0 containers]# tail -f nova_libvirt_init_secret.log
tail: cannot open 'nova_libvirt_init_secret.log' for reading: No such file or directory
tail: no files remaining
[root@overcloud-novacompute-0 containers]# tail -f stdouts/nova_libvirt_init_secret.log
2022-08-04T11:48:47.689898197+05:30 stdout F ------------------------------------------------
2022-08-04T11:48:47.690002011+05:30 stdout F Initializing virsh secrets for: ceph:admin
2022-08-04T11:48:47.690590594+05:30 stdout F Error: /etc/ceph/ceph.conf was not found
2022-08-04T11:48:47.690625088+05:30 stdout F Path to nova_libvirt_init_secret was ceph:admin
2022-08-04T16:20:29.643785538+05:30 stdout F ------------------------------------------------
2022-08-04T16:20:29.643785538+05:30 stdout F Initializing virsh secrets for: ceph:admin
2022-08-04T16:20:29.644785532+05:30 stdout F Error: /etc/ceph/ceph.conf was not found
2022-08-04T16:20:29.644785532+05:30 stdout F Path to nova_libvirt_init_secret was ceph:admin
^C
[root@overcloud-novacompute-0 containers]# tail -f stdouts/nova_compute_init_log.log

-- ~ Lokendra skype: lokendrarathour
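When a container log like the one above gets long, it can help to pull out only the distinct error messages. The following is a minimal Python sketch of that idea. It assumes each record has the shape `<timestamp> <stream> <tag> <message>` seen in the quoted output; the function name `error_messages` and the `sample` list are illustrative only and are not part of TripleO.

```python
import re

# Assumed record layout, taken from the log lines quoted in the post:
#   <timestamp> <stdout|stderr> <F|P> <message>
LOG_LINE = re.compile(
    r"^(?P<ts>\S+) (?P<stream>stdout|stderr) (?P<tag>[FP]) (?P<msg>.*)$"
)


def error_messages(lines):
    """Return the distinct 'Error:' messages in the log, in first-seen order."""
    seen = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip anything that is not a log record (prompts, ^C, ...)
        msg = match.group("msg")
        if msg.startswith("Error:") and msg not in seen:
            seen.append(msg)
    return seen


# Three records copied from the log in the post.
sample = [
    "2022-08-04T11:48:47.690002011+05:30 stdout F Initializing virsh secrets for: ceph:admin",
    "2022-08-04T11:48:47.690590594+05:30 stdout F Error: /etc/ceph/ceph.conf was not found",
    "2022-08-04T16:20:29.644785532+05:30 stdout F Error: /etc/ceph/ceph.conf was not found",
]
print(error_messages(sample))
```

For the sample records this reduces the repeated output to the single message about the missing /etc/ceph/ceph.conf, which is the symptom the rest of the thread deals with.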


Francesco Pantano

4 Aug 2022

6:51 a.m.

Hi,

ceph is supposed to be configured by this tripleo-ansible role [1], which is triggered by tripleo-heat-templates (tht) in external_deploy_steps [2]. In theory, adding [3] should just work, assuming you customize the ceph cluster mon IP addresses, the fsid and a few other related variables. From your previous email, I suspect that in your external-ceph.yaml you missed the TripleO resource OS::TripleO::Services::CephExternal: ../deployment/cephadm/ceph-client.yaml (see [3]).

Thanks,
Francesco

[1] https://github.com/openstack/tripleo-ansible/tree/master/tripleo_ansible/rol...
[2] https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/c...
[3] https://github.com/openstack/tripleo-heat-templates/blob/master/environments...

-- Francesco Pantano GPG KEY: F41BD75C
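The missing entry Francesco points to can be checked for before a deployment is started. Below is a small Python sketch, not taken from TripleO itself, that scans the text of an environment file for the `OS::TripleO::Services::CephExternal` mapping. It is a plain line scan rather than a YAML parser, and the helper name `ceph_external_mapping` and the two sample strings are hypothetical. Placing the mapping under `resource_registry` in the `fixed` sample is an assumption about the usual layout of such environment files; the thread itself only names the resource.

```python
CEPH_EXTERNAL_KEY = "OS::TripleO::Services::CephExternal"


def ceph_external_mapping(env_text):
    """Return the template mapped to CephExternal, or None if the entry is absent.

    Only the simple one-line 'key: value' form is handled, commented-out
    lines are ignored, and the enclosing section is not verified.
    """
    for raw in env_text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            continue
        if line.startswith(CEPH_EXTERNAL_KEY + ":"):
            # The key itself contains '::', so slice by length instead of splitting.
            return line[len(CEPH_EXTERNAL_KEY) + 1:].strip()
    return None


# Abbreviated version of the file from the first post: parameters only.
original = """\
parameter_defaults:
  NovaEnableRbdBackend: True
  CephClientUserName: admin
"""

# Same file with the mapping added (section name is an assumption, see above).
fixed = """\
resource_registry:
  OS::TripleO::Services::CephExternal: ../deployment/cephadm/ceph-client.yaml
""" + original

print(ceph_external_mapping(original))
print(ceph_external_mapping(fixed))
```

Note that the relative path in the mapping only resolves if the environment file sits where the path expects it; if the file is kept elsewhere, the path probably needs adjusting.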

Lokendra Rathour

10 Aug 2022

8:45 p.m.

Hi,

Thanks for the input. We could see what we had missed, and we have now added the required entry: "TripleO resource OS::TripleO::Services::CephExternal: ../deployment/cephadm/ceph-client.yaml"

With this setting, deploying the setup on Wallaby now fails with the following error:

PLAY [External deployment step 1] **********************************************
2022-08-11 08:33:20.183104 | 525400d4-7124-4a42-664c-0000000000a8 | TASK | External deployment step 1
2022-08-11 08:33:20.211821 | 525400d4-7124-4a42-664c-0000000000a8 | OK | External deployment step 1 | undercloud -> localhost | result={ "changed": false, "msg": "Use --start-at-task 'External deployment step 1' to resume from this task" }
[WARNING]: ('undercloud -> localhost', '525400d4-7124-4a42-664c-0000000000a8') missing from stats
2022-08-11 08:33:20.254775 | 525400d4-7124-4a42-664c-0000000000a9 | TIMING | include_tasks | undercloud | 0:05:01.151528 | 0.03s
2022-08-11 08:33:20.304290 | 730cacb3-fa5a-4dca-9730-9a8ce54fb5a3 | INCLUDED | /home/stack/overcloud-deploy/overcloud/config-download/overcloud/external_deploy_steps_tasks_step1.yaml | undercloud
2022-08-11 08:33:20.322079 | 525400d4-7124-4a42-664c-0000000048d0 | TASK | Set some tripleo-ansible facts
2022-08-11 08:33:20.350423 | 525400d4-7124-4a42-664c-0000000048d0 | OK | Set some tripleo-ansible facts | undercloud
2022-08-11 08:33:20.351792 | 525400d4-7124-4a42-664c-0000000048d0 | TIMING | Set some tripleo-ansible facts | undercloud | 0:05:01.248558 | 0.03s
2022-08-11 08:33:20.366717 | 525400d4-7124-4a42-664c-0000000048d7 | TASK | Container image prepare
2022-08-11 08:34:32.486108 | 525400d4-7124-4a42-664c-0000000048d7 | FATAL | Container image prepare | undercloud | error={"changed": false, "error": "None: Max retries exceeded with url: /v2/ (Caused by None)", "msg": "Error running container image prepare: None: Max retries exceeded with url: /v2/ (Caused by None)", "params": {}, "success": false}
2022-08-11 08:34:32.488845 | 525400d4-7124-4a42-664c-0000000048d7 | TIMING | tripleo_container_image_prepare : Container image prepare | undercloud | 0:06:13.385607 | 72.12s

The deployment fails at step 1. As this is Wallaby, and based on the documentation (Use an external Ceph cluster with the Overcloud, TripleO 3.0.0 documentation (openstack.org) <https://docs.openstack.org/project-deploy-guide/tripleo-docs/wallaby/features/ceph_external.html>), we should only need to pass this external-ceph.yaml for the external Ceph integration, but that is not what happens.

A few things to note:

1. Container prepare:

(undercloud) [stack@undercloud ~]$ cat containers-prepare-parameter.yaml
# Generated with the following on 2022-06-28T18:56:38.642315
#
#   openstack tripleo container image prepare default --local-push-destination --output-env-file /home/stack/containers-prepare-parameter.yaml
#

parameter_defaults:
  ContainerImagePrepare:
  - push_destination: true
    set:
      name_prefix: openstack-
      name_suffix: ''
      namespace: myserver.com:5000/tripleowallaby
      neutron_driver: ovn
      rhel_containers: false
      tag: current-tripleo
    tag_from_label: rdo_version
(undercloud) [stack@undercloud ~]$

2. This is an SSL-based deployment.

Any idea what causes the error? The issue is seen only once external Ceph integration is enabled.

Best Regards,
Lokendra

-- ~ Lokendra skype: lokendrarathour
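The "Max retries exceeded with url: /v2/" message suggests that the image prepare step could not reach a container registry at its `/v2/` endpoint, although the thread does not establish which registry was involved or why. As a rough diagnostic, the Python sketch below issues a single GET against a registry's `/v2/` base URL using only the standard library. The helper names `registry_v2_url` and `probe_registry` are made up for this illustration, and the choice of `https` as the default scheme is an assumption.

```python
import urllib.error
import urllib.request


def registry_v2_url(registry, scheme="https"):
    """Build the /v2/ base URL for a registry given as 'host[:port]'."""
    return f"{scheme}://{registry.rstrip('/')}/v2/"


def probe_registry(registry, scheme="https", timeout=10):
    """Return (reachable, detail) for one GET on <registry>/v2/.

    Any HTTP response, including 401 Unauthorized, means the registry
    answered; DNS, connection, timeout and TLS failures count as unreachable.
    """
    url = registry_v2_url(registry, scheme)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server replied, just not with a 2xx status.
        return True, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        return False, str(exc)
```

Running `probe_registry("myserver.com:5000")` on the undercloud, and again for whichever external registry the Ceph image would be pulled from, should show whether the endpoint answers at all. In an SSL-based setup, a certificate verification failure would also show up as unreachable, with the reason in the returned detail string.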

John Fulton

11 Aug 2022

9:29 a.m.

The ceph container should no longer be needed for external ceph configuration (since the move from ceph-ansible to cephadm), but if removing the ceph env files makes the error go away, then try adding them back and following these steps to prepare the ceph container on your undercloud before deploying.

https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features...

Lokendra Rathour

19 Aug 2022

12:44 a.m.

Hi Fulton,

Thanks for the input, and apologies for the delay in responding. To my surprise, passing the container prepare file in its standard form worked for me. The new container-prepare is:

parameter_defaults:
  ContainerImagePrepare:
  - push_destination: true
    set:
      ceph_alertmanager_image: alertmanager
      ceph_alertmanager_namespace: quay.ceph.io/prometheus
      ceph_alertmanager_tag: v0.16.2
      ceph_grafana_image: grafana
      ceph_grafana_namespace: quay.ceph.io/app-sre
      ceph_grafana_tag: 6.7.4
      ceph_image: daemon
      ceph_namespace: quay.io/ceph
      ceph_node_exporter_image: node-exporter
      ceph_node_exporter_namespace: quay.ceph.io/prometheus
      ceph_node_exporter_tag: v0.17.0
      ceph_prometheus_image: prometheus
      ceph_prometheus_namespace: quay.ceph.io/prometheus
      ceph_prometheus_tag: v2.7.2
      ceph_tag: v6.0.7-stable-6.0-pacific-centos-stream8
      name_prefix: openstack-
      name_suffix: ''
      namespace: myserver.com:5000/tripleowallaby
      neutron_driver: ovn
      rhel_containers: false
      tag: current-tripleo
    tag_from_label: rdo_version

However, when we look for these containers, we do not see any of them; we checked both the undercloud and the overcloud. Still, the deployment completes when we pass this config. Thanks once again.

We also need to understand some use cases for consuming storage from this external Ceph cluster, for example as a mount for a VM, either as direct or as shared storage. Is there any document that explains how to consume external Ceph in an existing TripleO overcloud? Please share it if you know of one.

Thanks once again for the support, it was really helpful.

...

The ceph container should no longer be needed for external ceph configuration (since the move from ceph-ansible to cephadm) but if removing the ceph env files makes the error go away, then try adding it back and then following these steps to prepare the ceph container on your undercloud before deploying.

https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features...

On Wed, Aug 10, 2022, 11:48 PM Lokendra Rathour <lokendrarathour@gmail.com> wrote:

...
Hi Thanks, for the inputs, we could see the miss, now we have added the required miss : "TripleO resource OS::TripleO::Services::CephExternal: ../deployment/cephadm/ceph-client.yaml"

Now with this setting if we deploy the setup in wallaby, we are getting error as:

PLAY [External deployment step 1] ********************************************** 2022-08-11 08:33:20.183104 | 525400d4-7124-4a42-664c-0000000000a8 | TASK | External deployment step 1 2022-08-11 08:33:20.211821 | 525400d4-7124-4a42-664c-0000000000a8 | OK | External deployment step 1 | undercloud -> localhost | result={ "changed": false, "msg": "Use --start-at-task 'External deployment step 1' to resume from this task" } [WARNING]: ('undercloud -> localhost', '525400d4-7124-4a42-664c-0000000000a8') missing from stats 2022-08-11 08:33:20.254775 | 525400d4-7124-4a42-664c-0000000000a9 | TIMING | include_tasks | undercloud | 0:05:01.151528 | 0.03s 2022-08-11 08:33:20.304290 | 730cacb3-fa5a-4dca-9730-9a8ce54fb5a3 | INCLUDED | /home/stack/overcloud-deploy/overcloud/config-download/overcloud/external_deploy_steps_tasks_step1.yaml | undercloud 2022-08-11 08:33:20.322079 | 525400d4-7124-4a42-664c-0000000048d0 | TASK | Set some tripleo-ansible facts 2022-08-11 08:33:20.350423 | 525400d4-7124-4a42-664c-0000000048d0 | OK | Set some tripleo-ansible facts | undercloud 2022-08-11 08:33:20.351792 | 525400d4-7124-4a42-664c-0000000048d0 | TIMING | Set some tripleo-ansible facts | undercloud | 0:05:01.248558 | 0.03s 2022-08-11 08:33:20.366717 | 525400d4-7124-4a42-664c-0000000048d7 | TASK | Container image prepare 2022-08-11 08:34:32.486108 | 525400d4-7124-4a42-664c-0000000048d7 | FATAL | Container image prepare | *undercloud | error={"changed": false, "error": "None: Max retries exceeded with url: /v2/ (Caused by None)", "msg": "Error running container image prepare: None: Max retries exceeded with url: /v2/ (Caused by None)", "params": {}, "success": false}* 2022-08-11 08:34:32.488845 | 525400d4-7124-4a42-664c-0000000048d7 | TIMING | tripleo_container_image_prepare : Container image prepare | undercloud | 0:06:13.385607 | 72.12s

This gets failed at step 1, As this is wallaby and based on the document (Use an external Ceph cluster with the Overcloud — TripleO 3.0.0 documentation (openstack.org) <https://docs.openstack.org/project-deploy-guide/tripleo-docs/wallaby/features/ceph_external.html>) we should only pass this external-ceph.yaml for the external ceph intergration. But it is not happening.

Few things to note: 1. Container Prepare:

(undercloud) [stack@undercloud ~]$ cat containers-prepare-parameter.yaml # Generated with the following on 2022-06-28T18:56:38.642315 # # openstack tripleo container image prepare default --local-push-destination --output-env-file /home/stack/containers-prepare-parameter.yaml #

parameter_defaults: ContainerImagePrepare: - push_destination: true set: name_prefix: openstack- name_suffix: '' namespace: myserver.com:5000/tripleowallaby neutron_driver: ovn rhel_containers: false tag: current-tripleo tag_from_label: rdo_version (undercloud) [stack@undercloud ~]$

2. this is SSL based deployment.

Any idea for the error, the issue is seen only once we have the external ceph integration enabled.

Best Regards, Lokendra

On Thu, Aug 4, 2022 at 7:22 PM Francesco Pantano <fpantano@redhat.com> wrote:

...
Hi, ceph is supposed to be configured by this tripleo-ansible role [1], which is triggered by tht on external_deploy_steps [2]. In theory adding [3] should just work, assuming you customize the ceph cluster mon ip addresses, fsid and a few other related variables. From your previous email I suspect in your external-ceph.yaml you missed the TripleO resource OS::TripleO::Services::CephExternal: ../deployment/cephadm/ceph-client.yaml (see [3]).

Thanks, Francesco

[1] https://github.com/openstack/tripleo-ansible/tree/master/tripleo_ansible/rol... [2] https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/c... [3] https://github.com/openstack/tripleo-heat-templates/blob/master/environments...

On Thu, Aug 4, 2022 at 2:01 PM Lokendra Rathour < lokendrarathour@gmail.com> wrote:

...
Hi Team, I was trying to integrate External Ceph with Triple0 Wallaby, and at the end of deployment in step4 getting the below error:

2022-08-03 18:37:21,158 p=507732 u=stack n=ansible | 2022-08-03 18:37:21.157962 | 525400fe-86b8-65d9-d100-0000000080d2 | TASK | Create containers from /var/lib/tripleo-config/container-startup-config/step_4 2022-08-03 18:37:21,239 p=507732 u=stack n=ansible | 2022-08-03 18:37:21.238718 | 69e98219-f748-4af7-a6d0-f8f73680ce9b | INCLUDED | /usr/share/ansible/roles/tripleo_container_manage/tasks/create.yml | overcloud-controller-2 2022-08-03 18:37:21,273 p=507732 u=stack n=ansible | 2022-08-03 18:37:21.272340 | 525400fe-86b8-65d9-d100-0000000086d9 | TASK | Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_4 2022-08-03 18:37:24,532 p=507732 u=stack n=ansible | 2022-08-03 18:37:24.530812 | | WARNING | ERROR: Can't run container nova_libvirt_init_secret stderr: 2022-08-03 18:37:24,533 p=507732 u=stack n=ansible | 2022-08-03 18:37:24.532811 | 525400fe-86b8-65d9-d100-0000000082ec | FATAL | Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_4 | overcloud-novacompute-0 | error={"changed": false, "msg": "Failed containers: nova_libvirt_init_secret"} 2022-08-03 18:37:44,282 p=507732 u

*external-ceph.conf:*

parameter_defaults: # Enable use of RBD backend in nova-compute NovaEnableRbdBackend: True # Enable use of RBD backend in cinder-volume CinderEnableRbdBackend: True # Backend to use for cinder-backup CinderBackupBackend: ceph # Backend to use for glance GlanceBackend: rbd # Name of the Ceph pool hosting Nova ephemeral images NovaRbdPoolName: vms # Name of the Ceph pool hosting Cinder volumes CinderRbdPoolName: volumes # Name of the Ceph pool hosting Cinder backups CinderBackupRbdPoolName: backups # Name of the Ceph pool hosting Glance images GlanceRbdPoolName: images # Name of the user to authenticate with the external Ceph cluster CephClientUserName: admin # The cluster FSID CephClusterFSID: 'ca3080-aaaa-4d1a-b1fd-4aaaa9a9ea4c' # The CephX user auth key CephClientKey: 'AQDgRjhiuLMnAxAAnYwgERERFy0lzH6ufSl70A==' # The list of Ceph monitors CephExternalMonHost: 'abcd:abcd:abcd::11,abcd:abcd:abcd::12,abcd:abcd:abcd::13' ~

I have checked and validated the Ceph client details and they seem to be correct. Digging further into the container log, I could see this:

[root@overcloud-novacompute-0 containers]# tail -f nova_libvirt_init_secret.log
tail: cannot open 'nova_libvirt_init_secret.log' for reading: No such file or directory
tail: no files remaining
[root@overcloud-novacompute-0 containers]# tail -f stdouts/nova_libvirt_init_secret.log
2022-08-04T11:48:47.689898197+05:30 stdout F ------------------------------------------------
2022-08-04T11:48:47.690002011+05:30 stdout F Initializing virsh secrets for: ceph:admin
2022-08-04T11:48:47.690590594+05:30 stdout F Error: /etc/ceph/ceph.conf was not found
2022-08-04T11:48:47.690625088+05:30 stdout F Path to nova_libvirt_init_secret was ceph:admin
2022-08-04T16:20:29.643785538+05:30 stdout F ------------------------------------------------
2022-08-04T16:20:29.643785538+05:30 stdout F Initializing virsh secrets for: ceph:admin
2022-08-04T16:20:29.644785532+05:30 stdout F Error: /etc/ceph/ceph.conf was not found
2022-08-04T16:20:29.644785532+05:30 stdout F Path to nova_libvirt_init_secret was ceph:admin
^C
[root@overcloud-novacompute-0 containers]# tail -f stdouts/nova_compute_init_log.log
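Since the container log above complains that /etc/ceph/ceph.conf was not found, a quick check on the compute node for the Ceph client files may help narrow things down. The following is only a minimal sketch: the helper name `missing_ceph_client_files` is hypothetical, and the keyring file name assumes Ceph's default `<cluster>.client.<user>.keyring` convention (the log's "ceph:admin" suggests cluster "ceph" and user "admin"). Whether nova_libvirt_init_secret looks for exactly these files is an assumption.

```python
from pathlib import Path


def missing_ceph_client_files(conf_dir, user="admin", cluster="ceph"):
    """Return the expected Ceph client files that are absent from conf_dir.

    Assumes the default Ceph naming: <cluster>.conf and
    <cluster>.client.<user>.keyring (an assumption, not taken from the thread).
    """
    expected = [f"{cluster}.conf", f"{cluster}.client.{user}.keyring"]
    base = Path(conf_dir)
    return [name for name in expected if not (base / name).is_file()]


if __name__ == "__main__":
    # /etc/ceph is the directory named in the container log.
    for name in missing_ceph_client_files("/etc/ceph", user="admin"):
        print(f"missing: /etc/ceph/{name}")
```

Run on the node where the container fails, an empty output would mean both files are present in that directory, which would point the investigation elsewhere (for example, at how the directory is mounted into the container).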

-- ~ Lokendra skype: lokendrarathour

-- Francesco Pantano GPG KEY: F41BD75C


John Fulton

10:02 a.m.

On Fri, Aug 19, 2022 at 3:45 AM Lokendra Rathour <lokendrarathour@gmail.com> wrote:

...

Hi Fulton, thanks for the inputs, and apologies for the delay in responding. To my surprise, passing the container prepare file in its standard form worked for me. The new container-prepare file is:

parameter_defaults:
  ContainerImagePrepare:
  - push_destination: true
    set:
      ceph_alertmanager_image: alertmanager
      ceph_alertmanager_namespace: quay.ceph.io/prometheus
      ceph_alertmanager_tag: v0.16.2
      ceph_grafana_image: grafana
      ceph_grafana_namespace: quay.ceph.io/app-sre
      ceph_grafana_tag: 6.7.4
      ceph_image: daemon
      ceph_namespace: quay.io/ceph
      ceph_node_exporter_image: node-exporter
      ceph_node_exporter_namespace: quay.ceph.io/prometheus
      ceph_node_exporter_tag: v0.17.0
      ceph_prometheus_image: prometheus
      ceph_prometheus_namespace: quay.ceph.io/prometheus
      ceph_prometheus_tag: v2.7.2
      ceph_tag: v6.0.7-stable-6.0-pacific-centos-stream8
      name_prefix: openstack-
      name_suffix: ''
      namespace: myserver.com:5000/tripleowallaby
      neutron_driver: ovn
      rhel_containers: false
      tag: current-tripleo
    tag_from_label: rdo_version

However, when we look for these containers, we do not see any of them. We have looked on both the undercloud and the overcloud.

The undercloud can download containers from the sources above and then act as a container registry. It's described here: https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/deployme...

...

Also, the deployment is done when we are passing this config. Thanks once again.

Also, we need to understand some use cases for consuming storage from this external Ceph cluster, for example as direct or shared storage mounted by VMs. Is there any document that explains how to consume external Ceph in an existing TripleO overcloud?

Ceph can provide OpenStack Block, Object and File storage and TripleO supports a variety of integration options for them.

TripleO can deploy Ceph as part of the OpenStack overcloud:

https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features...

TripleO can also deploy an OpenStack overcloud which uses an existing external ceph cluster:

https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features...

At the end of both of these documents you can expect Glance, Nova, and Cinder to use Ceph block storage (RBD).

You can also have OpenStack use Ceph object storage (RGW). When RGW is used, a command like "openstack container create foo" will create an object storage container (not to be confused with podman/docker) on CephRGW as if your overcloud were running OpenStack Swift. If you have TripleO deploy Ceph as part of the OpenStack overcloud, RGW will be deployed and configured for OpenStack object storage by default (in Wallaby+).

The OpenStack Manila service can use CephFS as one of its backends. TripleO can deploy that too as described here:

https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features...

John

...

Do share in case you know any, please.

Thanks once again for the support, it was really helpful

On Thu, Aug 11, 2022 at 9:59 PM John Fulton <johfulto@redhat.com> wrote:

...
The ceph container should no longer be needed for external ceph configuration (since the move from ceph-ansible to cephadm) but if removing the ceph env files makes the error go away, then try adding it back and then following these steps to prepare the ceph container on your undercloud before deploying.

https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features...

On Wed, Aug 10, 2022, 11:48 PM Lokendra Rathour < lokendrarathour@gmail.com> wrote:

...
Hi, thanks for the inputs. We could see what we had missed and have now added it: "TripleO resource OS::TripleO::Services::CephExternal: ../deployment/cephadm/ceph-client.yaml"

Now, with this setting, deploying the setup on Wallaby gives the following error:

PLAY [External deployment step 1] **********************************************
2022-08-11 08:33:20.183104 | 525400d4-7124-4a42-664c-0000000000a8 | TASK | External deployment step 1
2022-08-11 08:33:20.211821 | 525400d4-7124-4a42-664c-0000000000a8 | OK | External deployment step 1 | undercloud -> localhost | result={ "changed": false, "msg": "Use --start-at-task 'External deployment step 1' to resume from this task" }
[WARNING]: ('undercloud -> localhost', '525400d4-7124-4a42-664c-0000000000a8') missing from stats
2022-08-11 08:33:20.254775 | 525400d4-7124-4a42-664c-0000000000a9 | TIMING | include_tasks | undercloud | 0:05:01.151528 | 0.03s
2022-08-11 08:33:20.304290 | 730cacb3-fa5a-4dca-9730-9a8ce54fb5a3 | INCLUDED | /home/stack/overcloud-deploy/overcloud/config-download/overcloud/external_deploy_steps_tasks_step1.yaml | undercloud
2022-08-11 08:33:20.322079 | 525400d4-7124-4a42-664c-0000000048d0 | TASK | Set some tripleo-ansible facts
2022-08-11 08:33:20.350423 | 525400d4-7124-4a42-664c-0000000048d0 | OK | Set some tripleo-ansible facts | undercloud
2022-08-11 08:33:20.351792 | 525400d4-7124-4a42-664c-0000000048d0 | TIMING | Set some tripleo-ansible facts | undercloud | 0:05:01.248558 | 0.03s
2022-08-11 08:33:20.366717 | 525400d4-7124-4a42-664c-0000000048d7 | TASK | Container image prepare
2022-08-11 08:34:32.486108 | 525400d4-7124-4a42-664c-0000000048d7 | FATAL | Container image prepare | *undercloud | error={"changed": false, "error": "None: Max retries exceeded with url: /v2/ (Caused by None)", "msg": "Error running container image prepare: None: Max retries exceeded with url: /v2/ (Caused by None)", "params": {}, "success": false}*
2022-08-11 08:34:32.488845 | 525400d4-7124-4a42-664c-0000000048d7 | TIMING | tripleo_container_image_prepare : Container image prepare | undercloud | 0:06:13.385607 | 72.12s

This fails at step 1. As this is Wallaby, and based on the document (Use an external Ceph cluster with the Overcloud - TripleO 3.0.0 documentation (openstack.org) <https://docs.openstack.org/project-deploy-guide/tripleo-docs/wallaby/features/ceph_external.html>), we should only need to pass this external-ceph.yaml for the external Ceph integration. But that is not what happens.

Few things to note: 1. Container Prepare:

(undercloud) [stack@undercloud ~]$ cat containers-prepare-parameter.yaml
# Generated with the following on 2022-06-28T18:56:38.642315
#
#   openstack tripleo container image prepare default --local-push-destination --output-env-file /home/stack/containers-prepare-parameter.yaml
#

parameter_defaults:
  ContainerImagePrepare:
  - push_destination: true
    set:
      name_prefix: openstack-
      name_suffix: ''
      namespace: myserver.com:5000/tripleowallaby
      neutron_driver: ovn
      rhel_containers: false
      tag: current-tripleo
    tag_from_label: rdo_version
(undercloud) [stack@undercloud ~]$

2. This is an SSL-based deployment.
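The "Max retries exceeded with url: /v2/" message points at a container registry API endpoint that could not be reached, although the log does not say which registry. As a hedged sketch, the snippet below derives a /v2/ URL from the `namespace` value in containers-prepare-parameter.yaml and probes it. The helper names `registry_v2_url` and `probe_registry` are hypothetical, and the choice of `https` as the default scheme is an assumption based on the note that this is an SSL-based deployment. The failing request may equally target the undercloud's own registry.

```python
import urllib.error
import urllib.request


def registry_v2_url(namespace, scheme="https"):
    """Build the registry API base URL from a ContainerImagePrepare namespace."""
    # The registry host (and optional port) is everything before the first "/".
    host = namespace.split("/", 1)[0]
    return f"{scheme}://{host}/v2/"


def probe_registry(url, timeout=5.0):
    """Return a short status string for a GET on the registry's /v2/ endpoint."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"reachable (HTTP {resp.status})"
    except urllib.error.HTTPError as exc:
        # An HTTP error (e.g. 401) still proves that the registry answered.
        return f"reachable (HTTP {exc.code})"
    except OSError as exc:
        # URLError is an OSError; it wraps DNS, connection and TLS failures.
        return f"unreachable: {exc}"


if __name__ == "__main__":
    url = registry_v2_url("myserver.com:5000/tripleowallaby")
    print(url, "->", probe_registry(url))
```

Running this from the undercloud separates plain connectivity or certificate problems from anything specific to the deployment tooling.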

Any idea what causes this error? The issue is seen only when external Ceph integration is enabled.

Best Regards, Lokendra

On Thu, Aug 4, 2022 at 7:22 PM Francesco Pantano <fpantano@redhat.com> wrote:

...
Hi, ceph is supposed to be configured by this tripleo-ansible role [1], which is triggered by tht on external_deploy_steps [2]. In theory adding [3] should just work, assuming you customize the ceph cluster mon ip addresses, fsid and a few other related variables. From your previous email I suspect in your external-ceph.yaml you missed the TripleO resource OS::TripleO::Services::CephExternal: ../deployment/cephadm/ceph-client.yaml (see [3]).
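To make the suggestion above concrete, here is a rough pre-flight check for an external Ceph environment file once it has been loaded into a dictionary (for example with a YAML parser, which is not shown). The function name `check_external_ceph_env` is hypothetical. The three required parameter names are taken from the comments in the shipped external-ceph.yaml quoted elsewhere in this thread, and the FSID check assumes the FSID is written as a canonical 8-4-4-4-12 UUID. A value that fails the check in a posted config may simply have been redacted.

```python
import re

SERVICE = "OS::TripleO::Services::CephExternal"
REQUIRED = ("CephClusterFSID", "CephClientKey", "CephExternalMonHost")
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)


def check_external_ceph_env(env):
    """Return a list of human-readable problems found in a parsed env file."""
    problems = []
    registry = env.get("resource_registry") or {}
    if SERVICE not in registry:
        problems.append(f"resource_registry lacks {SERVICE}")
    params = env.get("parameter_defaults") or {}
    for key in REQUIRED:
        if not params.get(key):
            problems.append(f"parameter_defaults lacks {key}")
    fsid = params.get("CephClusterFSID") or ""
    if fsid and not UUID_RE.match(fsid):
        problems.append(f"CephClusterFSID is not a canonical UUID: {fsid}")
    return problems


if __name__ == "__main__":
    # Placeholder values; substitute the contents of your own env file.
    env = {
        "parameter_defaults": {
            "CephClusterFSID": "00000000-0000-0000-0000-000000000000",
            "CephClientKey": "<key>",
            "CephExternalMonHost": "<mon hosts>",
        }
    }
    for problem in check_external_ceph_env(env):
        print(problem)
```

With the placeholder input above, the only problem reported is the missing resource_registry entry, which is the omission suspected in this reply.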

Thanks, Francesco

[1] https://github.com/openstack/tripleo-ansible/tree/master/tripleo_ansible/rol...
[2] https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/c...
[3] https://github.com/openstack/tripleo-heat-templates/blob/master/environments...


Lokendra Rathour

25 Aug 25 Aug

6:04 a.m.

Hi John, thanks for the inputs. Now I see something strange. Deployment with external Ceph is unstable. It deployed once, but VM creation then failed for some reason. While debugging we found an NTP-related issue, which we fixed, and then tried redeploying. Now it failed again at step 4:

2022-08-25 17:34:29.036371 | 5254004d-021e-d4db-067d-000000007b1a | TASK | Create identity internal endpoint
2022-08-25 17:34:31.176105 | 5254004d-021e-d4db-067d-000000007b1a | FATAL | Create identity internal endpoint | undercloud | error={"changed": false, "extra_data": {"data": null, "details": "The request you have made requires authentication.", "response": "{\"error\":{\"code\":401,\"message\":\"The request you have made requires authentication.\",\"title\":\"Unauthorized\"} }\n"}, "msg": "Failed to list services: Client Error for url: https://overcloud-public.mydomain.com:13000/v3/services <https://overcloud-public.myhsc.com:13000/v3/services>, The request you have made requires authentication."}

To revalidate the case, I tried a fresh setup and saw that the deployment again failed at step 4. When we remove external Ceph from the deployment command, the deployment completes 100%. I see authorization errors, which I also used to get earlier, though back then we were able to resolve them via DNS. What could be the reason for this when we are using external Ceph?

Any inputs would be helpful.

deploy command:

[stack@undercloud ~]$ cat deploy_step2.sh
openstack overcloud deploy --templates \
  -r /home/stack/templates/roles_data.yaml \
  -n /home/stack/templates/custom_network_data.yaml \
  -e /home/stack/templates/overcloud-baremetal-deployed.yaml \
  -e /home/stack/templates/networks-deployed-environment.yaml \
  -e /home/stack/templates/vip-deployed-environment.yaml \
  -e /home/stack/templates/environment.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-conductor.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-inspector.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-overcloud.yaml \
  -e /home/stack/templates/ironic-config.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/services/ptp.yaml \
  -e /home/stack/templates/enable-tls.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-dns.yaml \
  -e /home/stack/templates/cloudname.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/inject-trust-anchor-hiera.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/external-ceph.yaml \
  -e /home/stack/templates/my-additional-ceph-settings.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/podman.yaml \
  -e /home/stack/containers-prepare-parameter.yaml

[stack@undercloud ~]$ cat /usr/share/openstack-tripleo-heat-templates/environments/external-ceph.yaml
resource_registry:
  OS::TripleO::Services::CephExternal: ../deployment/cephadm/ceph-client.yaml

parameter_defaults:
  # NOTE: These example parameters are required when using CephExternal
  CephClusterFSID: 'ca3080e3-aa3a-4d1a-b1fd-483459a9ea4c'
  CephClientKey: 'AQB2hMZi2u13NxAAVjmKopw+kNm6OnZOG7NktQ=='
  CephExternalMonHost: 'abcd:abcd:abcd::11,abcd:abcd:abcd::12,abcd:abcd:abcd::13'

  # the following parameters enable Ceph backends for Cinder, Glance, Gnocchi and Nova
  NovaEnableRbdBackend: true
  CinderEnableRbdBackend: true
  CinderBackupBackend: ceph
  GlanceBackend: rbd
  # Uncomment below if enabling legacy telemetry
  # GnocchiBackend: rbd
  # If the Ceph pools which host VMs, Volumes and Images do not match these
  # names OR the client keyring to use is not named 'openstack', edit the
  # following as needed.
  NovaRbdPoolName: vms
  CinderRbdPoolName: volumes
  CinderBackupRbdPoolName: backups
  GlanceRbdPoolName: images
  # Uncomment below if enabling legacy telemetry
  # GnocchiRbdPoolName: metrics
  CephClientUserName: openstack

  # finally we disable the Cinder LVM backend
  CinderEnableIscsiBackend: false

On Fri, Aug 19, 2022 at 10:32 PM John Fulton <johfulto@redhat.com> wrote:

...


John Fulton

6:32 a.m.

On Thu, Aug 25, 2022 at 9:04 AM Lokendra Rathour <lokendrarathour@gmail.com> wrote:

...

Hi John, Thanks for the inputs. Now I see something strange. Deployment with external ceph is unstable,

I assume you're using Wallaby. There's a downstream job testing external ceph daily. The external ceph feature of TripleO in Wallaby is stable. I think you have something else going on that conflates with your use of external ceph.

...

it deployed once, but VM creation then failed for some reason. While debugging we found an NTP-related issue, which we fixed, and then tried redeploying. Now it failed again at step 4:

External Ceph is already configured before step 4. You can inspect your system after this failure to see that this role:

https://github.com/openstack/tripleo-ansible/tree/master/tripleo_ansible/rol...

has done its job of distributing cephx keys and a ceph.conf file into this path:

https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/depl...

That should be all that doing a "-e /usr/share/openstack-tripleo-heat-templates/environments/external-ceph.yaml" results in. Maybe there's something else in my-additional-ceph-settings.yaml that shouldn't be there that's causing your overcloud to try to create an endpoint? I think that unlikely but I'm trying to come up with an explanation for the correlation you're reporting.

2022-08-25 17:34:29.036371 | 5254004d-021e-d4db-067d-000000007b1a | TASK | Create identity internal endpoint
2022-08-25 17:34:31.176105 | 5254004d-021e-d4db-067d-000000007b1a | FATAL | Create identity internal endpoint | undercloud | error={"changed": false, "extra_data": {"data": null, "details": "The request you have made requires authentication.", "response": "{\"error\":{\"code\":401,\"message\":\"The request you have made requires authentication.\",\"title\":\"Unauthorized\"} }\n"}, "msg": "Failed to list services: Client Error for url: https://overcloud-public.mydomain.com:13000/v3/services <https://overcloud-public.myhsc.com:13000/v3/services>, The request you have made requires authentication."}

The above is happening from this role:

https://github.com/openstack/tripleo-ansible/blob/e9cc12d4ce0b1c9e96b58f6102...

To revalidate the case, I tried a fresh setup and saw that the deployment again failed at step 4. When we remove external Ceph from the deployment command, the deployment completes 100%. I see authorization errors, which I also used to get earlier, though back then we were able to resolve them via DNS. What could be the reason for this when we are using external Ceph?

I really don't think this is related to external ceph configuration. Correlation does not always mean causality.

Any inputs would be helpful.

deploy command:

[stack@undercloud ~]$ cat deploy_step2.sh
openstack overcloud deploy --templates \
  -r /home/stack/templates/roles_data.yaml \
  -n /home/stack/templates/custom_network_data.yaml \
  -e /home/stack/templates/overcloud-baremetal-deployed.yaml \
  -e /home/stack/templates/networks-deployed-environment.yaml \
  -e /home/stack/templates/vip-deployed-environment.yaml \
  -e /home/stack/templates/environment.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-conductor.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-inspector.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-overcloud.yaml \
  -e /home/stack/templates/ironic-config.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/services/ptp.yaml \
  -e /home/stack/templates/enable-tls.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-dns.yaml \
  -e /home/stack/templates/cloudname.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/inject-trust-anchor-hiera.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/external-ceph.yaml \
  -e /home/stack/templates/my-additional-ceph-settings.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/podman.yaml \
  -e /home/stack/containers-prepare-parameter.yaml

Also, please re-arrange the order of your templates.

Any path with /usr/share should be first. Then any path in /home should then follow.

openstack overcloud deploy --templates \
  -r /home/stack/templates/roles_data.yaml \
  -n /home/stack/templates/custom_network_data.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-conductor.yaml \
  <other /usr/share paths>
  -e /home/stack/templates/overcloud-baremetal-deployed.yaml \
  <other templates in /home/stack>

You override the values in the templates tripleo ships (in /usr/share) with your own values for your env (in /home/stack).

If you include /usr/share after, then you could override your own custom values. Just a best practice to rule out other issues.

At this point I think you should find out what command is being executed when this task runs:

https://github.com/openstack/tripleo-ansible/blob/e9cc12d4ce0b1c9e96b58f6102...

Find out the values. Then run that command manually on the CLI of the system where it is failing. At that point you'll have decoupled what the deployment tool is doing vs the failing command on your system and share that on the list.
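As a rough illustration of the ordering advice above (shipped templates under /usr/share first, your own templates after them), here is a minimal bash sketch. The function name order_env_files is hypothetical and not part of TripleO. It only prints the -e arguments in the suggested order and leaves the -r/-n options and the actual deploy call to you. It also assumes nothing in your environment depends on the original interleaving of the two groups.

```bash
# Sketch only: group environment files so that templates shipped under
# /usr/share come first and site-specific files follow. The relative order
# inside each group is preserved. "order_env_files" is a made-up helper name.
order_env_files() {
  local f
  local shipped=() custom=()
  for f in "$@"; do
    case "$f" in
      /usr/share/*) shipped+=("$f") ;;  # templates TripleO ships
      *)            custom+=("$f") ;;   # your own overrides
    esac
  done
  # Print one "-e <file>" argument per line, shipped group first.
  for f in "${shipped[@]}" "${custom[@]}"; do
    printf -- '-e %s\n' "$f"
  done
}
```

Calling it with the files from deploy_step2.sh prints the /usr/share entries first, which you can then paste back into the deploy command and review by hand before running it.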

...

[stack@undercloud ~]$ cat /usr/share/openstack-tripleo-heat-templates/environments/external-ceph.yaml
resource_registry:
  OS::TripleO::Services::CephExternal: ../deployment/cephadm/ceph-client.yaml

parameter_defaults:
  # NOTE: These example parameters are required when using CephExternal
  CephClusterFSID: 'ca3080e3-aa3a-4d1a-b1fd-483459a9ea4c'
  CephClientKey: 'AQB2hMZi2u13NxAAVjmKopw+kNm6OnZOG7NktQ=='
  CephExternalMonHost: 'abcd:abcd:abcd::11,abcd:abcd:abcd::12,abcd:abcd:abcd::13'

  # the following parameters enable Ceph backends for Cinder, Glance, Gnocchi and Nova
  NovaEnableRbdBackend: true
  CinderEnableRbdBackend: true
  CinderBackupBackend: ceph
  GlanceBackend: rbd
  # Uncomment below if enabling legacy telemetry
  # GnocchiBackend: rbd
  # If the Ceph pools which host VMs, Volumes and Images do not match these
  # names OR the client keyring to use is not named 'openstack', edit the
  # following as needed.
  NovaRbdPoolName: vms
  CinderRbdPoolName: volumes
  CinderBackupRbdPoolName: backups
  GlanceRbdPoolName: images
  # Uncomment below if enabling legacy telemetry
  # GnocchiRbdPoolName: metrics
  CephClientUserName: openstack

  # finally we disable the Cinder LVM backend
  CinderEnableIscsiBackend: false

I recommend that you not modify /usr/share/openstack-tripleo-heat-templates/environments/external-ceph.yaml. Instead, include it unmodified and then, after it, include a file of your own that overrides the values.

John
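One way to follow that recommendation is to keep the cluster-specific values in a separate file under /home/stack. The bash sketch below writes such a file. The helper name write_ceph_overrides and the output file name are hypothetical, and the angle-bracket values are placeholders to replace with the details of your own cluster. The parameter names are the ones already shown in external-ceph.yaml.

```bash
# Sketch only: write a site-specific override file instead of editing the
# shipped external-ceph.yaml. "write_ceph_overrides" is a made-up helper;
# the <...> values are placeholders, not working settings.
write_ceph_overrides() {
  local out="$1"
  cat > "$out" <<'EOF'
parameter_defaults:
  CephClusterFSID: '<fsid of the external cluster>'
  CephClientKey: '<cephx key of the client user>'
  CephExternalMonHost: '<comma-separated monitor addresses>'
  CephClientUserName: openstack
EOF
}
```

For example, after running write_ceph_overrides /home/stack/templates/ceph-overrides.yaml, the deploy command would pass the shipped external-ceph.yaml first and this file after it, so that your values take precedence.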

...

On Fri, Aug 19, 2022 at 10:32 PM John Fulton <johfulto@redhat.com> wrote:

...
On Fri, Aug 19, 2022 at 3:45 AM Lokendra Rathour < lokendrarathour@gmail.com> wrote:

...
Hi Fulton, thanks for the inputs, and apologies for the delay in response. To my surprise, passing the container prepare file in its standard form worked for me. The new container-prepare is:

parameter_defaults:
  ContainerImagePrepare:
  - push_destination: true
    set:
      ceph_alertmanager_image: alertmanager
      ceph_alertmanager_namespace: quay.ceph.io/prometheus
      ceph_alertmanager_tag: v0.16.2
      ceph_grafana_image: grafana
      ceph_grafana_namespace: quay.ceph.io/app-sre
      ceph_grafana_tag: 6.7.4
      ceph_image: daemon
      ceph_namespace: quay.io/ceph
      ceph_node_exporter_image: node-exporter
      ceph_node_exporter_namespace: quay.ceph.io/prometheus
      ceph_node_exporter_tag: v0.17.0
      ceph_prometheus_image: prometheus
      ceph_prometheus_namespace: quay.ceph.io/prometheus
      ceph_prometheus_tag: v2.7.2
      ceph_tag: v6.0.7-stable-6.0-pacific-centos-stream8
      name_prefix: openstack-
      name_suffix: ''
      namespace: myserver.com:5000/tripleowallaby
      neutron_driver: ovn
      rhel_containers: false
      tag: current-tripleo
    tag_from_label: rdo_version

But when we look for these containers, we do not see any of them available. We have checked both the undercloud and the overcloud.

The undercloud can download containers from the sources above and then act as a container registry. It's described here:

https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/deployme...

...
Also, the deployment is done when we are passing this config. Thanks once again.

Also, we need to understand some use cases for consuming storage from this external Ceph, for example as direct or shared storage mounted in a VM. Is there any document that explains how to consume external Ceph in an existing TripleO overcloud?

Ceph can provide OpenStack Block, Object and File storage and TripleO supports a variety of integration options for them.

TripleO can deploy Ceph as part of the OpenStack overcloud:

https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features...

TripleO can also deploy an OpenStack overcloud which uses an existing external ceph cluster:

https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features...

At the end of both of these documents you can expect Glance, Nova, and Cinder to use Ceph block storage (RBD).

You can also have OpenStack use Ceph object storage (RGW). When RGW is used, a command like "openstack container create foo" will create an object storage container (not to be confused with podman/docker) on CephRGW as if your overcloud were running OpenStack Swift. If you have TripleO deploy Ceph as part of the OpenStack overcloud, RGW will be deployed and configured for OpenStack object storage by default (in Wallaby+).

The OpenStack Manila service can use CephFS as one of its backends. TripleO can deploy that too as described here:

https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features...

John

...
Do share in case you know any, please.

Thanks once again for the support, it was really helpful

On Thu, Aug 11, 2022 at 9:59 PM John Fulton <johfulto@redhat.com> wrote:

...
The ceph container should no longer be needed for external ceph configuration (since the move from ceph-ansible to cephadm) but if removing the ceph env files makes the error go away, then try adding it back and then following these steps to prepare the ceph container on your undercloud before deploying.

https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features...

On Wed, Aug 10, 2022, 11:48 PM Lokendra Rathour < lokendrarathour@gmail.com> wrote:

...
Hi, thanks for the inputs. We could see what we had missed and have now added the missing piece: "TripleO resource OS::TripleO::Services::CephExternal: ../deployment/cephadm/ceph-client.yaml"

Now, if we deploy the setup on Wallaby with this setting, we get the following error:

PLAY [External deployment step 1] **********************************************
2022-08-11 08:33:20.183104 | 525400d4-7124-4a42-664c-0000000000a8 | TASK | External deployment step 1
2022-08-11 08:33:20.211821 | 525400d4-7124-4a42-664c-0000000000a8 | OK | External deployment step 1 | undercloud -> localhost | result={ "changed": false, "msg": "Use --start-at-task 'External deployment step 1' to resume from this task" }
[WARNING]: ('undercloud -> localhost', '525400d4-7124-4a42-664c-0000000000a8') missing from stats
2022-08-11 08:33:20.254775 | 525400d4-7124-4a42-664c-0000000000a9 | TIMING | include_tasks | undercloud | 0:05:01.151528 | 0.03s
2022-08-11 08:33:20.304290 | 730cacb3-fa5a-4dca-9730-9a8ce54fb5a3 | INCLUDED | /home/stack/overcloud-deploy/overcloud/config-download/overcloud/external_deploy_steps_tasks_step1.yaml | undercloud
2022-08-11 08:33:20.322079 | 525400d4-7124-4a42-664c-0000000048d0 | TASK | Set some tripleo-ansible facts
2022-08-11 08:33:20.350423 | 525400d4-7124-4a42-664c-0000000048d0 | OK | Set some tripleo-ansible facts | undercloud
2022-08-11 08:33:20.351792 | 525400d4-7124-4a42-664c-0000000048d0 | TIMING | Set some tripleo-ansible facts | undercloud | 0:05:01.248558 | 0.03s
2022-08-11 08:33:20.366717 | 525400d4-7124-4a42-664c-0000000048d7 | TASK | Container image prepare
2022-08-11 08:34:32.486108 | 525400d4-7124-4a42-664c-0000000048d7 | FATAL | Container image prepare | *undercloud | error={"changed": false, "error": "None: Max retries exceeded with url: /v2/ (Caused by None)", "msg": "Error running container image prepare: None: Max retries exceeded with url: /v2/ (Caused by None)", "params": {}, "success": false}*
2022-08-11 08:34:32.488845 | 525400d4-7124-4a42-664c-0000000048d7 | TIMING | tripleo_container_image_prepare : Container image prepare | undercloud | 0:06:13.385607 | 72.12s
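When a deployment log is this long, it can help to pull out only the failing tasks before reading further. The following bash sketch does that for lines in the " | "-separated layout shown above. The function name fatal_tasks is hypothetical, and the sketch assumes the status, task name and host appear as consecutive fields, which is what the log excerpts in this thread look like.

```bash
# Sketch only: read a deployment log on stdin and print "host: task name"
# for every line whose status field is FATAL. "fatal_tasks" is a made-up
# helper name, not a TripleO tool.
fatal_tasks() {
  awk -F ' [|] ' '{
    # Find the FATAL status field; the next two fields are task and host.
    for (i = 1; i < NF - 1; i++) {
      if ($i == "FATAL") { print $(i + 2) ": " $(i + 1); break }
    }
  }'
}
```

It would be used as fatal_tasks < your-deploy-log, where the log file is whichever one you captured from the deploy run.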

This fails at step 1. As this is Wallaby, and based on the document (Use an external Ceph cluster with the Overcloud - TripleO 3.0.0 documentation (openstack.org) <https://docs.openstack.org/project-deploy-guide/tripleo-docs/wallaby/features/ceph_external.html>), we should only need to pass this external-ceph.yaml for the external Ceph integration. But that is not working.

A few things to note:

1. Container Prepare:

(undercloud) [stack@undercloud ~]$ cat containers-prepare-parameter.yaml
# Generated with the following on 2022-06-28T18:56:38.642315
#
#   openstack tripleo container image prepare default --local-push-destination --output-env-file /home/stack/containers-prepare-parameter.yaml
#

parameter_defaults:
  ContainerImagePrepare:
  - push_destination: true
    set:
      name_prefix: openstack-
      name_suffix: ''
      namespace: myserver.com:5000/tripleowallaby
      neutron_driver: ovn
      rhel_containers: false
      tag: current-tripleo
    tag_from_label: rdo_version
(undercloud) [stack@undercloud ~]$

2. This is an SSL-based deployment.

Any idea about the error? The issue is seen only when the external Ceph integration is enabled.

Best Regards, Lokendra

On Thu, Aug 4, 2022 at 7:22 PM Francesco Pantano <fpantano@redhat.com> wrote:

...
Hi, ceph is supposed to be configured by this tripleo-ansible role [1], which is triggered by tht on external_deploy_steps [2]. In theory adding [3] should just work, assuming you customize the ceph cluster mon ip addresses, fsid and a few other related variables. From your previous email I suspect in your external-ceph.yaml you missed the TripleO resource OS::TripleO::Services::CephExternal: ../deployment/cephadm/ceph-client.yaml (see [3]).

Thanks, Francesco

[1] https://github.com/openstack/tripleo-ansible/tree/master/tripleo_ansible/rol...
[2] https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/c...
[3] https://github.com/openstack/tripleo-heat-templates/blob/master/environments...

On Thu, Aug 4, 2022 at 2:01 PM Lokendra Rathour < lokendrarathour@gmail.com> wrote:

> Hi Team,
> I was trying to integrate External Ceph with Triple0 Wallaby, and at
> the end of deployment in step4 getting the below error:
>
> 2022-08-03 18:37:21,158 p=507732 u=stack n=ansible | 2022-08-03
> 18:37:21.157962 | 525400fe-86b8-65d9-d100-0000000080d2 | TASK |
> Create containers from
> /var/lib/tripleo-config/container-startup-config/step_4
> 2022-08-03 18:37:21,239 p=507732 u=stack n=ansible | 2022-08-03
> 18:37:21.238718 | 69e98219-f748-4af7-a6d0-f8f73680ce9b | INCLUDED |
> /usr/share/ansible/roles/tripleo_container_manage/tasks/create.yml |
> overcloud-controller-2
> 2022-08-03 18:37:21,273 p=507732 u=stack n=ansible | 2022-08-03
> 18:37:21.272340 | 525400fe-86b8-65d9-d100-0000000086d9 | TASK |
> Create containers managed by Podman for
> /var/lib/tripleo-config/container-startup-config/step_4
> 2022-08-03 18:37:24,532 p=507732 u=stack n=ansible | 2022-08-03
> 18:37:24.530812 | | WARNING |
> ERROR: Can't run container nova_libvirt_init_secret
> stderr:
> 2022-08-03 18:37:24,533 p=507732 u=stack n=ansible | 2022-08-03
> 18:37:24.532811 | 525400fe-86b8-65d9-d100-0000000082ec | FATAL |
> Create containers managed by Podman for
> /var/lib/tripleo-config/container-startup-config/step_4 |
> overcloud-novacompute-0 | error={"changed": false, "msg": "Failed
> containers: nova_libvirt_init_secret"}
> 2022-08-03 18:37:44,282 p=507732 u
>
>
> *external-ceph.conf:*
>
> parameter_defaults:
> # Enable use of RBD backend in nova-compute
> NovaEnableRbdBackend: True
> # Enable use of RBD backend in cinder-volume
> CinderEnableRbdBackend: True
> # Backend to use for cinder-backup
> CinderBackupBackend: ceph
> # Backend to use for glance
> GlanceBackend: rbd
> # Name of the Ceph pool hosting Nova ephemeral images
> NovaRbdPoolName: vms
> # Name of the Ceph pool hosting Cinder volumes
> CinderRbdPoolName: volumes
> # Name of the Ceph pool hosting Cinder backups
> CinderBackupRbdPoolName: backups
> # Name of the Ceph pool hosting Glance images
> GlanceRbdPoolName: images
> # Name of the user to authenticate with the external Ceph cluster
> CephClientUserName: admin
> # The cluster FSID
> CephClusterFSID: 'ca3080-aaaa-4d1a-b1fd-4aaaa9a9ea4c'
> # The CephX user auth key
> CephClientKey: 'AQDgRjhiuLMnAxAAnYwgERERFy0lzH6ufSl70A=='
> # The list of Ceph monitors
> CephExternalMonHost:
> 'abcd:abcd:abcd::11,abcd:abcd:abcd::12,abcd:abcd:abcd::13'
> ~
>
>
> Have tried checking and validating the ceph client details and they
> seem to be correct, further digging the container log I could see something
> like this :
>
> [root@overcloud-novacompute-0 containers]# tail -f
> nova_libvirt_init_secret.log
> tail: cannot open 'nova_libvirt_init_secret.log' for reading: No
> such file or directory
> tail: no files remaining
> [root@overcloud-novacompute-0 containers]# tail -f
> stdouts/nova_libvirt_init_secret.log
> 2022-08-04T11:48:47.689898197+05:30 stdout F
> ------------------------------------------------
> 2022-08-04T11:48:47.690002011+05:30 stdout F Initializing virsh
> secrets for: ceph:admin
> 2022-08-04T11:48:47.690590594+05:30 stdout F Error:
> /etc/ceph/ceph.conf was not found
> 2022-08-04T11:48:47.690625088+05:30 stdout F Path to
> nova_libvirt_init_secret was ceph:admin
> 2022-08-04T16:20:29.643785538+05:30 stdout F
> ------------------------------------------------
> 2022-08-04T16:20:29.643785538+05:30 stdout F Initializing virsh
> secrets for: ceph:admin
> 2022-08-04T16:20:29.644785532+05:30 stdout F Error:
> /etc/ceph/ceph.conf was not found
> 2022-08-04T16:20:29.644785532+05:30 stdout F Path to
> nova_libvirt_init_secret was ceph:admin
> ^C
> [root@overcloud-novacompute-0 containers]# tail -f
> stdouts/nova_compute_init_log.log
>
> --
> ~ Lokendra
> skype: lokendrarathour

-- Francesco Pantano GPG KEY: F41BD75C

-- ~ Lokendra skype: lokendrarathour

Lokendra Rathour

7:49 a.m.

Hi John,
thanks for the quick response.

The file "/home/stack/templates/my-additional-ceph-settings.yaml" adds a backward-compatible setting for my external Ceph (Octopus):

[stack@undercloud ~]$ cat templates/my-additional-ceph-settings.yaml
parameter_defaults:
  ExtraConfig:
    ceph::profile::params::rbd_default_features: '1'
[stack@undercloud ~]$

I also agree that Ceph has nothing to do with this error, but somehow we used to get this error earlier when we were using SSL + DNS.

I tried rerunning the command that it should run to create the required endpoints, and it runs. Then I re-executed the steps in debug mode; results pending....

The full traceback is:
  File "/tmp/ansible_openstack.cloud.endpoint_payload_qhlqb_qw/ansible_openstack.cloud.endpoint_payload.zip/ansible_collections/openstack/cloud/plugins/module_utils/openstack.py", line 407, in __call__
    results = self.run()
  File "/tmp/ansible_openstack.cloud.endpoint_payload_qhlqb_qw/ansible_openstack.cloud.endpoint_payload.zip/ansible_collections/openstack/cloud/plugins/modules/endpoint.py", line 150, in run
  File "/usr/lib/python3.6/site-packages/openstack/cloud/_identity.py", line 537, in get_service
    return _utils._get_entity(self, 'service', name_or_id, filters)
  File "/usr/lib/python3.6/site-packages/openstack/cloud/_utils.py", line 197, in _get_entity
    entities = search(name_or_id, filters, **kwargs)
  File "/usr/lib/python3.6/site-packages/openstack/cloud/_identity.py", line 517, in search_services
    services = self.list_services()
  File "/usr/lib/python3.6/site-packages/openstack/cloud/_identity.py", line 501, in list_services
    error_message="Failed to list services")
  File "/usr/lib/python3.6/site-packages/keystoneauth1/adapter.py", line 395, in get
    return self.request(url, 'GET', **kwargs)
  File "/usr/lib/python3.6/site-packages/openstack/proxy.py", line 668, in request
    return _json_response(response, error_message=error_message)
  File "/usr/lib/python3.6/site-packages/openstack/proxy.py", line 646, in _json_response
    exceptions.raise_from_response(response, error_message=error_message)
  File "/usr/lib/python3.6/site-packages/openstack/exceptions.py", line 238, in raise_from_response
    http_status=http_status, request_id=request_id
2022-08-25 19:38:03.201899 | 5254004d-021e-7578-cd65-000000007ad6 | FATAL | Create identity internal endpoint | undercloud | error={
    "changed": false,
    "extra_data": {
        "data": null,
        "details": "The request you have made requires authentication.",
        "response": "{\"error\":{\"code\":401,\"message\":\"The request you have made requires authentication.\",\"title\":\"Unauthorized\"}}\n"
    },
    "invocation": {
        "module_args": {
            "api_timeout": null,
            "auth": null,
            "auth_type": null,
            "availability_zone": null,
            "ca_cert": null,
            "client_cert": null,
            "client_key": null,
            "enabled": true,
            "endpoint_interface": "internal",
            "interface": "public",
            "region": "regionOne",
            "region_name": null,
            "service": "keystone",
            "state": "present",
            "timeout": 180,
            "url": "http://[fd00:fd00:fd00:2000::368]:5000",
            "validate_certs": null,
            "wait": true
        }
    },
    "msg": "Failed to list services: Client Error for url: https://overcloud-public.myhsc.com:13000/v3/services, The request you have made requires authentication."
}

Checking further in the endpoint list I see:

DeprecationWarning
+----------------------------------+-----------+--------------+--------------+---------+-----------+------------------------------------------+
| ID                               | Region    | Service Name | Service Type | Enabled | Interface | URL                                      |
+----------------------------------+-----------+--------------+--------------+---------+-----------+------------------------------------------+
| 11c9e71cf2e3482c9af47afcdab54472 | regionOne | keystone     | identity     | True    | internal  | http://[fd00:fd00:fd00:2000::368]:5000   |
| 34fdd910a4e641e8897a7360b504bdba | regionOne | keystone     | identity     | True    | public    | https://overcloud-public.myhsc.com:13000 |
| 770eeebb8e544a93a0215158c6c9b811 | regionOne | keystone     | identity     | True    | admin     | http://30.30.30.142:35357                |
+----------------------------------+-----------+-----------

As you can see, the internal endpoint does get created, and yet the error reported above is for that same internal endpoint. I am trying to debug this further; please let me know if you see something specific here.

Thanks once again.
Lokendra

On Thu, Aug 25, 2022 at 7:02 PM John Fulton <johfulto@redhat.com> wrote:

...

On Thu, Aug 25, 2022 at 9:04 AM Lokendra Rathour < lokendrarathour@gmail.com> wrote:

...
Hi John, Thanks for the inputs. Now I see something strange. Deployment with external ceph is unstable,

I assume you're using Wallaby.

There's a downstream job testing external ceph daily. The external ceph feature of TripleO in Wallaby is stable. I think you have something else going on that conflates with your use of external ceph.

...
It got deployed once, but VMs could not be created. While debugging we found an NTP-related issue, which we fixed, and then tried redeploying. Now it has again failed at step 4:

External Ceph is already configured before step 4. You can inspect your system after this failure to see that this role:

https://github.com/openstack/tripleo-ansible/tree/master/tripleo_ansible/rol...

has done its job of distributing cephx keys and a ceph.conf file into this path:

https://github.com/openstack/tripleo-heat-templates/blob/stable/wallaby/depl...

That should be all that doing a "-e /usr/share/openstack-tripleo-heat-templates/environments/external-ceph.yaml" results in. Maybe there's something else in my-additional-ceph-settings.yaml that shouldn't be there that's causing your overcloud to try to create an endpoint? I think that unlikely but I'm trying to come up with an explanation for the correlation you're reporting.
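The inspection described above (checking that a ceph.conf and the cephx keys were distributed) could be scripted roughly as follows. This is a sketch under assumptions: the function name check_ceph_client_files is hypothetical, the directory to check is whatever path the link above points to on your node (it is deliberately passed as an argument rather than hard-coded), and the check for a *.keyring file relies on the usual Ceph keyring file naming.

```bash
# Sketch only: verify that a directory holds a non-empty ceph.conf and at
# least one keyring file. "check_ceph_client_files" is a made-up helper;
# pass the directory used on your node as the first argument.
check_ceph_client_files() {
  local dir="$1" status=0 keyring
  if [ ! -s "$dir/ceph.conf" ]; then
    echo "missing or empty: $dir/ceph.conf"
    status=1
  fi
  keyring="$(find "$dir" -maxdepth 1 -type f -name '*.keyring' 2>/dev/null | head -n 1)"
  if [ -z "$keyring" ]; then
    echo "no *.keyring file in $dir"
    status=1
  fi
  return "$status"
}
```

A non-zero return code on the failing node would suggest that the client role did not finish its work there, which would be worth reporting together with the log output.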

2022-08-25 17:34:29.036371 | 5254004d-021e-d4db-067d-000000007b1a | TASK | Create identity internal endpoint
2022-08-25 17:34:31.176105 | 5254004d-021e-d4db-067d-000000007b1a | FATAL | Create identity internal endpoint | undercloud | error={"changed": false, "extra_data": {"data": null, "details": "The request you have made requires authentication.", "response": "{\"error\":{\"code\":401,\"message\":\"The request you have made requires authentication.\",\"title\":\"Unauthorized\"} }\n"}, "msg": "Failed to list services: Client Error for url: https://overcloud-public.mydomain.com:13000/v3/services <https://overcloud-public.myhsc.com:13000/v3/services>, The request you have made requires authentication."}

The above is happening from this role:

https://github.com/openstack/tripleo-ansible/blob/e9cc12d4ce0b1c9e96b58f6102...

To revalidate the case, I tried a fresh setup and saw that deployment again failed at step 4. When we remove external Ceph from the deployment command, the deployment completes 100%. I see authorization errors, which I used to get earlier as well, but back then we were able to resolve them through DNS. What could be the reason for this when we are using external Ceph?

I really don't think this is related to external ceph configuration. Correlation does not always mean causality.

any inputs would be helpful

...
deploy command:

[stack@undercloud ~]$ cat deploy_step2.sh
openstack overcloud deploy --templates \
  -r /home/stack/templates/roles_data.yaml \
  -n /home/stack/templates/custom_network_data.yaml \
  -e /home/stack/templates/overcloud-baremetal-deployed.yaml \
  -e /home/stack/templates/networks-deployed-environment.yaml \
  -e /home/stack/templates/vip-deployed-environment.yaml \
  -e /home/stack/templates/environment.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-conductor.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-inspector.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-overcloud.yaml \
  -e /home/stack/templates/ironic-config.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/services/ptp.yaml \
  -e /home/stack/templates/enable-tls.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-dns.yaml \
  -e /home/stack/templates/cloudname.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/inject-trust-anchor-hiera.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/external-ceph.yaml \
  -e /home/stack/templates/my-additional-ceph-settings.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/podman.yaml \
  -e /home/stack/containers-prepare-parameter.yaml

Also, please re-arrange the order of your templates.

Any path with /usr/share should be first. Then any path in /home should then follow.

openstack overcloud deploy --templates \
  -r /home/stack/templates/roles_data.yaml \
  -n /home/stack/templates/custom_network_data.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-conductor.yaml \
  <other /usr/share paths>
  -e /home/stack/templates/overcloud-baremetal-deployed.yaml \
  <other templates in /home/stack>

You override the values in the templates tripleo ships (in /usr/share) with your own values for your env (in /home/stack).

If you include /usr/share after, then you could override your own custom values. Just a best practice to rule out other issues.

At this point I think you should find out what command is being executed when this task runs:

https://github.com/openstack/tripleo-ansible/blob/e9cc12d4ce0b1c9e96b58f6102...

Find out the values. Then run that command manually on the CLI of the system where it is failing. At that point you'll have decoupled what the deployment tool is doing vs the failing command on your system and share that on the list.
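As a starting point for running the failing lookup by hand, the sketch below loads a credentials file and repeats the kind of call that failed in the log (listing services), then lists the keystone endpoints. The function name manual_endpoint_check is hypothetical, the credentials file (for example an overcloudrc) is an assumption about your setup, and this is not necessarily the exact call the role makes; it only exercises the same API with the openstack CLI.

```bash
# Sketch only: reproduce the service lookup outside of the deployment tool.
# "manual_endpoint_check" is a made-up helper; pass the path of the
# credentials file you normally source for the overcloud.
manual_endpoint_check() {
  local rc_file="$1"
  # Load the OS_* credentials into this shell; stop if the file is missing.
  source "$rc_file" || return 1
  # The failing task reported "Failed to list services", so try that first.
  openstack service list
  # Then show which keystone endpoints are already registered.
  openstack endpoint list --service keystone
}
```

If the first command returns the same 401 when run by hand, the problem is reproducible without the deployment tool and can be shared on the list with the exact command and credentials source used.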

...
[stack@undercloud ~]$ cat /usr/share/openstack-tripleo-heat-templates/environments/external-ceph.yaml
resource_registry:
  OS::TripleO::Services::CephExternal: ../deployment/cephadm/ceph-client.yaml

parameter_defaults:
  # NOTE: These example parameters are required when using CephExternal
  CephClusterFSID: 'ca3080e3-aa3a-4d1a-b1fd-483459a9ea4c'
  CephClientKey: 'AQB2hMZi2u13NxAAVjmKopw+kNm6OnZOG7NktQ=='
  CephExternalMonHost: 'abcd:abcd:abcd::11,abcd:abcd:abcd::12,abcd:abcd:abcd::13'

  # the following parameters enable Ceph backends for Cinder, Glance, Gnocchi and Nova
  NovaEnableRbdBackend: true
  CinderEnableRbdBackend: true
  CinderBackupBackend: ceph
  GlanceBackend: rbd
  # Uncomment below if enabling legacy telemetry
  # GnocchiBackend: rbd
  # If the Ceph pools which host VMs, Volumes and Images do not match these
  # names OR the client keyring to use is not named 'openstack', edit the
  # following as needed.
  NovaRbdPoolName: vms
  CinderRbdPoolName: volumes
  CinderBackupRbdPoolName: backups
  GlanceRbdPoolName: images
  # Uncomment below if enabling legacy telemetry
  # GnocchiRbdPoolName: metrics
  CephClientUserName: openstack

  # finally we disable the Cinder LVM backend
  CinderEnableIscsiBackend: false

I recommend that you not modify /usr/share/openstack-tripleo-heat-templates/environments/external-ceph.yaml. Instead, include it unmodified and then, after it, include a file of your own that overrides the values.

John

...
On Fri, Aug 19, 2022 at 10:32 PM John Fulton <johfulto@redhat.com> wrote:

...
On Fri, Aug 19, 2022 at 3:45 AM Lokendra Rathour < lokendrarathour@gmail.com> wrote:

...
Hi Fulton, thanks for the inputs, and apologies for the delay in response. To my surprise, passing the container prepare file in its standard form worked for me. The new container-prepare is:

parameter_defaults:
  ContainerImagePrepare:
  - push_destination: true
    set:
      ceph_alertmanager_image: alertmanager
      ceph_alertmanager_namespace: quay.ceph.io/prometheus
      ceph_alertmanager_tag: v0.16.2
      ceph_grafana_image: grafana
      ceph_grafana_namespace: quay.ceph.io/app-sre
      ceph_grafana_tag: 6.7.4
      ceph_image: daemon
      ceph_namespace: quay.io/ceph
      ceph_node_exporter_image: node-exporter
      ceph_node_exporter_namespace: quay.ceph.io/prometheus
      ceph_node_exporter_tag: v0.17.0
      ceph_prometheus_image: prometheus
      ceph_prometheus_namespace: quay.ceph.io/prometheus
      ceph_prometheus_tag: v2.7.2
      ceph_tag: v6.0.7-stable-6.0-pacific-centos-stream8
      name_prefix: openstack-
      name_suffix: ''
      namespace: myserver.com:5000/tripleowallaby
      neutron_driver: ovn
      rhel_containers: false
      tag: current-tripleo
    tag_from_label: rdo_version

But when we look for these containers, we do not see any of them available. We have checked both the undercloud and the overcloud.

The undercloud can download containers from the sources above and then act as a container registry. It's described here:

https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/deployme...

...
Also, the deployment is done when we are passing this config. Thanks once again.

Also, we need to understand some use cases for consuming storage from this external Ceph, for example as direct or shared storage mounted in a VM. Is there any document that explains how to consume external Ceph in an existing TripleO overcloud?

Ceph can provide OpenStack Block, Object and File storage and TripleO supports a variety of integration options for them.

TripleO can deploy Ceph as part of the OpenStack overcloud:

https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features...

TripleO can also deploy an OpenStack overcloud which uses an existing external ceph cluster:

https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features...

At the end of both of these documents you can expect Glance, Nova, and Cinder to use Ceph block storage (RBD).

You can also have OpenStack use Ceph object storage (RGW). When RGW is used, a command like "openstack container create foo" will create an object storage container (not to be confused with podman/docker) on CephRGW as if your overcloud were running OpenStack Swift. If you have TripleO deploy Ceph as part of the OpenStack overcloud, RGW will be deployed and configured for OpenStack object storage by default (in Wallaby+).

The OpenStack Manila service can use CephFS as one of its backends. TripleO can deploy that too as described here:

https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features...

John

...
Do share in case you know any, please.

Thanks once again for the support, it was really helpful

On Thu, Aug 11, 2022 at 9:59 PM John Fulton <johfulto@redhat.com> wrote:

...
The ceph container should no longer be needed for external ceph configuration (since the move from ceph-ansible to cephadm) but if removing the ceph env files makes the error go away, then try adding it back and then following these steps to prepare the ceph container on your undercloud before deploying.

https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features...

On Wed, Aug 10, 2022, 11:48 PM Lokendra Rathour < lokendrarathour@gmail.com> wrote:

...
Hi, thanks for the inputs. We could see what we had missed and have now added the missing piece: "TripleO resource OS::TripleO::Services::CephExternal: ../deployment/cephadm/ceph-client.yaml"

Now, if we deploy the setup on Wallaby with this setting, we get the following error:

PLAY [External deployment step 1] **********************************************
2022-08-11 08:33:20.183104 | 525400d4-7124-4a42-664c-0000000000a8 | TASK | External deployment step 1
2022-08-11 08:33:20.211821 | 525400d4-7124-4a42-664c-0000000000a8 | OK | External deployment step 1 | undercloud -> localhost | result={ "changed": false, "msg": "Use --start-at-task 'External deployment step 1' to resume from this task" }
[WARNING]: ('undercloud -> localhost', '525400d4-7124-4a42-664c-0000000000a8') missing from stats
2022-08-11 08:33:20.254775 | 525400d4-7124-4a42-664c-0000000000a9 | TIMING | include_tasks | undercloud | 0:05:01.151528 | 0.03s
2022-08-11 08:33:20.304290 | 730cacb3-fa5a-4dca-9730-9a8ce54fb5a3 | INCLUDED | /home/stack/overcloud-deploy/overcloud/config-download/overcloud/external_deploy_steps_tasks_step1.yaml | undercloud
2022-08-11 08:33:20.322079 | 525400d4-7124-4a42-664c-0000000048d0 | TASK | Set some tripleo-ansible facts
2022-08-11 08:33:20.350423 | 525400d4-7124-4a42-664c-0000000048d0 | OK | Set some tripleo-ansible facts | undercloud
2022-08-11 08:33:20.351792 | 525400d4-7124-4a42-664c-0000000048d0 | TIMING | Set some tripleo-ansible facts | undercloud | 0:05:01.248558 | 0.03s
2022-08-11 08:33:20.366717 | 525400d4-7124-4a42-664c-0000000048d7 | TASK | Container image prepare
2022-08-11 08:34:32.486108 | 525400d4-7124-4a42-664c-0000000048d7 | FATAL | Container image prepare | *undercloud | error={"changed": false, "error": "None: Max retries exceeded with url: /v2/ (Caused by None)", "msg": "Error running container image prepare: None: Max retries exceeded with url: /v2/ (Caused by None)", "params": {}, "success": false}*
2022-08-11 08:34:32.488845 | 525400d4-7124-4a42-664c-0000000048d7 | TIMING | tripleo_container_image_prepare : Container image prepare | undercloud | 0:06:13.385607 | 72.12s

This fails at step 1. As this is Wallaby, and based on the document (Use an external Ceph cluster with the Overcloud - TripleO 3.0.0 documentation (openstack.org) <https://docs.openstack.org/project-deploy-guide/tripleo-docs/wallaby/features/ceph_external.html>), we should only need to pass this external-ceph.yaml for the external Ceph integration. But that is not working.

Few things to note: 1. Container Prepare:

(undercloud) [stack@undercloud ~]$ cat containers-prepare-parameter.yaml # Generated with the following on 2022-06-28T18:56:38.642315 # # openstack tripleo container image prepare default --local-push-destination --output-env-file /home/stack/containers-prepare-parameter.yaml #

parameter_defaults: ContainerImagePrepare: - push_destination: true set: name_prefix: openstack- name_suffix: '' namespace: myserver.com:5000/tripleowallaby neutron_driver: ovn rhel_containers: false tag: current-tripleo tag_from_label: rdo_version (undercloud) [stack@undercloud ~]$

2. This is an SSL-based deployment.

Any idea what causes this error? The issue is seen only when the external Ceph integration is enabled.
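
The `Max retries exceeded with url: /v2/` message in the log above points at a container registry API root that could not be reached during image prepare. A minimal sketch for probing such an endpoint from the undercloud follows. It assumes the registry host is the first component of the `namespace` value and that it is served over HTTPS; the function names `registry_v2_url` and `probe_registry` are hypothetical helpers, not part of TripleO, and the registry that actually failed here may be a different one (for example the undercloud's local push destination).

```python
import ssl
import urllib.error
import urllib.request


def registry_v2_url(namespace: str, scheme: str = "https") -> str:
    """Build the registry API root URL from a namespace like 'host:5000/project'."""
    # Assumption: everything before the first '/' is the registry host[:port].
    host = namespace.split("/", 1)[0]
    return f"{scheme}://{host}/v2/"


def probe_registry(url: str, timeout: float = 5.0) -> str:
    """Return a short description of how the registry API root responds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"reachable, HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # An HTTP error (e.g. 401) still proves the host answered the request.
        return f"reachable, HTTP {exc.code}"
    except (urllib.error.URLError, ssl.SSLError, OSError) as exc:
        # DNS failures, refused connections, timeouts and TLS problems land here.
        return f"unreachable: {exc}"
```

Calling `probe_registry(registry_v2_url("myserver.com:5000/tripleowallaby"))` on the undercloud would show whether the failure is a connectivity, DNS, or TLS problem rather than something specific to the Ceph environment file.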

Best Regards, Lokendra

On Thu, Aug 4, 2022 at 7:22 PM Francesco Pantano <fpantano@redhat.com> wrote:

> Hi,
> ceph is supposed to be configured by this tripleo-ansible role [1],
> which is triggered by tht on external_deploy_steps [2].
> In theory adding [3] should just work, assuming you customize the
> ceph cluster mon ip addresses, fsid and a few other related variables.
> From your previous email I suspect in your external-ceph.yaml you
> missed the TripleO resource OS::TripleO::Services::CephExternal:
> ../deployment/cephadm/ceph-client.yaml
> (see [3]).
>
> Thanks,
> Francesco
>
> [1] https://github.com/openstack/tripleo-ansible/tree/master/tripleo_ansible/rol...
> [2] https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/c...
> [3] https://github.com/openstack/tripleo-heat-templates/blob/master/environments...
>
> On Thu, Aug 4, 2022 at 2:01 PM Lokendra Rathour <lokendrarathour@gmail.com> wrote:
>
>> ...
>
> --
> Francesco Pantano
> GPG KEY: F41BD75C

-- ~ Lokendra skype: lokendrarathour


Lokendra Rathour

26 Aug

7:57 p.m.

Hi John,

It got resolved; the cause was NTP. The time was not in sync: I noticed that recently NTP was not getting configured properly on the controller and compute nodes. After enabling time sync and validating it, we redeployed and it worked fine.

I have another query about storage integration with TripleO.

We have noticed that passing only external-ceph.yaml is not enough for the deployment; we also need to pass the Ceph parameters in container-prepare. We did see some containers getting downloaded as well, but after the deployment is done we do not see them anywhere. What is the purpose of such containers if they are not used? Any pointers would help me further ensure a 100% offline TripleO deployment.

On Thu, 4 Aug 2022, 17:07 Lokendra Rathour, <lokendrarathour@gmail.com> wrote:

...


John Fulton

29 Aug

6:19 a.m.

On Fri, Aug 26, 2022 at 11:03 PM Lokendra Rathour <lokendrarathour@gmail.com> wrote:

...

> Hi John,
> It got resolved; the cause was NTP. The time was not in sync: I noticed that recently NTP was not getting configured properly on the controller and compute nodes. After enabling time sync and validating it, we redeployed and it worked fine.
>
> I have another query about storage integration with TripleO.
>
> We have noticed that passing only external-ceph.yaml is not enough for the deployment; we also need to pass the Ceph parameters in container-prepare. We did see some containers getting downloaded as well, but after the deployment is done we do not see them anywhere. What is the purpose of such containers if they are not used? Any pointers would help me further ensure a 100% offline TripleO deployment.

You'll need the ceph container if:

1. If you're using NFS Ganesha with external ceph
2. If you're using ceph-ansible with external ceph

You should be using Wallaby however as per [1].

If you're only using RBD you shouldn't need the ceph container. This role should set up your ceph conf and key files.

https://github.com/openstack/tripleo-ansible/tree/stable/wallaby/tripleo_ans...

For offline tripleo, you need overcloud containers (regardless of if the ceph container is one of them). The solution to that problem is to use the undercloud as a container registry as per [2].

John

[1] https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features...
[2] https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/deployme...
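
Since the role mentioned above is expected to leave a Ceph conf file and a key file on each client node, a quick way to confirm it ran is to look for those files. This relates to the `/etc/ceph/ceph.conf was not found` error reported earlier in the thread. The sketch below is only an illustration: it assumes the conventional Ceph client file names (`<cluster>.conf` and `<cluster>.client.<user>.keyring`) and takes the directory as a parameter, because the host-side path used by a given deployment may differ from `/etc/ceph`. The function names are hypothetical.

```python
from pathlib import Path
from typing import List


def expected_ceph_client_files(user: str, cluster: str = "ceph",
                               conf_dir: str = "/etc/ceph") -> List[Path]:
    """List the client files a Ceph consumer conventionally needs."""
    base = Path(conf_dir)
    return [
        base / f"{cluster}.conf",                    # monitors and fsid
        base / f"{cluster}.client.{user}.keyring",   # CephX key for the user
    ]


def missing_ceph_client_files(user: str, cluster: str = "ceph",
                              conf_dir: str = "/etc/ceph") -> List[Path]:
    """Return the expected client files that do not exist in conf_dir."""
    return [
        path
        for path in expected_ceph_client_files(user, cluster, conf_dir)
        if not path.is_file()
    ]
```

For the configuration shown in this thread (`CephClientUserName: admin`), an empty result from `missing_ceph_client_files("admin")` on a compute node would suggest the client configuration was written; a non-empty result would suggest the role did not run or wrote to a different directory.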

...

On Thu, 4 Aug 2022, 17:07 Lokendra Rathour, <lokendrarathour@gmail.com> wrote:

...
