[Triple0] [Wallaby] External Ceph Integration getting failed

Lokendra Rathour lokendrarathour at gmail.com
Thu Aug 25 13:04:31 UTC 2022


Hi John,
Thanks for the inputs. Now I see something strange.
The deployment with external Ceph is unstable. It deployed successfully once,
but VM creation then failed for some reason. While debugging we found an
NTP-related problem, which we fixed, and then we tried redeploying.
Now it failed again at step 4:

2022-08-25 17:34:29.036371 | 5254004d-021e-d4db-067d-000000007b1a |       TASK
| Create identity internal endpoint
2022-08-25 17:34:31.176105 | 5254004d-021e-d4db-067d-000000007b1a |      FATAL
| Create identity internal endpoint | undercloud | error={"changed": false,
"extra_data": {"data": null, "details": "The request you have made requires
authentication.", "response": "{\"error\":{\"code\":401,\"message\":\"The
request you have made requires authentication.\",\"title\":\"Unauthorized\"}
}\n"}, "msg": "Failed to list services: Client Error for url:
https://overcloud-public.mydomain.com:13000/v3/services, The request you
have made requires authentication."}
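
The FATAL payload above carries the Keystone reply as an escaped JSON string
inside "response". A minimal Python sketch of unpacking it follows. The helper
name keystone_error and the sample payload are illustrative assumptions (the
sample is trimmed from the log above); nothing here is part of TripleO.

```python
import json


def keystone_error(ansible_error):
    """Return (HTTP status code, message) from the nested Keystone reply."""
    outer = json.loads(ansible_error)
    # "response" holds the raw Keystone body as a JSON-encoded string.
    inner = json.loads(outer["extra_data"]["response"])
    return inner["error"]["code"], inner["error"]["message"]


# Hypothetical sample, shaped like the error={...} value in the log above.
keystone_body = {
    "error": {
        "code": 401,
        "message": "The request you have made requires authentication.",
        "title": "Unauthorized",
    }
}
sample = json.dumps({
    "changed": False,
    "extra_data": {
        "data": None,
        "details": "The request you have made requires authentication.",
        "response": json.dumps(keystone_body) + "\n",
    },
    "msg": "Failed to list services",
})

code, message = keystone_error(sample)
print(code, message)
```

The extracted code is the HTTP status returned by Keystone on the public
endpoint, which is what points at an authentication problem here.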

To revalidate, I tried a fresh setup and the deployment again failed at
step 4.
When we remove external Ceph from the deploy command, the deployment
completes 100%.
I see authorization errors, which I used to get earlier as well, but earlier
we were able to resolve them by fixing DNS.
What could be the reason for this when we are using external Ceph?

Any inputs would be helpful.

deploy command:

[stack at undercloud ~]$ cat deploy_step2.sh
openstack overcloud deploy --templates \
    -r /home/stack/templates/roles_data.yaml \
    -n /home/stack/templates/custom_network_data.yaml \
    -e /home/stack/templates/overcloud-baremetal-deployed.yaml \
    -e /home/stack/templates/networks-deployed-environment.yaml \
    -e /home/stack/templates/vip-deployed-environment.yaml \
    -e /home/stack/templates/environment.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-conductor.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-inspector.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/services/ironic-overcloud.yaml \
    -e /home/stack/templates/ironic-config.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/services/ptp.yaml \
    -e /home/stack/templates/enable-tls.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-dns.yaml \
    -e /home/stack/templates/cloudname.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/ssl/inject-trust-anchor-hiera.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/external-ceph.yaml \
    -e /home/stack/templates/my-additional-ceph-settings.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/podman.yaml \
    -e /home/stack/containers-prepare-parameter.yaml
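
Since the order of the -e files matters (as far as I understand, environment
files passed later take precedence over earlier ones), here is a minimal Python
sketch that lists them in command-line order, even when the script text has
been line-wrapped. The function name environment_files and the shortened
sample are assumptions for illustration only.

```python
import shlex


def environment_files(script_text):
    """Return the paths passed with -e, in command-line order."""
    # Drop backslash-newline continuations so the command is one logical line;
    # shlex then treats the remaining newlines as ordinary whitespace.
    tokens = shlex.split(script_text.replace("\\\n", " "))
    return [tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok == "-e"]


# Hypothetical, shortened sample with one wrapped -e argument.
sample = """openstack overcloud deploy --templates \\
    -r /home/stack/templates/roles_data.yaml \\
    -e /home/stack/templates/environment.yaml \\
    -e
/usr/share/openstack-tripleo-heat-templates/environments/external-ceph.yaml
\\
    -e /home/stack/templates/my-additional-ceph-settings.yaml
"""

files = environment_files(sample)
for position, path in enumerate(files, start=1):
    print(position, path)
```

With the full command, this makes it easy to confirm that
my-additional-ceph-settings.yaml comes after external-ceph.yaml, so its values
would be the ones applied.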


[stack at undercloud ~]$ cat /usr/share/openstack-tripleo-heat-templates/environments/external-ceph.yaml
resource_registry:
  OS::TripleO::Services::CephExternal: ../deployment/cephadm/ceph-client.yaml

parameter_defaults:
  # NOTE: These example parameters are required when using CephExternal
  CephClusterFSID: 'ca3080e3-aa3a-4d1a-b1fd-483459a9ea4c'
  CephClientKey: 'AQB2hMZi2u13NxAAVjmKopw+kNm6OnZOG7NktQ=='
  CephExternalMonHost: 'abcd:abcd:abcd::11,abcd:abcd:abcd::12,abcd:abcd:abcd::13'

  # the following parameters enable Ceph backends for Cinder, Glance, Gnocchi and Nova
  NovaEnableRbdBackend: true
  CinderEnableRbdBackend: true
  CinderBackupBackend: ceph
  GlanceBackend: rbd
  # Uncomment below if enabling legacy telemetry
  # GnocchiBackend: rbd
  # If the Ceph pools which host VMs, Volumes and Images do not match these
  # names OR the client keyring to use is not named 'openstack',  edit the
  # following as needed.
  NovaRbdPoolName: vms
  CinderRbdPoolName: volumes
  CinderBackupRbdPoolName: backups
  GlanceRbdPoolName: images
  # Uncomment below if enabling legacy telemetry
  # GnocchiRbdPoolName: metrics
  CephClientUserName: openstack

  # finally we disable the Cinder LVM backend
  CinderEnableIscsiBackend: false
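
A minimal Python sketch for sanity-checking the values above before a deploy,
assuming the monitors are given as literal IP addresses (as in this file) and
that all values are strings. The function name check_external_ceph is
hypothetical, and the checks are only syntactic; they say nothing about
whether the cluster is reachable or the key is accepted.

```python
import ipaddress
import uuid


def check_external_ceph(params):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []

    # uuid.UUID ignores hyphen placement, so this only catches a wrong
    # length or non-hexadecimal characters.
    try:
        uuid.UUID(params["CephClusterFSID"])
    except (KeyError, ValueError):
        problems.append("CephClusterFSID is missing or not a valid UUID")

    # CephExternalMonHost is a comma-separated list of monitor addresses.
    for host in params.get("CephExternalMonHost", "").split(","):
        try:
            ipaddress.ip_address(host.strip())
        except ValueError:
            problems.append(f"monitor address {host!r} is not a valid IP")

    return problems


good = {
    "CephClusterFSID": "ca3080e3-aa3a-4d1a-b1fd-483459a9ea4c",
    "CephExternalMonHost": "abcd:abcd:abcd::11,abcd:abcd:abcd::12,abcd:abcd:abcd::13",
}
# Same monitors, but with an FSID that is two hex digits short.
bad = dict(good, CephClusterFSID="ca3080-aaaa-4d1a-b1fd-4aaaa9a9ea4c")

print(check_external_ceph(good))
print(check_external_ceph(bad))
```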


On Fri, Aug 19, 2022 at 10:32 PM John Fulton <johfulto at redhat.com> wrote:

> On Fri, Aug 19, 2022 at 3:45 AM Lokendra Rathour <
> lokendrarathour at gmail.com> wrote:
>
>> Hi Fulton,
>> Thanks for the inputs and apologies for the delay in response.
>> To my surprise, passing the container prepare in the standard form worked
>> for me. The new container-prepare is:
>>
>> parameter_defaults:
>>   ContainerImagePrepare:
>>   - push_destination: true
>>     set:
>>       ceph_alertmanager_image: alertmanager
>>       ceph_alertmanager_namespace: quay.ceph.io/prometheus
>>       ceph_alertmanager_tag: v0.16.2
>>       ceph_grafana_image: grafana
>>       ceph_grafana_namespace: quay.ceph.io/app-sre
>>       ceph_grafana_tag: 6.7.4
>>       ceph_image: daemon
>>       ceph_namespace: quay.io/ceph
>>       ceph_node_exporter_image: node-exporter
>>       ceph_node_exporter_namespace: quay.ceph.io/prometheus
>>       ceph_node_exporter_tag: v0.17.0
>>       ceph_prometheus_image: prometheus
>>       ceph_prometheus_namespace: quay.ceph.io/prometheus
>>       ceph_prometheus_tag: v2.7.2
>>       ceph_tag: v6.0.7-stable-6.0-pacific-centos-stream8
>>       name_prefix: openstack-
>>       name_suffix: ''
>>       namespace: myserver.com:5000/tripleowallaby
>>       neutron_driver: ovn
>>       rhel_containers: false
>>       tag: current-tripleo
>>     tag_from_label: rdo_version
>>
>> But when we look for these containers, I do not see any of them
>> available. We have tried looking on both the undercloud and the overcloud.
>>
>
> The undercloud can download containers from the sources above and then act
> as a container registry. It's described here:
>
>
> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/deployment/container_image_prepare.html
>
>
>> Also, the deployment is done when we are passing this config.
>> Thanks once again.
>>
>> Also, we need to understand some use cases for consuming storage from this
>> external Ceph cluster, for example as a direct mount for a VM or as shared
>> storage. Is there any idea or document that explains more about how to
>> consume external Ceph in an existing TripleO overcloud?
>>
>
> Ceph can provide OpenStack Block, Object and File storage and TripleO
> supports a variety of integration options for them.
>
> TripleO can deploy Ceph as part of the OpenStack overcloud:
>
> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deployed_ceph.html
>
> TripleO can also deploy an OpenStack overcloud which uses an existing
> external ceph cluster:
>
> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/ceph_external.html
>
> At the end of both of these documents you can expect Glance, Nova, and
> Cinder to use Ceph block storage (RBD).
>
> You can also have OpenStack use Ceph object storage (RGW). When RGW is
> used, a command like "openstack container create foo" will create an object
> storage container (not to be confused with podman/docker) on CephRGW as if
> your overcloud were running OpenStack Swift. If you have TripleO deploy
> Ceph as part of the OpenStack overcloud, RGW will be deployed and
> configured for OpenStack object storage by default (in Wallaby+).
>
> The OpenStack Manila service can use CephFS as one of its backends.
> TripleO can deploy that too as described here:
>
>
> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deploy_manila.html
>
>   John
>
>
>> Do share in case you know any, please.
>>
>> Thanks once again for the support, it was really helpful
>>
>>
>> On Thu, Aug 11, 2022 at 9:59 PM John Fulton <johfulto at redhat.com> wrote:
>>
>>> The ceph container should no longer be needed for external ceph
>>> configuration (since the move from ceph-ansible to cephadm) but if removing
>>> the ceph env files makes the error go away,  then try adding it back and
>>> then following these steps to prepare the ceph container on your undercloud
>>> before deploying.
>>>
>>>
>>> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deployed_ceph.html#container-options
>>>
>>> On Wed, Aug 10, 2022, 11:48 PM Lokendra Rathour <
>>> lokendrarathour at gmail.com> wrote:
>>>
>>>> Hi,
>>>> Thanks for the inputs. We could see what we had missed, and
>>>> we have now added the missing setting:
>>>> "TripleO resource
>>>> OS::TripleO::Services::CephExternal: ../deployment/cephadm/ceph-client.yaml"
>>>>
>>>> Now, if we deploy the setup in Wallaby with this setting, we
>>>> get the following error:
>>>>
>>>>
>>>> PLAY [External deployment step 1]
>>>> **********************************************
>>>> 2022-08-11 08:33:20.183104 | 525400d4-7124-4a42-664c-0000000000a8 |
>>>>   TASK | External deployment step 1
>>>> 2022-08-11 08:33:20.211821 | 525400d4-7124-4a42-664c-0000000000a8 |
>>>>     OK | External deployment step 1 | undercloud -> localhost | result={
>>>>     "changed": false,
>>>>     "msg": "Use --start-at-task 'External deployment step 1' to resume
>>>> from this task"
>>>> }
>>>> [WARNING]: ('undercloud -> localhost',
>>>> '525400d4-7124-4a42-664c-0000000000a8')
>>>> missing from stats
>>>> 2022-08-11 08:33:20.254775 | 525400d4-7124-4a42-664c-0000000000a9 |
>>>> TIMING | include_tasks | undercloud | 0:05:01.151528 | 0.03s
>>>> 2022-08-11 08:33:20.304290 | 730cacb3-fa5a-4dca-9730-9a8ce54fb5a3 |
>>>> INCLUDED |
>>>> /home/stack/overcloud-deploy/overcloud/config-download/overcloud/external_deploy_steps_tasks_step1.yaml
>>>> | undercloud
>>>> 2022-08-11 08:33:20.322079 | 525400d4-7124-4a42-664c-0000000048d0 |
>>>>   TASK | Set some tripleo-ansible facts
>>>> 2022-08-11 08:33:20.350423 | 525400d4-7124-4a42-664c-0000000048d0 |
>>>>     OK | Set some tripleo-ansible facts | undercloud
>>>> 2022-08-11 08:33:20.351792 | 525400d4-7124-4a42-664c-0000000048d0 |
>>>> TIMING | Set some tripleo-ansible facts | undercloud | 0:05:01.248558 |
>>>> 0.03s
>>>> 2022-08-11 08:33:20.366717 | 525400d4-7124-4a42-664c-0000000048d7 |
>>>>   TASK | Container image prepare
>>>> 2022-08-11 08:34:32.486108 | 525400d4-7124-4a42-664c-0000000048d7 |
>>>>  FATAL | Container image prepare | *undercloud | error={"changed":
>>>> false, "error": "None: Max retries exceeded with url: /v2/ (Caused by
>>>> None)", "msg": "Error running container image prepare: None: Max retries
>>>> exceeded with url: /v2/ (Caused by None)", "params": {}, "success": false}*
>>>> 2022-08-11 08:34:32.488845 | 525400d4-7124-4a42-664c-0000000048d7 |
>>>> TIMING | tripleo_container_image_prepare : Container image prepare |
>>>> undercloud | 0:06:13.385607 | 72.12s
>>>>
>>>> This fails at step 1. As this is Wallaby, and based on the
>>>> document (Use an external Ceph cluster with the Overcloud — TripleO
>>>> 3.0.0 documentation (openstack.org)
>>>> <https://docs.openstack.org/project-deploy-guide/tripleo-docs/wallaby/features/ceph_external.html>)
>>>> we should only need to pass this external-ceph.yaml for the external
>>>> Ceph integration.
>>>> But that is not working.
>>>>
>>>>
>>>> Few things to note:
>>>> 1. Container Prepare:
>>>>
>>>> (undercloud) [stack at undercloud ~]$ cat
>>>> containers-prepare-parameter.yaml
>>>> # Generated with the following on 2022-06-28T18:56:38.642315
>>>> #
>>>> #   openstack tripleo container image prepare default
>>>> --local-push-destination --output-env-file
>>>> /home/stack/containers-prepare-parameter.yaml
>>>> #
>>>>
>>>>
>>>> parameter_defaults:
>>>>   ContainerImagePrepare:
>>>>   - push_destination: true
>>>>     set:
>>>>       name_prefix: openstack-
>>>>       name_suffix: ''
>>>>       namespace: myserver.com:5000/tripleowallaby
>>>>       neutron_driver: ovn
>>>>       rhel_containers: false
>>>>       tag: current-tripleo
>>>>     tag_from_label: rdo_version
>>>> (undercloud) [stack at undercloud ~]$
>>>>
>>>> 2. This is an SSL-based deployment.
>>>>
>>>> Any idea about the error? The issue is seen only when we have the
>>>> external Ceph integration enabled.
>>>>
>>>> Best Regards,
>>>> Lokendra
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Aug 4, 2022 at 7:22 PM Francesco Pantano <fpantano at redhat.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>> ceph is supposed to be configured by this tripleo-ansible role [1],
>>>>> which is triggered by tht on external_deploy_steps [2].
>>>>> In theory adding [3] should just work, assuming you customize the ceph
>>>>> cluster mon ip addresses, fsid and a few other related variables.
>>>>> From your previous email I suspect in your external-ceph.yaml you
>>>>> missed the TripleO resource OS::TripleO::Services::CephExternal:
>>>>> ../deployment/cephadm/ceph-client.yaml
>>>>> (see [3]).
>>>>>
>>>>> Thanks,
>>>>> Francesco
>>>>>
>>>>>
>>>>> [1]
>>>>> https://github.com/openstack/tripleo-ansible/tree/master/tripleo_ansible/roles/tripleo_ceph_client
>>>>> [2]
>>>>> https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/cephadm/ceph-client.yaml#L93
>>>>> [3]
>>>>> https://github.com/openstack/tripleo-heat-templates/blob/master/environments/external-ceph.yaml
>>>>>
>>>>> On Thu, Aug 4, 2022 at 2:01 PM Lokendra Rathour <
>>>>> lokendrarathour at gmail.com> wrote:
>>>>>
>>>>>> Hi Team,
>>>>>> I was trying to integrate external Ceph with TripleO Wallaby, and at
>>>>>> the end of the deployment, in step 4, I am getting the below error:
>>>>>>
>>>>>> 2022-08-03 18:37:21,158 p=507732 u=stack n=ansible | 2022-08-03
>>>>>> 18:37:21.157962 | 525400fe-86b8-65d9-d100-0000000080d2 |       TASK |
>>>>>> Create containers from
>>>>>> /var/lib/tripleo-config/container-startup-config/step_4
>>>>>> 2022-08-03 18:37:21,239 p=507732 u=stack n=ansible | 2022-08-03
>>>>>> 18:37:21.238718 | 69e98219-f748-4af7-a6d0-f8f73680ce9b |   INCLUDED |
>>>>>> /usr/share/ansible/roles/tripleo_container_manage/tasks/create.yml |
>>>>>> overcloud-controller-2
>>>>>> 2022-08-03 18:37:21,273 p=507732 u=stack n=ansible | 2022-08-03
>>>>>> 18:37:21.272340 | 525400fe-86b8-65d9-d100-0000000086d9 |       TASK |
>>>>>> Create containers managed by Podman for
>>>>>> /var/lib/tripleo-config/container-startup-config/step_4
>>>>>> 2022-08-03 18:37:24,532 p=507732 u=stack n=ansible | 2022-08-03
>>>>>> 18:37:24.530812 |                                      |    WARNING |
>>>>>> ERROR: Can't run container nova_libvirt_init_secret
>>>>>> stderr:
>>>>>> 2022-08-03 18:37:24,533 p=507732 u=stack n=ansible | 2022-08-03
>>>>>> 18:37:24.532811 | 525400fe-86b8-65d9-d100-0000000082ec |      FATAL |
>>>>>> Create containers managed by Podman for
>>>>>> /var/lib/tripleo-config/container-startup-config/step_4 |
>>>>>> overcloud-novacompute-0 | error={"changed": false, "msg": "Failed
>>>>>> containers: nova_libvirt_init_secret"}
>>>>>> 2022-08-03 18:37:44,282 p=507732 u
>>>>>>
>>>>>>
>>>>>> *external-ceph.conf:*
>>>>>>
>>>>>> parameter_defaults:
>>>>>>   # Enable use of RBD backend in nova-compute
>>>>>>   NovaEnableRbdBackend: True
>>>>>>   # Enable use of RBD backend in cinder-volume
>>>>>>   CinderEnableRbdBackend: True
>>>>>>   # Backend to use for cinder-backup
>>>>>>   CinderBackupBackend: ceph
>>>>>>   # Backend to use for glance
>>>>>>   GlanceBackend: rbd
>>>>>>   # Name of the Ceph pool hosting Nova ephemeral images
>>>>>>   NovaRbdPoolName: vms
>>>>>>   # Name of the Ceph pool hosting Cinder volumes
>>>>>>   CinderRbdPoolName: volumes
>>>>>>   # Name of the Ceph pool hosting Cinder backups
>>>>>>   CinderBackupRbdPoolName: backups
>>>>>>   # Name of the Ceph pool hosting Glance images
>>>>>>   GlanceRbdPoolName: images
>>>>>>   # Name of the user to authenticate with the external Ceph cluster
>>>>>>   CephClientUserName: admin
>>>>>>   # The cluster FSID
>>>>>>   CephClusterFSID: 'ca3080-aaaa-4d1a-b1fd-4aaaa9a9ea4c'
>>>>>>   # The CephX user auth key
>>>>>>   CephClientKey: 'AQDgRjhiuLMnAxAAnYwgERERFy0lzH6ufSl70A=='
>>>>>>   # The list of Ceph monitors
>>>>>>   CephExternalMonHost: 'abcd:abcd:abcd::11,abcd:abcd:abcd::12,abcd:abcd:abcd::13'
>>>>>> ~
>>>>>>
>>>>>>
>>>>>> I have checked and validated the Ceph client details and they
>>>>>> seem to be correct. Digging further into the container log, I could see
>>>>>> something like this:
>>>>>>
>>>>>> [root at overcloud-novacompute-0 containers]# tail -f
>>>>>> nova_libvirt_init_secret.log
>>>>>> tail: cannot open 'nova_libvirt_init_secret.log' for reading: No such
>>>>>> file or directory
>>>>>> tail: no files remaining
>>>>>> [root at overcloud-novacompute-0 containers]# tail -f
>>>>>> stdouts/nova_libvirt_init_secret.log
>>>>>> 2022-08-04T11:48:47.689898197+05:30 stdout F
>>>>>> ------------------------------------------------
>>>>>> 2022-08-04T11:48:47.690002011+05:30 stdout F Initializing virsh
>>>>>> secrets for: ceph:admin
>>>>>> 2022-08-04T11:48:47.690590594+05:30 stdout F Error:
>>>>>> /etc/ceph/ceph.conf was not found
>>>>>> 2022-08-04T11:48:47.690625088+05:30 stdout F Path to
>>>>>> nova_libvirt_init_secret was ceph:admin
>>>>>> 2022-08-04T16:20:29.643785538+05:30 stdout F
>>>>>> ------------------------------------------------
>>>>>> 2022-08-04T16:20:29.643785538+05:30 stdout F Initializing virsh
>>>>>> secrets for: ceph:admin
>>>>>> 2022-08-04T16:20:29.644785532+05:30 stdout F Error:
>>>>>> /etc/ceph/ceph.conf was not found
>>>>>> 2022-08-04T16:20:29.644785532+05:30 stdout F Path to
>>>>>> nova_libvirt_init_secret was ceph:admin
>>>>>> ^C
>>>>>> [root at overcloud-novacompute-0 containers]# tail -f
>>>>>> stdouts/nova_compute_init_log.log
>>>>>>
>>>>>> --
>>>>>> ~ Lokendra
>>>>>> skype: lokendrarathour
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Francesco Pantano
>>>>> GPG KEY: F41BD75C
>>>>>
>>>>
>>>>
>>>> --
>>>> ~ Lokendra
>>>> skype: lokendrarathour
>>>>
>>>>
>>>>
>>
>> --
>> ~ Lokendra
>> skype: lokendrarathour
>>
>>
>>

-- 
~ Lokendra
skype: lokendrarathour


More information about the openstack-discuss mailing list