Re: [tripleo][ansible-ceph][ussuri][rdo][centos8] fails on ansible-ceph execution.

21 Sep 2020

Just wanted to share a few observations about your
https://github.com/qw3r3wq/OSP-ussuri/blob/master/v3/node-info.yaml

1. Your mon_max_pg_per_osd should be closer to 100 or 200.

You have it set at 4k:

  CephConfigOverrides:
    global:
      mon_max_pg_per_osd: 4096

Maybe you set this to work around
https://ceph.com/community/new-luminous-pg-overdose-protection/ but
this is not a good way to do it for any production data. The check
was added to keep the number of PGs per OSD from getting too high, so
raising the limit to get around it increases the chance of hitting the
very problems the check was meant to prevent. I assume this is just a
test cluster (1 mon), but I wanted to let you know.
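If the override is only there to get past the overdose check on a test cluster, a more conservative version might look like the sketch below. It assumes the override lives under parameter_defaults in node-info.yaml; 200 is simply the upper end of the range suggested above, not a tuned value.

```yaml
parameter_defaults:
  CephConfigOverrides:
    global:
      # Keep the overdose check meaningful: stay in the 100-200 range
      # instead of raising the limit to 4096.
      mon_max_pg_per_osd: 200
```

Dropping the mon_max_pg_per_osd key altogether leaves Ceph's built-in default in place; lowering the PG counts of the pools is the fix the check is actually nudging you towards.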

2. Replicas

If you only have one OSD node, you need to set "CephPoolDefaultSize: 1"
(that should help you with the pg overdose issue too).
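A minimal sketch of that setting in a Heat environment file, assuming a single-OSD-node test deployment only (one replica means no redundancy at all):

```yaml
parameter_defaults:
  # Single OSD node: keep one copy of each object, so PGs do not need
  # replicas on other hosts to become active, and each OSD carries
  # fewer PG copies (which also lowers the count the overdose check sees).
  CephPoolDefaultSize: 1
```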

3. metrics pool

If you're deploying with telemetry disabled then you don't need a metrics pool.

4. Backend overrides

You shouldn't need GlanceBackend: rbd, GnocchiBackend: rbd, or
NovaEnableRbdBackend: true, as those are set by default when you use the
ceph-ansible env file we've been talking about.

5. DistributedComputeHCICount role

This role is meant to be used with distributed compute nodes, which
don't run in the same stack as the controller node. They are meant to
be used as described in [1]. I think the ComputeHCI role would be a
better one to deploy in the same stack as the Controller. Not saying
you can't do this, but it doesn't look like you're using the role for
what it was designed for, so I at least wanted to point that out.

[1] https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features...

  John

On Mon, Sep 21, 2020 at 1:29 PM John Fulton <johfulto@redhat.com> wrote:
...
On Mon, Sep 21, 2020 at 1:05 PM Ruslanas Gžibovskis <ruslanas@lpic.lt> wrote:
...
Another thing: cat ./ceph-ansible/group_vars/osds.yml
shows that it has not been modified over the last re-deployments. I am deleting it again and removing config-download and everything from Swift...
The tripleo-ansible role tripleo_ceph_work_dir will manage that
directory for you (recreating it when needed to reflect what is in
Heat). It is run when config-download is run.
https://opendev.org/openstack/tripleo-ansible/src/branch/master/tripleo_ansi...
...
I do not like that it does not override everything, especially when launching a deployment when there is no stack (I mean on the undercloud host, as the overcloud nodes should be cleaned up by the undercloud).
If there is no stack, the stack will be created when you deploy and
config-download's directory of playbooks will also be recreated. You
shouldn't need to worry about cleaning up the existing config-download
directory. You can, but you don't have to.
https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/deployme...
John
...
Thank you, I will keep you updated.
On Mon, 21 Sep 2020 at 19:33, Ruslanas Gžibovskis <ruslanas@lpic.lt> wrote:
...
I have one thought.
[stack@undercloudv3 v3]$ cat /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml
resource_registry:
 OS::TripleO::Services::CephMgr: ../../deployment/ceph-ansible/ceph-mgr.yaml
 OS::TripleO::Services::CephMon: ../../deployment/ceph-ansible/ceph-mon.yaml
 OS::TripleO::Services::CephOSD: ../../deployment/ceph-ansible/ceph-osd.yaml
 OS::TripleO::Services::CephClient: ../../deployment/ceph-ansible/ceph-client.yaml
parameter_defaults:
 # Ensure that if user overrides CephAnsiblePlaybook via some env
 # file, we go back to default when they stop passing their env file.
 CephAnsiblePlaybook: ['default']
 CinderEnableIscsiBackend: false
 CinderEnableRbdBackend: true
 CinderBackupBackend: ceph
 NovaEnableRbdBackend: true
 GlanceBackend: rbd
 ## Uncomment below if enabling legacy telemetry
 # GnocchiBackend: rbd
[stack@undercloudv3 v3]$
And my deploy has:
   -e ${_THT}/environments/ceph-ansible/ceph-ansible.yaml \
   -e ${_THT}/environments/ceph-ansible/ceph-rgw.yaml \
   -e ${_THT}/environments/ceph-ansible/ceph-mds.yaml \
   -e ${_THT}/environments/ceph-ansible/ceph-dashboard.yaml \
These are generally the same files, BUT they are specified by the user, so could it "look like" the user overrode the default settings?
I am also thinking about the things you helped me find, John, and I recalled something I found strange: the NFS part.
It was trying to configure CephNfs... Or should it do that even though I have not specified it? Here is a small part of the output [1]:
        "statically imported: /usr/share/ceph-ansible/roles/ceph-nfs/tasks/create_rgw_nfs_user.yml",
        "statically imported: /usr/share/ceph-ansible/roles/ceph-nfs/tasks/ganesha_selinux_fix.yml",
        "statically imported: /usr/share/ceph-ansible/roles/ceph-nfs/tasks/start_nfs.yml",
[1] https://proxy.qwq.lt/ceph-ansible.html
--
Ruslanas Gžibovskis
+370 6030 7030