[tripleo][ansible-ceph][ussuri][rdo][centos8] fails on ansible-ceph execution.

John Fulton johfulto at redhat.com
Mon Sep 21 17:51:39 UTC 2020


Just wanted to share a few observations from your
https://github.com/qw3r3wq/OSP-ussuri/blob/master/v3/node-info.yaml

1. Your mon_max_pg_per_osd should be closer to 100 or 200.

You have it set at 4k:

  CephConfigOverrides:
    global:
      mon_max_pg_per_osd: 4096

Maybe you set this to work around
https://ceph.com/community/new-luminous-pg-overdose-protection/ but
this is not a good way to do it for any production data. The check
was added to keep the number of PGs per OSD from getting too high, so
working around it increases the chance of hitting the problems the
check was made to avoid. I assume this is just a test cluster (1 mon)
but I wanted to let you know.
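
As a rough sketch, an override inside the suggested range could look
like the fragment below. The value 200 is only an illustration taken
from that range, not a tuned recommendation for this cluster; removing
the override so that Ceph's own default applies is another option.

```yaml
parameter_defaults:
  CephConfigOverrides:
    global:
      # Illustrative value only (assumption): somewhere in the 100-200 range
      mon_max_pg_per_osd: 200
```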

2. Replicas

If you only have one OSD node you need to set "CephPoolDefaultSize: 1"
(that should help you with the pg overdose issue too).
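
A minimal sketch of that setting in a Heat environment file follows.
Each PG is counted once per replica, so a pool size of 1 puts fewer
PGs on each OSD than a larger size would. Where the parameter goes
(for example, in node-info.yaml) is an assumption; any environment
file passed with -e should work.

```yaml
parameter_defaults:
  # Single OSD node: keep one copy of each object.
  # No redundancy, so this is only suitable for a test cluster.
  CephPoolDefaultSize: 1
```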

3. metrics pool

If you're deploying with telemetry disabled then you don't need a metrics pool.

4. Backend overrides

You shouldn't need GlanceBackend: rbd, GnocchiBackend: rbd, or
NovaEnableRbdBackend: true, as those get set by default when you use
the ceph-ansible env file we've been talking about.

5. DistributedComputeHCICount role

This role is meant to be used with distributed compute nodes which
don't run in the same stack as the controller node. They are meant to
be used as described in [1]. I think ComputeHCI would be a better
role to deploy in the same stack as the Controller. I'm not saying
you can't do this, but it doesn't look like you're using the role for
what it was designed for, so I at least wanted to point that out.
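
A hedged sketch of what switching roles might look like in
parameter_defaults is below. The node counts are assumptions for a
small test deployment, and it also assumes the roles file passed to
the deploy command includes the ComputeHCI role.

```yaml
parameter_defaults:
  # Counts are illustrative assumptions for a one-controller test stack
  ControllerCount: 1
  # Hyperconverged compute + Ceph OSD nodes in the same stack as the Controller
  ComputeHCICount: 1
  # Leave the distributed role unused in this stack
  DistributedComputeHCICount: 0
```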

[1] https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/distributed_multibackend_storage.html



  John

On Mon, Sep 21, 2020 at 1:29 PM John Fulton <johfulto at redhat.com> wrote:
>
> On Mon, Sep 21, 2020 at 1:05 PM Ruslanas Gžibovskis <ruslanas at lpic.lt> wrote:
> >
> > Another thing: ./ceph-ansible/group_vars/osds.yml looks like it
> > has not been modified over the last re-deployments. I am deleting it again and removing config-download and everything from swift...
>
> The tripleo-ansible role tripleo_ceph_work_dir will manage that
> directory for you (recreate it when needed to reflect what is in
> Heat). It is run when config-download is run.
>
> https://opendev.org/openstack/tripleo-ansible/src/branch/master/tripleo_ansible/roles/tripleo_ceph_work_dir
>
> > I do not like that it does not override everything, especially when launching a deployment when there is no stack (I mean on the undercloud host, as overcloud nodes should be cleaned up by the undercloud).
>
> If there is no stack, the stack will be created when you deploy and
> config-download's directory of playbooks will also be recreated. You
> shouldn't need to worry about cleaning up the existing config-download
> directory. You can, but you don't have to.
>
>  https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/deployment/ansible_config_download.html#ansible-project-directory
>
>   John
>
> > Thank you, will keep updated.
> >
> > On Mon, 21 Sep 2020 at 19:33, Ruslanas Gžibovskis <ruslanas at lpic.lt> wrote:
> >>
> >> I have one thought.
> >>
> >> [stack at undercloudv3 v3]$ cat /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml
> >> resource_registry:
> >>  OS::TripleO::Services::CephMgr: ../../deployment/ceph-ansible/ceph-mgr.yaml
> >>  OS::TripleO::Services::CephMon: ../../deployment/ceph-ansible/ceph-mon.yaml
> >>  OS::TripleO::Services::CephOSD: ../../deployment/ceph-ansible/ceph-osd.yaml
> >>  OS::TripleO::Services::CephClient: ../../deployment/ceph-ansible/ceph-client.yaml
> >>
> >> parameter_defaults:
> >>  # Ensure that if user overrides CephAnsiblePlaybook via some env
> >>  # file, we go back to default when they stop passing their env file.
> >>  CephAnsiblePlaybook: ['default']
> >>
> >>  CinderEnableIscsiBackend: false
> >>  CinderEnableRbdBackend: true
> >>  CinderBackupBackend: ceph
> >>  NovaEnableRbdBackend: true
> >>  GlanceBackend: rbd
> >>  ## Uncomment below if enabling legacy telemetry
> >>  # GnocchiBackend: rbd
> >> [stack at undercloudv3 v3]$
> >>
> >>
> >> And my deploy has:
> >>    -e ${_THT}/environments/ceph-ansible/ceph-ansible.yaml \
> >>    -e ${_THT}/environments/ceph-ansible/ceph-rgw.yaml \
> >>    -e ${_THT}/environments/ceph-ansible/ceph-mds.yaml \
> >>    -e ${_THT}/environments/ceph-ansible/ceph-dashboard.yaml \
> >>
> >> These are generally the same files, BUT they are specified by the user, so it "might feel like" the user overwrote the default settings?
> >>
> >> Also, I have been thinking about the things you helped me to find, John, and I recalled what I found strange: the NFS part.
> >> It was trying to configure CephNfs... Or should it, even though I do not have it specified? From the output [1], here is a small part of it:
> >>         "statically imported: /usr/share/ceph-ansible/roles/ceph-nfs/tasks/create_rgw_nfs_user.yml",
> >>         "statically imported: /usr/share/ceph-ansible/roles/ceph-nfs/tasks/ganesha_selinux_fix.yml",
> >>         "statically imported: /usr/share/ceph-ansible/roles/ceph-nfs/tasks/start_nfs.yml",
> >>
> >>
> >> [1] https://proxy.qwq.lt/ceph-ansible.html
> >>
> >
> >
> > --
> > Ruslanas Gžibovskis
> > +370 6030 7030



