[openstack-ansible][ceph][yoga] wait for all osd to be up

Jonathan Rosser jonathan.rosser at rd.bbc.co.uk
Wed Aug 31 08:28:39 UTC 2022


For deploying ceph, Openstack-Ansible is just a thin wrapper around 
ceph-ansible (see 
https://docs.ceph.com/projects/ceph-ansible/en/latest/index.html).

You have to define the variables that ceph-ansible requires.
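
As a rough sketch only (the variable names below exist in 
ceph-ansible, but the values are placeholders for an imaginary 
network, not recommendations), a minimal set in user_variables.yml 
might look something like:

    # Hedged sketch of minimal ceph-ansible variables - check the
    # ceph-ansible docs for your release for the authoritative list.
    monitor_interface: eth1          # interface the mons bind to (placeholder)
    public_network: 192.168.3.0/24   # placeholder network
    osd_objectstore: bluestore
    # OSD disks are not listed for you - either name them explicitly...
    devices:
      - /dev/sdb
      - /dev/sdc
    # ...or let ceph-ansible discover empty disks instead of "devices":
    # osd_auto_discovery: true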

We have a test scenario for Openstack-Ansible + Ceph, which uses the 
following variables: 
https://github.com/openstack/openstack-ansible/blob/master/tests/roles/bootstrap-host/templates/user_variables_ceph.yml.j2 
Most of those are used by the ceph-ansible roles, not Openstack-Ansible 
directly. For the purposes of that test case, LVM loopback devices are 
set up and a suitable ceph.conf is written out here: 
https://github.com/openstack/openstack-ansible/blob/master/tests/roles/bootstrap-host/tasks/prepare_ceph.yml
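
If you want something similar in your own test environment, a 
hypothetical sketch of that kind of loopback setup (illustrative 
names and sizes, not the actual tasks from that file) would be:

    # Hypothetical sketch of a loopback-backed OSD disk for testing;
    # the real tasks are in prepare_ceph.yml linked above.
    - name: Create a sparse file to back a test OSD
      command: truncate -s 10G /openstack/ceph-osd.img
      args:
        creates: /openstack/ceph-osd.img

    - name: Attach the file to the first free loop device
      command: losetup -f /openstack/ceph-osd.img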

If you wish to have Openstack-Ansible call the ceph-ansible roles for 
you to deploy ceph, then you must take the time to understand 
ceph-ansible sufficiently to set the variables it requires to deploy 
correctly in your situation. Openstack-Ansible does not manage this 
for you.

It is also possible to deploy ceph independently, by whatever means 
you like, outside of openstack-ansible, and then pass a very small 
amount of data to provide an integration between the two. Those 
options are described briefly here: 
https://docs.openstack.org/openstack-ansible/latest/user/ceph/full-deploy.html 
and 
https://docs.openstack.org/openstack-ansible-ceph_client/latest/configure-ceph.html
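
In that model Openstack-Ansible mostly just needs to know where your 
monitors are, so that the ceph_client role can fetch ceph.conf and 
the keyrings from them. A sketch (placeholder addresses; see the 
ceph_client documentation above for the full set of options) in 
user_variables.yml:

    # Sketch only - monitor addresses are placeholders for your cluster
    ceph_mons:
      - 192.168.3.11
      - 192.168.3.12
      - 192.168.3.13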

Jonathan.

On 23/08/2022 00:53, Father Vlasie wrote:
> I have done a bit more searching… the error is related to the _reporting_ on the OSDs. I tried to get some info from journalctl while the infrastructure playbook was running and all I could see was this:
>
> Aug 22 22:11:31 compute3 python3[57496]: ansible-ceph_volume Invoked with cluster=ceph action=list objectstore=bluestore dmcrypt=False batch_devices=[] osds_per_device=1 journal_size=5120 journal_devices=[] block_db_size=-1 block_db_devices=[] wal_devices=[] report=False destroy=True data=None data_vg=None journal=None journal_vg=None db=None db_vg=None wal=None wal_vg=None crush_device_class=None osd_fsid=None osd_id=None
> Aug 22 22:12:01 compute3 audit[57503]: USER_ACCT pid=57503 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:accounting grantors=pam_permit acct="root" exe="/usr/sbin/cron" hostname=? addr=? terminal=cron res=success'
> Aug 22 22:12:01 compute3 audit[57503]: CRED_ACQ pid=57503 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:setcred grantors=pam_permit,pam_cap acct="root" exe="/usr/sbin/cron" hostname=? addr=? terminal=cron res=success'
> Aug 22 22:12:01 compute3 audit[57503]: SYSCALL arch=c000003e syscall=1 success=yes exit=1 a0=7 a1=7ffe656d1100 a2=1 a3=7fe9c3d53371 items=0 ppid=1725 pid=57503 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=1445 comm="cron" exe="/usr/sbin/cron" key=(null)
> Aug 22 22:12:01 compute3 audit: PROCTITLE proctitle=2F7573722F7362696E2F43524F4E002D66
> Aug 22 22:12:01 compute3 CRON[57503]: pam_unix(cron:session): session opened for user root by (uid=0)
>
> The only thing that stands out to me is that there are no devices listed, but in all of the openstack-ansible ceph documentation devices are never mentioned, so I assume they are being detected automatically. Is that right?
>
> Thank you,
>
> FV
>
>> On Aug 22, 2022, at 1:08 PM, Father Vlasie <fv at spots.edu> wrote:
>>
>>
>> Hello everyone,
>>
>> I am running setup-infrastructure.yml. I have followed the ceph production example here: https://docs.openstack.org/openstack-ansible/latest/user/ceph/full-deploy.html
>>
>> I have set things up so that the compute and storage nodes are the same machines (hyperconverged), and the storage devices are devoid of any volumes or partitions.
>>
>> I see the following error:
>>
>> ------
>>
>> FAILED - RETRYING: [compute3 -> infra1_ceph-mon_container-0d679d8d]: wait for all osd to be up (1 retries left).
>> fatal: [compute3 -> infra1_ceph-mon_container-0d679d8d(192.168.3.145)]: FAILED! => {"attempts": 60, "changed": false, "cmd": ["ceph", "--cluster", "ceph", "osd", "stat", "-f", "json"], "delta": "0:00:00.223291", "end": "2022-08-22 19:36:29.473358", "msg": "", "rc": 0, "start": "2022-08-22 19:36:29.250067", "stderr": "", "stderr_lines": [], "stdout": "\n{\"epoch\":6,\"num_osds\":0,\"num_up_osds\":0,\"osd_up_since\":0,\"num_in_osds\":0,\"osd_in_since\":0,\"num_remapped_pgs\":0}", "stdout_lines": ["", "{\"epoch\":6,\"num_osds\":0,\"num_up_osds\":0,\"osd_up_since\":0,\"num_in_osds\":0,\"osd_in_since\":0,\"num_remapped_pgs\":0}"]}
>>
>> ------
>>
>> I am not sure where to look to find more information. Any help would be much appreciated!
>>
>> Thank you,
>>
>> FV
>
>


