[openstack-ansible][ceph][yoga] wait for all osd to be up
Hello everyone,

I am running setup-infrastructure.yml. I have followed the Ceph production example here: https://docs.openstack.org/openstack-ansible/latest/user/ceph/full-deploy.ht...

I have set things up so that the compute and storage nodes are the same machines (hyperconverged), and the storage devices are devoid of any volumes or partitions.

I see the following error:

------
FAILED - RETRYING: [compute3 -> infra1_ceph-mon_container-0d679d8d]: wait for all osd to be up (1 retries left).
fatal: [compute3 -> infra1_ceph-mon_container-0d679d8d(192.168.3.145)]: FAILED! => {"attempts": 60, "changed": false, "cmd": ["ceph", "--cluster", "ceph", "osd", "stat", "-f", "json"], "delta": "0:00:00.223291", "end": "2022-08-22 19:36:29.473358", "msg": "", "rc": 0, "start": "2022-08-22 19:36:29.250067", "stderr": "", "stderr_lines": [], "stdout": "\n{\"epoch\":6,\"num_osds\":0,\"num_up_osds\":0,\"osd_up_since\":0,\"num_in_osds\":0,\"osd_in_since\":0,\"num_remapped_pgs\":0}", "stdout_lines": ["", "{\"epoch\":6,\"num_osds\":0,\"num_up_osds\":0,\"osd_up_since\":0,\"num_in_osds\":0,\"osd_in_since\":0,\"num_remapped_pgs\":0}"]}
------

I am not sure where to look to find more information. Any help would be much appreciated!

Thank you,
FV
I have done a bit more searching. The error is related to the _reporting_ on the OSDs. I tried to get some information from journalctl while the infrastructure playbook was running, and all I could see was this:

Aug 22 22:11:31 compute3 python3[57496]: ansible-ceph_volume Invoked with cluster=ceph action=list objectstore=bluestore dmcrypt=False batch_devices=[] osds_per_device=1 journal_size=5120 journal_devices=[] block_db_size=-1 block_db_devices=[] wal_devices=[] report=False destroy=True data=None data_vg=None journal=None journal_vg=None db=None db_vg=None wal=None wal_vg=None crush_device_class=None osd_fsid=None osd_id=None
Aug 22 22:12:01 compute3 audit[57503]: USER_ACCT pid=57503 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:accounting grantors=pam_permit acct="root" exe="/usr/sbin/cron" hostname=? addr=? terminal=cron res=success'
Aug 22 22:12:01 compute3 audit[57503]: CRED_ACQ pid=57503 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:setcred grantors=pam_permit,pam_cap acct="root" exe="/usr/sbin/cron" hostname=? addr=? terminal=cron res=success'
Aug 22 22:12:01 compute3 audit[57503]: SYSCALL arch=c000003e syscall=1 success=yes exit=1 a0=7 a1=7ffe656d1100 a2=1 a3=7fe9c3d53371 items=0 ppid=1725 pid=57503 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=1445 comm="cron" exe="/usr/sbin/cron" key=(null)
Aug 22 22:12:01 compute3 audit: PROCTITLE proctitle=2F7573722F7362696E2F43524F4E002D66
Aug 22 22:12:01 compute3 CRON[57503]: pam_unix(cron:session): session opened for user root by (uid=0)

The only thing that stands out to me is that no devices are listed (batch_devices=[]), but the openstack-ansible Ceph documentation never mentions devices, so I assumed they were being detected automatically. Is that right?

Thank you,
FV
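A note on device detection: ceph-ansible does not normally pick up OSD disks on its own unless its osd_auto_discovery option is enabled; otherwise the disks have to be declared explicitly. A minimal, hypothetical sketch of what that could look like in /etc/openstack_deploy/user_variables.yml follows; the device paths are placeholders and would need to match the real, empty disks on each OSD host.

# Hypothetical OSD device declaration for ceph-ansible, passed through
# openstack-ansible via user_variables.yml. Adjust paths to the actual disks.
osd_objectstore: bluestore
devices:
  - /dev/sdb
  - /dev/sdc
# Alternatively, let ceph-ansible use any empty, unpartitioned disk it finds:
# osd_auto_discovery: true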
For deploying Ceph, OpenStack-Ansible is just a thin wrapper around ceph-ansible (see https://docs.ceph.com/projects/ceph-ansible/en/latest/index.html). You have to define the variables that ceph-ansible requires.

We have a test scenario for OpenStack-Ansible + Ceph, which uses the following variables: https://github.com/openstack/openstack-ansible/blob/master/tests/roles/boots... Most of those are used by the ceph-ansible roles, not by OpenStack-Ansible directly. For the purposes of that test case, LVM loopback devices are set up and a suitable ceph.conf is written out here: https://github.com/openstack/openstack-ansible/blob/master/tests/roles/boots...

If you wish to have OpenStack-Ansible call the ceph-ansible roles for you to deploy Ceph, then you must take the time to understand ceph-ansible sufficiently to set the variables it requires to deploy correctly in your situation. OpenStack-Ansible does not manage this for you.

It is also possible to deploy Ceph independently, using whatever means you like outside of openstack-ansible, and pass a very small amount of data to provide an integration between the two. Those options are described briefly here: https://docs.openstack.org/openstack-ansible/latest/user/ceph/full-deploy.ht... and https://docs.openstack.org/openstack-ansible-ceph_client/latest/configure-ce...

Jonathan.
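For the externally deployed option, the integration data is small. A hedged sketch of what it might look like in user_variables.yml, assuming the ceph_client role's ceph_mons variable and placeholder monitor addresses:

# Hypothetical user_variables.yml fragment pointing openstack-ansible at an
# externally managed Ceph cluster. The monitor addresses are placeholders.
ceph_mons:
  - 192.168.3.151
  - 192.168.3.152
  - 192.168.3.153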
participants (2)

- Father Vlasie
- Jonathan Rosser