<div dir="ltr">Greetings,<div><br></div><div>While working on upgrading OpenStack with the Fuel installer, I ran into a requirement to re-add OSD devices with an existing data set to a Ceph cluster using the Puppet module. The node is reinstalled during the upgrade, so the disks used for OSDs are not mounted when Puppet runs.</div><div><br></div><div>The current version of the Ceph module in fuel-library only supports adding new OSD devices. Mounted devices are skipped. Unmounted devices with a Ceph UUID in the GPT label are passed to the 'ceph-deploy osd prepare' command, which formats the device and recreates the file system, so all existing data is lost.</div><div><br></div><div>I proposed a patch to support OSD devices with an existing data set:</div><div><a href="https://review.openstack.org/#/c/203639/2" target="_blank">https://review.openstack.org/#/c/203639/2</a><br></div><div><br></div><div>However, this fix is very simple and doesn't account for several corner cases, as Mykola Golub pointed out in review. As this problem seems rather significant to me, I'd like to bring this discussion to a broader audience.</div><div><br></div><div>So, here's the comment with my replies inline:</div><div><br></div><div><p style="color:rgb(0,0,0);font-family:sans-serif">I am not sure just reactivating disks that have a filesystem is a safe approach:</p></div><blockquote style="margin:0 0 0 40px;border:none;padding:0px"><div><p style="color:rgb(0,0,0);font-family:sans-serif">1) If you are deploying a mix of new and restored disks you may end up with conflicting OSDs joining the cluster with the same ID. 2) It makes sense to restore OSDs only if a monitor (cluster) is restored, otherwise activation of old OSDs will fail. 3) It might happen that the partition contains a valid filesystem by accident (e.g. 
the user reused disk/hosts from another cluster) -- it will not join the cluster because of the wrong fsid and credentials, but the deployment will unexpectedly fail.</p></div></blockquote><font color="#000000" face="sans-serif">1) As far as I can tell, OSD device IDs are assigned by the Ceph cluster based on already existing devices. So, if some ID is stored on the device, either a device with the given ID already exists in the cluster and no new device will get the same ID, or the cluster doesn't know about a device with the given ID, which means we had already lost the data placement.</font><div><font color="#000000" face="sans-serif">2) This can be fixed by adding a check that ensures that the fsid parameter in ceph.conf on the node and the cluster-fsid on the device are equal. Otherwise the device is treated as a new device, i.e. passed to 'ceph-deploy osd prepare'.</font></div><div><font color="#000000" face="sans-serif">3) This situation would be covered by the previous check, as I understand it.<br></font><blockquote style="margin:0 0 0 40px;border:none;padding:0px"><div><p style="color:rgb(0,0,0);font-family:sans-serif">Is it possible to pass information that the cluster is restored using partition preservation? Because I think a much safer approach is:</p></div><div><p style="color:rgb(0,0,0);font-family:sans-serif">1) Pass some flag from the user that we are restoring the cluster 2) Restore controller (monitor) and abort deployment if it fails. 3) When deploying osd host, if 'restore' flag is present, skip prepare step and try only activate for all disks if possible (we might want to ignore activate error, and continue with other disks so we restore as many osds as possible)</p></div></blockquote><font color="#000000" face="sans-serif">The case I want to support with this change is not restoration of the whole cluster, but rather reinstallation of an OSD node's operating system. 
For this case, the approach you propose actually seems more correct than my implementation. For a node being reinstalled we do not expect new devices, only ones with an existing data set, so we don't need to check for that specifically and can just skip prepare for all devices.</font></div><div><font color="#000000" face="sans-serif"><br></font></div><div><font color="#000000" face="sans-serif">We still need to check that the value of fsid on the disk is consistent with the cluster's fsid.</font></div><div><font color="#000000" face="sans-serif"><br></font></div><div><font color="#000000" face="sans-serif">Which issues should we anticipate with this kind of approach?</font></div><div><font color="#000000" face="sans-serif"><br></font></div><div>Another question that is still unclear to me is whether anyone really needs support for a hybrid use case, where new and existing unmounted OSD devices are mixed in one OSD node.</div><div><br></div><div>--</div><div>Best regards,</div><div>Oleg Gelbukh</div></div>