<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">Mike,</div><div class=""><br class=""></div><div class="">For a sufficiently large number of volumes, the thin-provisioning stats gathering could already break things</div><div class="">before the referenced patch:</div><div class=""><br class=""></div><div class=""><a href="https://bugs.launchpad.net/cinder/+bug/1704106" class="">https://bugs.launchpad.net/cinder/+bug/1704106</a></div><div class=""><br class=""></div><div class="">It seems, however, that the attempt to at least gather the correct data (used rather than allocated space) lowers that</div><div class="">threshold even further.</div><div class=""><br class=""></div><div class="">To allow our c-vol to start (and since we don’t use over-provisioning), we’ve commented out the</div><div class="">usage stats gathering for now.</div><div class=""><br class=""></div><div class="">Cheers,</div><div class=""> Arne</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""></div><br class=""><div><blockquote type="cite" class=""><div class="">On 03 Aug 2017, at 20:47, Mike Lowe &lt;<a href="mailto:jomlowe@iu.edu" class="">jomlowe@iu.edu</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div class="">I did the minor point release update from 10.0.2 to 10.0.4 and found my cinder volume services would go out to lunch during startup. They would send their initial heartbeat, then get marked as dead, never sending another heartbeat. The process was running and there were constant logs about Ceph connections, but what was missing was the follow-up to "Initializing RPC dependent components of volume driver RBDDriver (1.2.0)". 
It never finished the RPC init with "Driver post RPC initialization completed successfully." Digging in a little with my limited knowledge of the Python librbd, it seems that this commit landed in 10.0.4: <a href="https://github.com/openstack/cinder/commit/e72dead5ce085a6ba66f7aad2ff58061842f43d2" class="">https://github.com/openstack/cinder/commit/e72dead5ce085a6ba66f7aad2ff58061842f43d2</a> Instead of adding up the volume size for every volume, it loops over all the volumes, calling diff_iterate from offset 0 to the end. As near as I can tell, this actually calls whatever you pass in as iterate_cb for every used extent of the volume. So a handful of empty volumes is no problem, but in production, by my count, iterate_cb would have to be called 12.6M times just to add up the bytes used from each extent. I’ve filed a bug, <a href="https://bugs.launchpad.net/cinder/+bug/1708507" class="">https://bugs.launchpad.net/cinder/+bug/1708507</a>, and downgrading to 10.0.2 seems to be an OK workaround.<br class=""><br class="">TL;DR: if you have Ceph, don’t upgrade past 10.0.2 for the time being.<br class="">_______________________________________________<br class="">OpenStack-operators mailing list<br class=""><a href="mailto:OpenStack-operators@lists.openstack.org" class="">OpenStack-operators@lists.openstack.org</a><br class="">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators<br class=""></div></div></blockquote></div><br class=""></body></html>