[Openstack-operators] Cinder 10.0.4 (latest Ocata) broken for ceph/rbd

Mike Lowe jomlowe at iu.edu
Thu Aug 3 18:47:19 UTC 2017


I did the minor point release update from 10.0.2 to 10.0.4 and found that my cinder volume services would go out to lunch during startup. They would send their initial heartbeat and then get marked as dead, never sending another one. The process was running and there were constant log messages about ceph connections, but what was missing was the follow-up to "Initializing RPC dependent components of volume driver RBDDriver (1.2.0)". The RPC init never finished with "Driver post RPC initialization completed successfully."

Digging in a little with my limited knowledge of the Python librbd bindings, it seems that this commit landed in 10.0.4: https://github.com/openstack/cinder/commit/e72dead5ce085a6ba66f7aad2ff58061842f43d2 Instead of looping over the volume size for every volume, it loops over all the volumes, calling diff_iterate from offset 0 to the end of each one. As near as I can tell, this calls whatever you pass in as iterate_cb once for every used extent of the volume. So with a handful of empty volumes there is no problem, but in production, by my count, I would have to call iterate_cb 12.6M times just to add up the bytes used from each extent.

I've filed a bug, https://bugs.launchpad.net/cinder/+bug/1708507, and downgrading to 10.0.2 seems to be an OK workaround.
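To illustrate why the callback count matters, here is a minimal sketch of the pattern as I understand it. It is not the cinder code. FakeImage is a hypothetical stand-in that only mimics the shape of diff_iterate (offset, length, from_snapshot, iterate_cb), with the callback receiving (offset, length, exists); used_bytes and the extent lists are made up for the example.

```python
MIB = 1024 * 1024


class FakeImage:
    """Hypothetical stand-in for an RBD image; it only mimics the callback pattern."""

    def __init__(self, size, extents):
        self._size = size
        self._extents = extents  # list of (offset, length, exists) tuples

    def size(self):
        return self._size

    def diff_iterate(self, offset, length, from_snapshot, iterate_cb):
        # One callback invocation per extent that starts inside the requested range.
        for ext_offset, ext_length, exists in self._extents:
            if offset <= ext_offset < offset + length:
                iterate_cb(ext_offset, ext_length, exists)


def used_bytes(image):
    """Sum allocated bytes by walking every extent from offset 0 to the end."""
    total = 0
    calls = 0

    def iterate_cb(offset, length, exists):
        nonlocal total, calls
        calls += 1
        if exists:
            total += length

    image.diff_iterate(0, image.size(), None, iterate_cb)
    return total, calls


# An empty volume costs no callbacks; a populated one costs one per used extent.
empty = FakeImage(8 * MIB, [])
busy = FakeImage(16 * MIB, [(0, 4 * MIB, True), (8 * MIB, 4 * MIB, True)])
```

The work grows with the number of used extents across all volumes rather than with the number of volumes, which is why a few empty test volumes show nothing while a full production pool stalls startup.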

TL;DR: if you run ceph, don't upgrade past 10.0.2 for the time being.

