[Openstack-operators] [cinder] Thoughts on cinder readiness

Arne Wiebalck Arne.Wiebalck at cern.ch
Thu Jun 1 08:48:00 UTC 2017


Joshua,

We introduced Cinder on Ceph in production more than 3 years ago (when we
were still on Havana, IIRC). Today, we have around 4'000 volumes with a total size
of 1.25PB (about half of which is actually filled).

To back up what Mike and Erik already said, Cinder has given us very few
problems during that time. Three things worth mentioning:

- we were running with multiple volume servers using the same ‘host’ identifier
for some time; this may have led to some of the stuck volumes and DB inconsistencies
we encountered; the config has meanwhile been changed to have only one
active c-vol (and we’re of course closely following the ongoing A/A HA work);

- until recently, we had not explicitly specified the pymysql driver in the DB connection
string; for quite some time this led to RPC timeouts and volumes stuck in deletion
when launching parallel deletions of 20+ volumes in one go; the config we had been
carrying forward since the initial setup has now been corrected (see the sketch after
this list) and we are no longer able to reproduce the problem;

- depending on your actual setup, Cinder upgrades will require a service downtime;
you may want to check the docs for the recent work on rolling upgrades to see how
you’ll need to set things up in order to minimise the intervention time (if that is
important for your use case).
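
To illustrate the config points above, here is a minimal cinder.conf sketch; the
host identifier, credentials and DB server below are placeholders rather than our
actual values:

   [DEFAULT]
   # A common 'host' identifier across volume servers is fine as long as
   # only a single cinder-volume service is active at a time; several
   # active c-vol services sharing the same identifier is what caused us
   # trouble.
   host = cinder-volume

   [database]
   # Explicitly select the pure-Python PyMySQL driver; the default C-based
   # MySQL driver does not yield to eventlet, which is our best explanation
   # for the RPC timeouts we saw during parallel deletions.
   connection = mysql+pymysql://cinder:CINDER_DBPASS@dbserver.example.org/cinder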

Cheers,
 Arne


> On 01 Jun 2017, at 06:06, Joshua Harlow <harlowja at fastmail.com> wrote:
> 
> Erik McCormick wrote:
>> I've been running Ceph-backed Cinder since, I think, Icehouse. It's
>> really more of a function of your backend or the hypervisor than Cinder
>> itself. That being said, it's been probably my smallest OpenStack pain
>> point over the years.
>> 
>> I can't imagine what sort of concurrency issues you'd run into short of
>> a large public cloud given that it really doesn't do much once
>> provisioning a volume is complete. Maybe if you've got people taking a
>> ton of snapshots? What sort of specific issues are you concerned about?
>> 
> 
> Mainly the ones that spawned articles/specs like:
> 
> https://gorka.eguileor.com/a-cinder-road-to-activeactive-ha/
> 
> https://specs.openstack.org/openstack/cinder-specs/specs/mitaka/cinder-volume-active-active-support.html
> 
> And a few more like those. I'm especially not going to be a big fan of having to (as a person, myself or others on the GoDaddy team) go in and muck with volumes in stuck states and so on (similar issues occur in Nova, and they just drain the blood out of the humans who have to go fix them).
> 
>> -Erik
>> 
>> On May 31, 2017 8:30 PM, "Mike Lowe" <jomlowe at iu.edu> wrote:
>> 
>>    We have run Ceph-backed Cinder from Liberty through Newton. With the
>>    exception of a libvirt 2.x bug that should now be fixed, Cinder
>>    really hasn't caused us any problems.
>> 
>>    Sent from my iPad
>> 
>>     > On May 31, 2017, at 6:12 PM, Joshua Harlow <harlowja at fastmail.com> wrote:
>>     >
>>     > Hi folks,
>>     >
>>     > So I was having some back and forth internally about whether cinder
>>    is ready for use, and wanted to get other operators' thoughts on how
>>    their cinder experiences have been going, any trials and tribulations.
>>     >
>>     > For context, we are running on liberty (yes I know, working on
>>    getting that to newer versions) and folks at godaddy are starting to
>>    use more and more cinder (backed by ceph), which got me thinking
>>    about asking operators (and devs) what kind of readiness
>>    'rating' (or whatever you would want to call it) people would
>>    give cinder in liberty.
>>     >
>>     > Some things I was thinking about were concurrency rates,
>>    because I know that's been a common issue that the cinder developers
>>    have been working through (using tooz, and various other lock
>>    mechanisms and such).
>>     >
>>     > Have other cinder operators seen concurrent operations (or
>>    conflicting operations or ...) work better in newer releases (is
>>    there any metric/s anyone has gathered about how things have gotten
>>    worse/better under scale for cinder in various releases? particularly
>>    with regard to using ceph).
>>     >
>>     > Thoughts?
>>     >
>>     > It'd be interesting to capture this (not just for my own usage), I
>>    think, because such info helps the overall user, operator and dev
>>    community (and yes, I would expect various etherpads to have parts of
>>    this information, but it'd be nice to have a single place where
>>    other operators can specify how ready they believe a project is for
>>    a given release and for a given configuration, and ideally provide
>>    details/comments as to why they believe this).
>>     >
>>     > -Josh
>>     >
>> 
> 
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

--
Arne Wiebalck
CERN IT


