[openstack-dev] [Cinder] A possible solution for HA Active-Active

Duncan Thomas duncan.thomas at gmail.com
Tue Aug 4 07:32:40 UTC 2015


On 3 August 2015 at 20:53, Clint Byrum <clint at fewbar.com> wrote:

> Excerpts from Devananda van der Veen's message of 2015-08-03 08:53:21
> -0700:
> Also on a side note, I think Cinder's need for this is really subtle,
> and one could just accept that sometimes it's going to break when it does
> two things to one resource from two hosts. The error rate there might
> even be lower than the false-error rate that would be caused by a twitchy
> DLM with timeouts a little low. So there's a core cinder discussion that
> keeps losing to the shiny DLM discussion, and I'd like to see it played
> out fully: Could Cinder just not do anything, and let the few drivers
> that react _really_ badly, implement their own concurrency controls?
>


So the problem here is data corruption. Many of our races can cause data
corruption. Not 'my instance didn't come up', not 'my network is screwed
and I need to tear everything down and do it again', but 'My 1 TB of
customer database is now missing the second half'. This means that we
*really* need confidence in, and a clear understanding of, whatever we do.
The idea of locks timing out and being stolen without fencing is frankly
scary: unless we're very careful, it is begging for data corruption. I'd
rather use a persistent lock (e.g. a DB record change) and manual recovery
than a lock timeout that might cause corruption.


-- 
Duncan Thomas