[openstack-dev] [Keystone] Cockroachdb for Keystone Multi-master

Jay Pipes jaypipes at gmail.com
Wed May 31 01:06:59 UTC 2017

On 05/30/2017 05:07 PM, Clint Byrum wrote:
> Excerpts from Jay Pipes's message of 2017-05-30 14:52:01 -0400:
>> Sorry for the delay in getting back on this... comments inline.
>> On 05/18/2017 06:13 PM, Adrian Turjak wrote:
>>> Hello fellow OpenStackers,
>>> For a while now I've been looking at options for multi-region,
>>> multi-master Keystone, as well as multi-master support for other
>>> services I've been developing, and one thing that kept coming up is
>>> that there aren't many truly good options for a multi-master backend.
>> Not sure whether you've looked into Galera? We ran a geo-distributed,
>> 12-site Galera cluster servicing our WAN-replicated Keystone
>> assignment/identity information. It worked a charm for us at AT&T: much
>> easier to administer than master-slave replication topologies, and the
>> performance of the wsrep replication (yes, even over WAN links) was
>> excellent. And yes, I'm aware Galera doesn't have complete snapshot
>> isolation support, but for Keystone's workloads (heavy, heavy read, very
>> little write) it is indeed ideal.
> This has not been my experience.
> We had a 3-site, 9-node global cluster and it was _extremely_ sensitive
> to latency. We'd even lose read availability whenever we had a latency
> storm, due to quorum problems.
> Our sites were London, Dallas, and Sydney, so it was pretty common for
> there to be latency between any of them.
> I lost track of it after some reorgs, but I believe the solution was
> to just have a single site 3-node galera for writes, and then use async
> replication for reads. We even helped land patches in Keystone to allow
> split read/write host configuration.
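For anyone following along: the split read/write configuration Clint
mentions is, I believe, oslo.db's slave_connection option. In
keystone.conf it looks roughly like the following (hostnames and
credentials are of course hypothetical); writes go to the [database]
connection and reads can be directed at the async replica:

    [database]
    connection = mysql+pymysql://keystone:secret@galera-writer/keystone
    slave_connection = mysql+pymysql://keystone:secret@async-replica/keystone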

Interesting, thanks for the info. Can I ask: were you using the Galera 
cluster for read-heavy data like Keystone identity/assignment storage? 
Or did you have write-heavy data mixed in (like Keystone's old UUID 
token storage)?

It should be noted that CockroachDB's documentation specifically calls 
out that it is extremely sensitive to latency, due to the way it bounds 
clock offset between nodes... so it might not be suitable for 
WAN-separated clusters?
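If I recall correctly, CockroachDB exposes that bound as the
--max-offset flag on cockroach start, and a node that detects its clock
drifting past the bound will shut itself down rather than risk stale
reads. A rough sketch (flag per the CockroachDB docs; addresses and
join list are hypothetical):

    cockroach start --insecure --max-offset=500ms \
        --join=node1:26257,node2:26257,node3:26257

Over WAN links, NTP jitter eating into that offset budget is presumably
exactly the sensitivity their docs are warning about.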

