<div class="gmail_quote">On Fri Feb 06 2015 at 12:59:13 PM Gregory Haynes <<a href="mailto:greg@greghaynes.net">greg@greghaynes.net</a>> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Excerpts from Joshua Harlow's message of 2015-02-06 01:26:25 +0000:<br>

> Angus Lees wrote:<br>

> > On Fri Feb 06 2015 at 4:25:43 AM Clint Byrum <<a href="mailto:clint@fewbar.com" target="_blank">clint@fewbar.com</a><br>

> > <mailto:<a href="mailto:clint@fewbar.com" target="_blank">clint@fewbar.com</a>>> wrote:<br>

> >     I'd also like to see consideration given to systems that handle<br>

> >     distributed consistency in a more active manner. etcd and Zookeeper are<br>

> >     both such systems, and might serve as efficient guards for critical<br>

> >     sections without raising latency.<br>

> ><br>

> ><br>

> > +1 for moving to such systems.  Then we can have a repeat of the above<br>

> > conversation without the added complications of SQL semantics ;)<br>

> ><br>

><br>

> So just an fyi:<br>

><br>

> <a href="http://docs.openstack.org/developer/tooz/" target="_blank">http://docs.openstack.org/developer/tooz/</a> exists.<br>

><br>

> Specifically:<br>

><br>

> <a href="http://docs.openstack.org/developer/tooz/developers.html#tooz.coordination.CoordinationDriver.get_lock" target="_blank">http://docs.openstack.org/developer/tooz/developers.html#tooz.coordination.CoordinationDriver.get_lock</a><br>

><br>

> It has a locking api that it provides (that plugs into the various<br>

> backends); there is also a WIP <a href="https://review.openstack.org/#/c/151463/" target="_blank">https://review.openstack.org/#/c/151463/</a><br>

> driver that is being worked on for etcd.<br>

><br>

<br>

An interesting note about the etcd implementation is that you can<br>

select per-request whether you want to wait for quorum on a read or not.<br>

This means that in theory you could obtain higher throughput for most<br>

operations which do not require this and then only gain quorum for<br>

operations which require it (e.g. locks).<br></blockquote><div><br></div><div>Along those lines, and in an effort to be a bit less doom-and-gloom, I spent my lunch break trying to find non-marketing documentation on the Galera replication protocol and how it is exposed. (Such information was surprisingly difficult to find. *)</div><div><br></div><div>It's easy to get the transaction ID of the last commit (wsrep_last_committed), but I can't find a way to wait until at least a particular transaction ID has been synced. If we can find that latter functionality, then we can expose that sequencer all the way through to the client (HTTP header?), and any follow-on command can mention the sequencer of the previous write command whose effects it really needs to see.</div><div><br></div><div>In practice, this should lead to zero additional wait time in the common case, since the Galera replication has almost certainly already caught up by the time the second command comes in, so we can just read from the local server with no additional delay.</div><div><br></div><div>See the various *Index variables in the etcd API for how the same idea gets used there.</div><div><br></div><div> - Gus</div><div><br></div><div>(*) In case you're also curious, the only doc I found with any details was <a href="http://galeracluster.com/documentation-webpages/certificationbasedreplication.html">http://galeracluster.com/documentation-webpages/certificationbasedreplication.html</a> and its sibling pages.</div></div>
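<div><br></div><div>To make the sequencer idea above a bit more concrete, here is a rough sketch in Python. It is only a toy model: the Replica class, its apply() and read() methods and the min_seqno parameter are all made-up names, and nothing here talks to Galera or etcd. It just shows a read that blocks until the local node has applied at least the sequence number handed back by an earlier write, which is exactly the piece I could not find in Galera.</div>
<pre>
```python
import threading


class Replica:
    """Toy stand-in for one database node; NOT a real Galera client."""

    def __init__(self):
        self._applied = 0  # highest write sequence number applied locally
        self._data = {}
        self._cond = threading.Condition()

    def apply(self, seqno, key, value):
        """Apply a replicated write and wake any readers waiting on it."""
        with self._cond:
            self._data[key] = value
            self._applied = max(self._applied, seqno)
            self._cond.notify_all()

    def read(self, key, min_seqno=0, timeout=1.0):
        """Read locally once this node has applied at least min_seqno."""
        with self._cond:
            caught_up = self._cond.wait_for(
                lambda: self._applied >= min_seqno, timeout=timeout)
            if not caught_up:
                raise TimeoutError(
                    "replica has not reached seqno %d" % min_seqno)
            return self._data.get(key), self._applied


# Hypothetical flow: the write comes back tagged with seqno 7 (say, in an
# HTTP header), and the follow-on read passes that value as min_seqno.
replica = Replica()
replica.apply(7, "instance-1", "ACTIVE")
value, seen = replica.read("instance-1", min_seqno=7)
```
</pre>
<div>If replication has already caught up, read() returns immediately, which is the expected common case; it only blocks (up to the timeout) when the follow-on command arrives ahead of the replicated write.</div>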