[openstack-dev] [Fuel][MySQL][DLM][Oslo][DB][Trove][Galera][operators] Multi-master writes look OK, OCF RA and more things

Mike Bayer mbayer at redhat.com
Sat Apr 30 20:14:05 UTC 2016

On 04/30/2016 10:50 AM, Clint Byrum wrote:
> Excerpts from Roman Podoliaka's message of 2016-04-29 12:04:49 -0700:
> I'm curious why you think setting wsrep_sync_wait=1 wouldn't help.
> The exact example appears in the Galera documentation:
> http://galeracluster.com/documentation-webpages/mysqlwsrepoptions.html#wsrep-sync-wait
> The moment you say 'SET SESSION wsrep_sync_wait=1', the behavior should
> prevent the list problem you see, and it should not matter that it is
> a separate session, as that is the entire point of the variable:

We prefer to keep it off and just point applications at a single node
using master/passive/passive in HAProxy, so that we don't take the
unnecessary performance hit of waiting for all transactions to
propagate; we just stick to one node at a time.  We've fixed a lot of
issues in our config to ensure that HAProxy definitely keeps all
clients on exactly one Galera node at a time.

> "When you enable this parameter, the node triggers causality checks in
> response to certain types of queries. During the check, the node blocks
> new queries while the database server catches up with all updates made
> in the cluster to the point where the check was begun. Once it reaches
> this point, the node executes the original query."
> In the active/passive case where you never use the passive node as a
> read slave, one could actually set wsrep_sync_wait=1 globally. This will
> cause a ton of lag while new queries happen on the new active and old
> transactions are still being applied, but that's exactly what you want,
> so that when you fail over, nothing proceeds until all writes from the
> original active node are applied and available on the new active node.
> It would help if your failover technology actually _breaks_ connections
> to a presumed dead node, so writes stop happening on the old one.

If HAProxy is failing over from the master, which is no longer
reachable, to a passive node, which is reachable, that means the
master is partitioned and will leave the Galera primary component.  It
also means all current database connections are going to be bounced,
which will cause errors for clients that are either in the middle of an
operation or reusing a pooled connection before it is known that
the connection has been reset.  So failover is usually not an error-free
situation from a database client's perspective in any case, and retry
schemes are always going to be needed.
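
To make the retry point concrete, here is a minimal sketch of such a scheme in Python.  It is not taken from any OpenStack project: TransientDBError, retry_on_transient and flaky_operation are hypothetical names made up for illustration, and a real application would catch whatever disconnect error its driver or database layer actually raises.

```python
import time


class TransientDBError(Exception):
    """Hypothetical stand-in for a driver's disconnect error."""


def retry_on_transient(func, attempts=3, delay=0.0):
    """Call func(), retrying while it raises TransientDBError.

    The last failure is re-raised once `attempts` calls have been made.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except TransientDBError:
            if attempt == attempts:
                raise
            time.sleep(delay)


# Simulated operation: fails twice, as if its connections were bounced
# during a failover, then succeeds on the third call.
calls = {"n": 0}


def flaky_operation():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientDBError("connection reset during failover")
    return "committed"


result = retry_on_transient(flaky_operation)
```

The whole logical operation is retried, not a single statement, since a bounced connection loses whatever transaction was in progress.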

Additionally, the purpose of the enginefacade [1] is to allow OpenStack
applications to fix their often incorrectly written database access
logic, so that in many (most?) cases a single logical operation is no
longer unnecessarily split among multiple transactions.
I know that this is not always feasible where multiple web
requests are coordinating, however.

That leaves only the very infrequent scenario in which the master has
finished sending a write set, the passives haven't finished
committing it, the master goes down, and HAProxy fails over
to one of the passives, while an application just happens to be
connecting fresh to that new node to perform its next
operation, which relies on the previously committed data.  That
application does not see a database error, and instead runs straight
onto a node where the committed data it expects hasn't arrived yet.  I
can't judge for all applications whether this scenario can be handled
like any other transient error that occurs during a failover.  However,
if there is a case where it can't, then IMO wsrep_sync_wait (formerly
known as wsrep_causal_reads) may be used on a per-transaction basis for
that very critical, not-retryable-even-during-failover operation.
Allowing this variable to be set for the scope of a transaction and
reset afterwards, and only when talking to Galera, is something we've
planned to work into the enginefacade as well, as a declarative
transaction attribute that would be a pass-through on other systems.
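
As a rough illustration of the set-then-reset idea (this is not the enginefacade API, which does not expose this yet), the sketch below wraps a block of work in Python.  The names causal_reads, is_galera and RecordingCursor are hypothetical, the cursor is assumed to be any DB-API style object with an execute() method, and the reset assumes the session value was 0 beforehand.

```python
from contextlib import contextmanager


@contextmanager
def causal_reads(cursor, is_galera=True, level=1):
    """Enable wsrep_sync_wait for one block of work, then reset it.

    When is_galera is False this is a pass-through and no SET is sent.
    Assumption: the session value was 0 before entering the block.
    """
    if not is_galera:
        yield cursor
        return
    cursor.execute("SET SESSION wsrep_sync_wait=%d" % level)
    try:
        yield cursor
    finally:
        cursor.execute("SET SESSION wsrep_sync_wait=0")


class RecordingCursor:
    """Fake cursor that records statements instead of running them."""

    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)


# Galera backend: the critical read is bracketed by SET statements.
cur = RecordingCursor()
with causal_reads(cur) as c:
    c.execute("SELECT id FROM instances")

# Any other backend: only the query itself is issued.
other = RecordingCursor()
with causal_reads(other, is_galera=False) as c:
    c.execute("SELECT id FROM instances")
```

Only the one critical operation pays the cost of the causality check; everything else on the connection keeps running with the variable off.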


> Also, If you thrash back and forth a bit, that could cause your app to
> virtually freeze, but HAProxy and most other failover technologies allow
> tuning timings so that you can stay off of a passive server long enough
> to calm it down and fail more gracefully to it.
> Anyway, this is why sometimes I do wonder if we'd be better off just
> using MySQL with DRBD and good old pacemaker.
