[openstack-dev] [Fuel][MySQL][DLM][Oslo][DB][Trove][Galera][operators] Multi-master writes look OK, OCF RA and more things
mbayer at redhat.com
Sat Apr 30 20:14:05 UTC 2016
On 04/30/2016 10:50 AM, Clint Byrum wrote:
> Excerpts from Roman Podoliaka's message of 2016-04-29 12:04:49 -0700:
> I'm curious why you think setting wsrep_sync_wait=1 wouldn't help.
> The exact example appears in the Galera documentation:
> The moment you say 'SET SESSION wsrep_sync_wait=1', the behavior should
> prevent the list problem you see, and it should not matter that it is
> a separate session, as that is the entire point of the variable:
we prefer to keep it off and just point applications at a single node
using master/passive/passive in HAProxy, so that we don't have the
unnecessary performance hit of waiting for all transactions to
propagate; we just stick to one node at a time. We've fixed a lot of
issues in our config to ensure that HAProxy definitely keeps all
clients on exactly one Galera node at a time.
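The single-writer topology described above can be expressed in HAProxy with `backup` servers, so that only one Galera node receives traffic at a time and the others are promoted only if it goes down. A minimal sketch; the listener name, node names, and addresses are made up for illustration, and a real deployment would typically also wire in a health check such as clustercheck:

```
listen galera
    bind 0.0.0.0:3306
    mode tcp
    option tcpka
    # node1 is the sole active node; node2/node3 receive traffic
    # only if node1 is marked down
    server node1 192.0.2.11:3306 check
    server node2 192.0.2.12:3306 check backup
    server node3 192.0.2.13:3306 check backup
```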
> "When you enable this parameter, the node triggers causality checks in
> response to certain types of queries. During the check, the node blocks
> new queries while the database server catches up with all updates made
> in the cluster to the point where the check was begun. Once it reaches
> this point, the node executes the original query."
> In the active/passive case where you never use the passive node as a
> read slave, one could actually set wsrep_sync_wait=1 globally. This will
> cause a ton of lag while new queries happen on the new active and old
> transactions are still being applied, but that's exactly what you want,
> so that when you fail over, nothing proceeds until all writes from the
> original active node are applied and available on the new active node.
> It would help if your failover technology actually _breaks_ connections
> to a presumed dead node, so writes stop happening on the old one.
If HAProxy is failing over from the master, which is no longer
reachable, to another passive node, which is reachable, that means that
master is partitioned and will leave the Galera primary component. It
also means all current database connections are going to be bounced off,
which will cause errors for those clients either in the middle of an
operation, or if a pooled connection is reused before it is known that
the connection has been reset. So failover is usually not an error-free
situation in any case from a database client perspective and retry
schemes are always going to be needed.
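As a sketch of the kind of retry scheme meant here: the helper and the exception class below are made up for illustration; real code would catch the driver's actual disconnect errors (e.g. oslo.db's DBConnectionError) rather than this stand-in.

```python
import time


class DBConnectionError(Exception):
    """Stand-in for a driver-level 'connection lost' error."""


def run_with_retry(operation, retries=3, delay=0.0):
    """Re-run an idempotent DB operation if the connection is
    bounced mid-failover; re-raise once retries are exhausted."""
    for attempt in range(retries):
        try:
            return operation()
        except DBConnectionError:
            if attempt == retries - 1:
                raise
            time.sleep(delay)  # give HAProxy time to settle on a new node


# usage: the first call fails as though HAProxy bounced the
# connection; the retry succeeds against the newly promoted node
calls = []

def flaky_query():
    calls.append(1)
    if len(calls) < 2:
        raise DBConnectionError()
    return "row"

print(run_with_retry(flaky_query))  # -> row
```

Only idempotent operations should be retried this way; a write whose commit status is unknown after a dropped connection needs more care.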
Additionally, the purpose of the enginefacade is to allow OpenStack
applications to fix their often incorrectly written database access
logic, so that in many (most?) cases a single logical operation is no
longer unnecessarily split among multiple transactions.
I know that this is not always feasible in the case where multiple web
requests are coordinating, however.
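The principle enginefacade encourages can be illustrated with plain DBAPI code; this uses sqlite3 as a stand-in for MySQL, and the schema and queries are invented for illustration, not enginefacade's actual API:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instance (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE instance_meta (instance_id INTEGER, key TEXT, value TEXT)")

# one logical operation, one transaction: either both rows are
# committed together or neither is
with conn:  # sqlite3 connection as context manager == one transaction
    cur = conn.execute("INSERT INTO instance (name) VALUES (?)", ("vm-1",))
    conn.execute(
        "INSERT INTO instance_meta (instance_id, key, value) "
        "VALUES (?, ?, ?)",
        (cur.lastrowid, "az", "nova"),
    )

print(conn.execute("SELECT count(*) FROM instance_meta").fetchone()[0])  # -> 1
```

Splitting the two INSERTs across separate transactions is exactly the pattern that makes an application sensitive to a failover landing between them.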
That leaves only the very infrequent scenario where the master has
finished sending a write set off, the passives haven't finished
committing that write set, the master goes down and HAProxy fails over
to one of the passives, and an application happens to make a fresh
connection to that new passive node to perform the next operation, one
that relies upon the previously committed data. That application sees
no database error; it simply runs straight onto the node where the
committed data it's expecting hasn't arrived yet. I can't judge for
all applications whether this scenario can be handled like any other
transient error that occurs during a failover situation; however, if
there is such a case, then IMO wsrep_sync_wait (formerly known as
wsrep_causal_reads) may be used on a per-transaction basis for that
very critical, not-retryable-even-during-failover operation. Allowing
this variable to be set for the scope of a transaction and reset
afterwards, and only when talking to Galera, is something we've
planned to work into the enginefacade as well, as a declarative
transaction attribute that would be a pass-through on other systems.
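A per-transaction scope like that could look roughly like the following context manager. The wsrep_sync_wait variable is real Galera configuration, but the helper and the stub connection below are made up to show the set/reset ordering without a live database:

```python
import contextlib


@contextlib.contextmanager
def causal_read(conn):
    """Enable wsrep_sync_wait for one critical transaction, then
    restore the default so other traffic skips the causality wait."""
    cur = conn.cursor()
    cur.execute("SET SESSION wsrep_sync_wait = 1")
    try:
        yield conn
    finally:
        cur.execute("SET SESSION wsrep_sync_wait = 0")


# stub connection that just records statements, to show the ordering
class StubCursor:
    def __init__(self, log):
        self.log = log

    def execute(self, sql):
        self.log.append(sql)


class StubConnection:
    def __init__(self):
        self.log = []

    def cursor(self):
        return StubCursor(self.log)


conn = StubConnection()
with causal_read(conn):
    conn.cursor().execute("SELECT * FROM instance")

print(conn.log[0])   # -> SET SESSION wsrep_sync_wait = 1
print(conn.log[-1])  # -> SET SESSION wsrep_sync_wait = 0
```

The finally block ensures the variable is reset even if the critical read raises, so the session doesn't silently keep paying the causality-check cost.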
> Also, If you thrash back and forth a bit, that could cause your app to
> virtually freeze, but HAProxy and most other failover technologies allow
> tuning timings so that you can stay off of a passive server long enough
> to calm it down and fail more gracefully to it.
> Anyway, this is why sometimes I do wonder if we'd be better off just
> using MySQL with DRBD and good old pacemaker.