[openstack-dev] [Fuel][MySQL][DLM][Oslo][DB][Trove][Galera][operators] Multi-master writes look OK, OCF RA and more things
clint at fewbar.com
Sat Apr 30 15:50:30 UTC 2016
Excerpts from Roman Podoliaka's message of 2016-04-29 12:04:49 -0700:
> Hi Bogdan,
> Thank you for sharing this! I'll need to familiarize myself with this
> Jepsen thing, but overall it looks interesting.
> As it turns out, we already run Galera in multi-writer mode in Fuel
> unintentionally in the case, when the active MySQL node goes down,
> HAProxy starts opening connections to a backup, then the active goes
> up again, HAProxy starts opening connections to the original MySQL
> node, but OpenStack services may still have connections opened to the
> backup in their connection pools - so now you may have connections to
> multiple MySQL nodes at the same time, exactly what you wanted to
> avoid by using active/backup in the HAProxy configuration.
> ^ this actually leads to an interesting issue , when the DB state
> committed on one node is not immediately available on another one.
> Replication lag can be controlled via session variables , but that
> does not always help: e.g. in  Nova first goes to Neutron to create
> a new floating IP, gets 201 (and Neutron actually *commits* the DB
> transaction) and then makes another REST API request to get a list of
> floating IPs by address - the latter can be served by another
> neutron-server, connected to another Galera node, which does not have
> the latest state applied yet due to 'slave lag' - it can happen that
> the list will be empty. Unfortunately, 'wsrep_sync_wait' can't help
> here, as it's two different REST API requests, potentially served by
> two different neutron-server instances.
I'm curious why you think setting wsrep_sync_wait=1 wouldn't help.
The exact example appears in the Galera documentation:
The moment you say 'SET SESSION wsrep_sync_wait=1', the behavior should
prevent the list problem you see, and it should not matter that it is
a separate session, as that is the entire point of the variable:
"When you enable this parameter, the node triggers causality checks in
response to certain types of queries. During the check, the node blocks
new queries while the database server catches up with all updates made
in the cluster to the point where the check was begun. Once it reaches
this point, the node executes the original query."
In the active/passive case where you never use the passive node as a
read slave, one could actually set wsrep_sync_wait=1 globally. This will
cause a ton of lag while new queries happen on the new active and old
transactions are still being applied, but that's exactly what you want,
so that when you fail over, nothing proceeds until all writes from the
original active node are applied and available on the new active node.
It would help if your failover technology actually _breaks_ connections
to a presumed dead node, so writes stop happening on the old one.
Also, If you thrash back and forth a bit, that could cause your app to
virtually freeze, but HAProxy and most other failover technologies allow
tuning timings so that you can stay off of a passive server long enough
to calm it down and fail more gracefully to it.
Anyway, this is why sometimes I do wonder if we'd be better off just
using MySQL with DRBD and good old pacemaker.
More information about the OpenStack-dev