Open Stack

Fri Apr 29 19:04:49 UTC 2016

Hi Bogdan,

Thank you for sharing this! I'll need to familiarize myself with this
Jepsen thing, but overall it looks interesting.

As it turns out, we already run Galera in multi-writer mode in Fuel,
unintentionally. When the active MySQL node goes down, HAProxy starts
opening connections to a backup. When the active node comes back up,
HAProxy starts opening connections to the original MySQL node again,
but OpenStack services may still hold open connections to the backup
in their connection pools. So now you may have connections to multiple
MySQL nodes at the same time, which is exactly what you wanted to
avoid by using active/backup in the HAProxy configuration.
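
The sequence above can be sketched as a toy simulation. This is only an
illustration: the class names, node names and pool behaviour are made up
here, and it does not model HAProxy or any real connection pool in detail.

```python
class HAProxy:
    """Toy active/backup backend: the backup is used only while the active is down."""

    def __init__(self, active, backup):
        self.active = active
        self.backup = backup
        self.up = {active: True, backup: True}

    def connect(self):
        # New connections always prefer the active node when it is up.
        if self.up[self.active]:
            return self.active
        if self.up[self.backup]:
            return self.backup
        raise ConnectionError("no MySQL node available")


class ConnectionPool:
    """Toy pool (hypothetical): keeps connections until their node dies."""

    def __init__(self, proxy, size):
        self.proxy = proxy
        self.size = size
        self.conns = []

    def refill(self):
        # Drop connections to dead nodes, then top up through the proxy.
        self.conns = [node for node in self.conns if self.proxy.up[node]]
        while len(self.conns) < self.size:
            self.conns.append(self.proxy.connect())

    def recycle_one(self):
        # One pooled connection expires and is reopened through the proxy.
        self.conns.pop(0)
        self.refill()


def failback_scenario():
    proxy = HAProxy("node-1", "node-2")
    pool = ConnectionPool(proxy, size=3)
    pool.refill()                # every connection goes to node-1
    proxy.up["node-1"] = False   # active node goes down
    pool.refill()                # every connection now goes to node-2
    proxy.up["node-1"] = True    # active node comes back
    pool.recycle_one()           # the new connection lands on node-1 again
    return sorted(set(pool.conns))


print(failback_scenario())
```

After the failback the pool holds connections to both nodes at once, so
writes can reach both of them.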

^ this actually leads to an interesting issue [1], where the DB state
committed on one node is not immediately available on another one.
Replication lag can be controlled via session variables [2], but that
does not always help. E.g. in [1] Nova first asks Neutron to create
a new floating IP and gets a 201 (and Neutron actually *commits* the DB
transaction), then makes another REST API request to get a list of
floating IPs by address. The latter can be served by another
neutron-server, connected to another Galera node that has not applied
the latest state yet due to 'slave lag', so the list may come back
empty. Unfortunately, 'wsrep_sync_wait' can't help here, as these are
two different REST API requests, potentially served by two different
neutron-server instances.
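
A toy model of the race in [1], as a rough sketch only: GaleraNode,
commit() and the address are hypothetical stand-ins, and real Galera
certification and apply queues are far more involved than a list.

```python
class GaleraNode:
    """Toy node: 'rows' is applied state, 'pending' is replicated but not applied."""

    def __init__(self):
        self.rows = set()
        self.pending = []

    def apply_pending(self):
        # The applier catches up with the replicated write-sets.
        self.rows.update(self.pending)
        self.pending.clear()

    def list_floating_ips(self, address):
        # A plain read only sees state already applied on this node.
        return [a for a in self.rows if a == address]


def commit(origin, peers, address):
    # The commit is visible on the origin node immediately; the peers
    # receive the write-set but apply it a little later.
    origin.rows.add(address)
    for peer in peers:
        peer.pending.append(address)


def stale_read_scenario():
    node_a, node_b = GaleraNode(), GaleraNode()
    address = "203.0.113.10"  # documentation-range address, illustration only

    # Request 1: neutron-server on node_a creates the floating IP, returns 201.
    commit(node_a, [node_b], address)

    # Request 2: another neutron-server, connected to node_b, lists by address.
    stale = node_b.list_floating_ips(address)

    # A moment later node_b has applied the write-set.
    node_b.apply_pending()
    later = node_b.list_floating_ips(address)
    return stale, later


print(stale_read_scenario())
```

The first list is empty even though the create request has already
returned successfully; the same query a moment later finds the address.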

Basically, you'd need to *always* wait for the latest state to be
applied before executing any queries, which Galera is trying to avoid
for performance reasons.
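
As a minimal sketch of what "always wait" would mean for a reading
session: wsrep_sync_wait is the Galera session variable discussed in
[2], and a value of 1 makes the node wait for pending write-sets before
READ statements. The causal_reads() helper, the RecordingCursor stand-in
and the query are made up for illustration, and resetting to 0 assumes
the variable was at its default before.

```python
from contextlib import contextmanager


@contextmanager
def causal_reads(cursor, mask=1):
    """Run the statements inside the block with wsrep_sync_wait enabled.

    `cursor` is assumed to be any DB-API cursor on a Galera node.
    mask=1 covers READ statements.
    """
    cursor.execute(f"SET SESSION wsrep_sync_wait = {int(mask)}")
    try:
        yield cursor
    finally:
        # Assumption: the session default was 0 (no waiting).
        cursor.execute("SET SESSION wsrep_sync_wait = 0")


class RecordingCursor:
    """Stand-in for a real cursor: records SQL instead of executing it."""

    def __init__(self):
        self.statements = []

    def execute(self, sql, params=None):
        self.statements.append(sql)


def demo():
    cur = RecordingCursor()
    with causal_reads(cur) as c:
        # Illustrative query; table and column names are not taken from [1].
        c.execute(
            "SELECT id FROM floatingips WHERE floating_ip_address = %s",
            ("203.0.113.10",),
        )
    return cur.statements


for statement in demo():
    print(statement)
```

The reader has no way of knowing that some other session has just
committed, so it would have to wrap every read like this, and every
read would then pay for the wait.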

Thanks,
Roman

[1] https://bugs.launchpad.net/fuel/+bug/1529937
[2] http://galeracluster.com/2015/06/achieving-read-after-write-semantics-with-galera/

On Fri, Apr 22, 2016 at 10:42 AM, Bogdan Dobrelya
<bdobrelia at mirantis.com> wrote:
> [crossposting to openstack-operators at lists.openstack.org]
>
> Hello.
> I wrote this paper [0] to demonstrate an approach to leveraging the
> Jepsen framework in QA/CI/CD pipelines for OpenStack projects like Oslo
> (DB) or Trove, Tooz DLM, and perhaps for any integration projects that
> rely on distributed systems. Although not all tests are finished yet,
> the results are quite visible, so I'd better share early for review,
> discussion and comments.
>
> I have done similar tests for the RabbitMQ OCF RA clusterers as well,
> although I have yet to write a report.
>
> PS. I'm sorry for so many tags I placed in the topic header, should I
> have used just "all" :) ? Have a nice weekend and take care!
>
> [0] https://goo.gl/VHyIIE
>
> --
> Best regards,
> Bogdan Dobrelya,
> Irc #bogdando

[openstack-dev] [Fuel][MySQL][DLM][Oslo][DB][Trove][Galera][operators] Multi-master writes look OK, OCF RA and more things
