[openstack-dev] [oslo.db] A proposal for DB read/write separation
Li Ma
skywalker.nick at gmail.com
Sun Aug 10 15:17:46 UTC 2014
Thanks for all the detailed analysis, Mike W, Mike B, and Roman.
For a production-ready database system, I think replication is a must. So the questions are which replication mode suits OpenStack, and how OpenStack can best use it to improve the performance and scalability of DB access.
In the current implementation of the database API in OpenStack, a master/slave connection is defined to optimize performance. Developers of each OpenStack component are responsible for making use of it in the application context, while deployers are responsible for architecting the database system to meet the requirements of their production environments. There is no general guideline for this. In practice, it is not easy to determine which transactions can safely be served by a slave, because of data-consistency and business-logic constraints that differ across OpenStack components.
The current status is that the master/slave configuration is not widely used; only Nova uses the slave connection, and only in periodic tasks that are not sensitive to replication lag. Because replication is asynchronous, a query against a slave may return stale data, so the risks of using slaves are apparent.
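For reference, the existing opt-in pattern looks roughly like the sketch below, built on oslo.db's EngineFacade. The option and flag names here are from memory and the URLs are placeholders, so treat it as an illustration rather than the exact API:

    # Sketch of the current opt-in master/slave pattern (names approximate).
    from oslo.db.sqlalchemy import session as db_session

    _FACADE = db_session.EngineFacade(
        'mysql://nova:pass@master/nova',
        slave_connection='mysql://nova:pass@slave/nova')

    def get_session(use_slave=False):
        # Every caller decides explicitly whether stale reads are acceptable.
        return _FACADE.get_session(use_slave=use_slave)

    # A periodic task that tolerates replication lag might then do:
    #   session = get_session(use_slave=True)
    #   rows = session.query(models.Instance).filter_by(deleted=0).all()

The point is that the choice of master vs. slave is pushed onto every caller, which is exactly the burden described above.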
What about a Galera multi-master cluster [3]? As Mike Bayer said, it is virtually synchronous by default, so it is still possible to read outdated rows, and query results are not guaranteed to be consistent across nodes.
When using such eventually consistent methods, you have to carefully decide which transactions can tolerate old data. AFAIK, regardless of the component, Nova, Cinder or Neutron, most transactions are not that 'tolerant'. As Mike Bayer said, a consistent relational dataset is very important, and that holds for all OpenStack components. This is why only non-sensitive periodic tasks use slaves in Nova.
Let's move on to synchronous replication, e.g. Galera with causal-reads on. The dominant advantage is that it provides a consistent relational dataset. The disadvantages are that it uses optimistic locking and its performance sucks (also said by Mike Bayer :-). The optimistic locking problem, I think, can be dealt with by retry-on-deadlock; it's not the topic here.
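(For completeness, the retry-on-deadlock idea is roughly the sketch below. oslo.db already ships retry helpers, so this is only to show the shape of it; the decorator name and retry parameters are mine:)

    # Minimal retry-on-deadlock sketch. Under Galera's optimistic
    # (certification-based) concurrency control, a conflicting transaction
    # is aborted and surfaces as a deadlock error, so it can be retried.
    import functools
    import time

    from oslo.db import exception as db_exc

    def retry_on_deadlock(func, max_retries=5, interval=0.5):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except db_exc.DBDeadlock:
                    if attempt == max_retries - 1:
                        raise
                    time.sleep(interval)
        return wrapper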
If we set the performance problem aside for a moment, a multi-master cluster with synchronous replication is a perfect fit for OpenStack: any combination of master and slave connections can be enabled, and it can truly scale out.
So transparent read/write separation depends on such an environment. The SQLAlchemy documentation provides a code sample for it [1], and Mike Bayer also covers it in a blog post [2].
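To make the idea concrete, the routing approach from [1] and [2] boils down to something like the following sketch. The engine URLs and the master/slaves split are placeholders; in a Galera multi-master cluster every node accepts writes, so the split is only a routing convention:

    import random

    from sqlalchemy import create_engine
    from sqlalchemy.orm import Session, sessionmaker

    # One engine per Galera node.
    master = create_engine('mysql://nova:pass@node1/nova')
    slaves = [create_engine('mysql://nova:pass@node2/nova'),
              create_engine('mysql://nova:pass@node3/nova')]

    class RoutingSession(Session):
        """Route flushes (writes) to one designated node and spread
        plain reads across the remaining nodes."""

        def get_bind(self, mapper=None, clause=None):
            if self._flushing:
                return master
            return random.choice(slaves)

    # Drop-in replacement for the usual sessionmaker().
    SessionFactory = sessionmaker(class_=RoutingSession)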
What I did was re-implement it in the OpenStack DB API modules in my development environment, using a Galera cluster with causal-reads on. It has been running perfectly for more than a week: the routing session manager works well and maintains data consistency.
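The wiring is essentially a drop-in replacement for the existing session factory, along these lines (again a hypothetical sketch building on the RoutingSession above, not the exact patch):

    # Hypothetical wiring inside an OpenStack DB API module: callers keep
    # calling get_session() and never choose master vs. slave themselves.
    from sqlalchemy.orm import scoped_session, sessionmaker

    _MAKER = scoped_session(sessionmaker(class_=RoutingSession,
                                         autocommit=True))

    def get_session():
        return _MAKER()

    # Plain reads are spread across the Galera nodes:
    #   session = get_session()
    #   nets = session.query(models.Network).all()
    #
    # Writes go to the designated node, because the flush happens inside
    # the explicit transaction:
    #   with session.begin():
    #       session.add(models.Network(label='demo'))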
Back to the performance problem: theoretically, turning causal-reads on will affect the overall performance of concurrent DB reads, but I cannot find any report (official or unofficial) on causal-reads performance degradation. In the production system of my company, Galera performance is tuned via network round-trip time, network throughput, the number of slave threads, keep-alive, and wsrep flow-control parameters.
All in all: first, transparent read/write separation is feasible using a synchronous replication method. Second, it may help scale out large deployments without any code modification. It does need fine-tuning (of course, every production system does :-). Finally, I think that if we can integrate it into oslo.db, it would be a perfect plus for those who would like to deploy Galera (or a similar technology) as the DB backend.
[1] http://docs.sqlalchemy.org/en/rel_0_9/orm/session.html#custom-vertical-partitioning
[2] http://techspot.zzzeek.org/2012/01/11/django-style-database-routers-in-sqlalchemy/
[3] Galera replication method: http://galeracluster.com/products/technology/