[Openstack-operators] operator challenges with MySQL
Jay Pipes
jaypipes at gmail.com
Tue Jul 9 20:46:24 UTC 2013
On 07/08/2013 06:47 AM, Sushil Suresh wrote:
> Hi Matt,
>
> Welcome to the list.
Indeed, welcome to the OpenStack community, Matt. Sushil did a good job
outlining the major pain points below. I added a few things inline, but
on a general note, once an operator stops using MySQL for Keystone token
storage and separates heavy-write-few-read-pattern database traffic
(like Ceilometer) from heavy-read-few-write-pattern database traffic
(like pretty much everything else in OpenStack), I think most operators
find that the database itself is not the number one (or even number
three bottleneck in OpenStack. We've found that the message queue layer
and limitations with software-defined networking components are a much
larger scaling problem than the database layer, frankly.
Anyway, some more comments inline...
> DB Migrations.
> ----------------------
> Openstack is a very fast evolving project, which is great. However that
> means there are quite a lot of db migrations which add and remove
> columns to existing tables and perform similar operations with indexes
> etc.
>
> If you have a production environment that is heavily used like ours, you
> are looking at having millions of rows in each of these tables.
>
> As a developer writing code and testing, these migrations work perfectly
> well. Typical tests are performed with test databases containing only test
> data which never comes up to millions or records.
> Further more there is no production load on the test database actively
> trying to write stuff when you are altering the tables.
>
> Database abstraction with SQLAlchemy is great, but it generally mean
> your schema alterations end up having the standard ALTER TABLE syntax.
>
> I have personally used Percona's pt-online-schema-change.html to get me
> out of some of these sticky situations.
> http://www.percona.com/doc/percona-toolkit/2.2/pt-online-schema-change.html
> BTW, Thank you very much for the above tool.
>
> There is a risk in using such approaches as you want to make sure that
> you are not deviating even in the slightest of manner from what would
> have been achieved using the standard alter table. There is some work
> being done to introduce archiving of data etc, which will help keep data
> growth in checks. Also using an online schema change becomes tricky when
> you have foreign key constraints on tables or if you have setup triggers
> for your table. (thankfully no triggers yet).
Total agreement with Sushil here...
> ---------------------------------------------
>
> High Availability is also something that that I think could so with some
> attention.
> http://docs.openstack.org/trunk/openstack-ha/content/ch-intro.html
> The above page does detail the current recommended approach for setting
> up highly available mysql database servers.
> The current approach is great if you are using dedicated hardware and
> have separate physical switches etc.
> It does work reasonably well in a virtualised environment too, but one
> needs to take into account several other factors.
>
> With mysql 5.6 support global transaction ID, (GTID) and improvements in
> galera and PXC(percona xtradb cluster)
> I think there is definitely room to review the current recommended solutions
I think one of the biggest areas that Percona folks could really help
with are detailed tutorials on how to appropriately split the read/write
database traffic I spoke about earlier and how to make the most
effective use of PXC/Galera. We use Galera internally for all of our
identity and image database traffic, synchronously replicate between our
deployment zones, and it's excellent. We use Galera for our other
databases internal to a deployment zone as well (with the exception of
Ceilometer, which is better utilized (IMO) with standard MySQL
master/slave setups.
I'd be happy to collaborate with you or someone from Percona in the
coming months on such an article. Feel free to email me directly if you
have interest.
In addition to the above topics, it might be good to have a couple
articles on backup and recovery best practices in relation to the
database layer... specifically around what needs to be "taken down"
during a recovery and what can stay online (for instance, a failed or
corrupted DB doesn't necessarily need to mean loss of service or
connectivity to tenant VMs...)
All the best,
-jay
More information about the OpenStack-operators
mailing list