[Openstack-operators] operator challenges with MySQL

Jay Pipes jaypipes at gmail.com
Tue Jul 9 20:46:24 UTC 2013

On 07/08/2013 06:47 AM, Sushil Suresh wrote:
> Hi Matt,
> Welcome to the list.

Indeed, welcome to the OpenStack community, Matt. Sushil did a good job 
outlining the major pain points below. I added a few things inline, but 
on a general note, once an operator stops using MySQL for Keystone token 
storage and separates heavy-write-few-read-pattern database traffic 
(like Ceilometer) from heavy-read-few-write-pattern database traffic 
(like pretty much everything else in OpenStack), I think most operators 
find that the database itself is not the number one (or even number 
three bottleneck in OpenStack. We've found that the message queue layer 
and limitations with software-defined networking components are a much 
larger scaling problem than the database layer, frankly.

Anyway, some more comments inline...

> DB Migrations.
> ----------------------
> Openstack is a very fast evolving project, which is great. However that
> means there are quite a lot of db migrations which add and remove
>   columns to existing tables and perform similar operations with indexes
> etc.
> If you have a production environment that is heavily used like ours, you
> are looking at having millions of rows in each of these tables.
> As a developer writing code and testing, these migrations work perfectly
> well. Typical tests are performed with test databases containing only test
> data which never comes up to millions or records.
> Further more there is no production load on the test database actively
> trying to write stuff when you are altering the tables.
> Database abstraction with SQLAlchemy is great, but it generally mean
> your schema alterations end up having the standard ALTER TABLE syntax.
> I have personally used Percona's pt-online-schema-change.html to get me
> out of some of these sticky situations.
> http://www.percona.com/doc/percona-toolkit/2.2/pt-online-schema-change.html
> BTW, Thank you very much for the above tool.
> There is a risk in using such approaches as you want to make sure that
> you are not deviating even in the slightest of manner from what would
> have been achieved using the standard alter table. There is some work
> being done to introduce archiving of data etc, which will help keep data
> growth in checks. Also using an online schema change becomes tricky when
> you have foreign key constraints on tables or if you have setup triggers
> for your table. (thankfully no triggers yet).

Total agreement with Sushil here...

> ---------------------------------------------
> High Availability is also something that that I think could so with some
> attention.
> http://docs.openstack.org/trunk/openstack-ha/content/ch-intro.html
> The above page does detail the current recommended approach for setting
> up highly available mysql database servers.
> The current approach is great if you are using dedicated hardware and
> have separate physical switches etc.
> It does work reasonably well in a virtualised environment too, but one
> needs to take into account several other factors.
> With mysql 5.6 support global transaction ID, (GTID) and improvements in
> galera and PXC(percona xtradb cluster)
> I think there is definitely room to review the current recommended solutions

I think one of the biggest areas that Percona folks could really help 
with are detailed tutorials on how to appropriately split the read/write 
database traffic I spoke about earlier and how to make the most 
effective use of PXC/Galera. We use Galera internally for all of our 
identity and image database traffic, synchronously replicate between our 
deployment zones, and it's excellent. We use Galera for our other 
databases internal to a deployment zone as well (with the exception of 
Ceilometer, which is better utilized (IMO) with standard MySQL 
master/slave setups.

I'd be happy to collaborate with you or someone from Percona in the 
coming months on such an article. Feel free to email me directly if you 
have interest.

In addition to the above topics, it might be good to have a couple 
articles on backup and recovery best practices in relation to the 
database layer... specifically around what needs to be "taken down" 
during a recovery and what can stay online (for instance, a failed or 
corrupted DB doesn't necessarily need to mean loss of service or 
connectivity to tenant VMs...)

All the best,

More information about the OpenStack-operators mailing list