[openstack-dev] [Oslo] [Ironic] DB migration woes

Mike Bayer mbayer at redhat.com
Sat Jun 7 19:27:50 UTC 2014


On Jun 6, 2014, at 8:12 PM, Devananda van der Veen <devananda.vdv at gmail.com> wrote:

> I think some things are broken in the oslo-incubator db migration code.
> 
> Ironic moved to this when Juno opened and things seemed fine, until recently when Lucas tried to add a DB migration and noticed that it didn't run... So I looked into it a bit today. Below are my findings.
> 
> Firstly, I filed this bug and proposed a fix, because I think that tests that don't run any code should not report that they passed -- they should report that they were skipped.
>   https://bugs.launchpad.net/oslo/+bug/1327397
>   "No notice given when db migrations are not run due to missing engine"
> 
> Then, I edited the test_migrations.conf file appropriately for my local mysql service, ran the tests again, and verified that migration tests ran -- and they passed. Great!
> 
> Now, a little background... Ironic's TestMigrations class inherits from oslo's BaseMigrationTestCase, then "opportunistically" checks each back-end, if it's available. This opportunistic checking was inherited from Nova so that tests could pass on developer workstations where not all backends are present (eg, I have mysql installed, but not postgres), and still transparently run on all backends in the gate. I couldn't find such opportunistic testing in the oslo db migration test code, unfortunately - but maybe it's well hidden.
> 
> Anyhow. When I stopped the local mysql service (leaving the configuration unchanged), I expected the tests to be skipped, but instead I got two surprise failures:
> - test_mysql_opportunistically() failed because setUp() raises an exception before the test code could call calling _have_mysql()
> - test_mysql_connect_fail() actually failed! Again, because setUp() raises an exception before running the test itself
> 
> Unfortunately, there's one more problem... when I run the tests in parallel, they fail randomly because sometimes two test threads run different migration tests, and the setUp() for one thread (remember, it calls _reset_databases) blows up the other test.
> 
> Out of 10 runs, it failed three times, each with different errors:
>   NoSuchTableError: `chassis`
>   ERROR 1007 (HY000) at line 1: Can't create database 'test_migrations'; database exists
>   ProgrammingError: (ProgrammingError) (1146, "Table 'test_migrations.alembic_version' doesn't exist")
> 
> As far as I can tell, this is all coming from:
>   https://github.com/openstack/oslo-incubator/blob/master/openstack/common/db/sqlalchemy/test_migrations.py#L86;L111

Hello -

Just an introduction, I’m Mike Bayer, the creator of SQLAlchemy and Alembic migrations.     I’ve just joined on as a full time Openstack contributor, and trying to help improve processes such as these is my primary responsibility.

I’ve had several conversations already about how migrations are run within test suites in various openstack projects.   I’m kind of surprised by this approach of dropping and recreating the whole database for individual tests.   Running tests in parallel is obviously made very difficult by this style, but even beyond that, a lot of databases don’t respond well to lots of dropping/rebuilding of tables and/or databases in any case; while SQLite and MySQL are probably the most forgiving of this, a backend like Postgresql is going to lock tables from being dropped more aggressively, if any open transactions or result cursors against those tables remain, and on a backend like Oracle, the speed of schema operations starts to become prohibitively slow.   Dropping and creating tables is in general not a very speedy task on any backend, and on a test suite that runs many tests against a fixed schema, I don’t see why a full drop is necessary.

If you look at SQLAlchemy’s own tests, they do in fact create tables on each test, or just as often for a specific suite of tests.  However, this is due to the fact that SQLAlchemy tests are testing SQLAlchemy itself, so the database schemas used for these tests are typically built explicitly for small groups or individual tests, and there are ultimately thousands of small “mini schemas” built up and torn down for these tests.   A lot of framework code is involved within the test suite to keep more picky databases like Postgresql and Oracle happy when building up and dropping tables so frequently.

However, when testing an application that uses a fixed set of tables, as should be the case for the majority if not all Openstack apps, there’s no reason that these tables need to be recreated for every test.   Typically, the way I recommend is that the test suite includes a “per suite” activity which creates the test schema just once (with or without using CREATE DATABASE; I’m not a fan of tests running CREATE DATABASE as this is not a command so easily available in some environments).   The tests themselves then run within a transactional container, such that each test performs all of its work within a context that doesn’t actually commit any data to the database; a test that actually states “session.commit()” runs within a container that doesn’t actually emit the COMMIT, and if support is needed for tests that also emit “session.rollback()”, the container can be written to support this paradigm as well (I helped some Dropbox devs with such an approach recently).

In this way, the database migrations are exercised, but only once at the beginning in order to build up the schema; the tests can then run with very low complexity/performance overhead as far as database-level setup/teardown, and parallel testing is also much easier.   When the test suite completes is when a drop of the entire set of tables can proceed.  Because tests are run within transactions, assuming READ COMMITTED isolation is established, the tests don’t even see the data being incurred by other tests running in parallel.

It remains to be seen what aspects of Openstack I’m going to get involved with first, though this migration and testing issue seems to be a big one.   I’d love to get comments from the community here as to how this process might be improved and if a rearrangement of the fixture system in the way I describe might be helpful and feasible.

- mike


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20140607/bba7ad54/attachment.html>


More information about the OpenStack-dev mailing list