[openstack-dev] [Oslo] [Ironic] DB migration woes
mbayer at redhat.com
Mon Jun 9 17:08:25 UTC 2014
On Jun 9, 2014, at 12:50 PM, Devananda van der Veen <devananda.vdv at gmail.com> wrote:
> There may be some problems with MySQL when testing parallel writes in
> different non-committing transactions, even in READ COMMITTED mode,
> due to InnoDB locking, if the queries use non-unique secondary indexes
> for UPDATE or SELECT..FOR UPDATE queries. This is done by the
> "with_lockmode('update')" SQLAlchemy phrase, and is used in ~10 places
> in Nova. So I would not recommend this approach, even though, in
> principle, I agree it would be a much more efficient way of testing
> database reads/writes.
> More details here:
> http://dev.mysql.com/doc/refman/5.5/en/innodb-locks-set.html and
OK, but just to clarify my understanding, what is the approach to testing writes in parallel right now, are we doing CREATE DATABASE for two entirely distinct databases with some kind of generated name for each one? Otherwise, if the parallel tests are against the same database, this issue exists regardless (unless autocommit mode is used, is FOR UPDATE accepted under those conditions?)
> On Sun, Jun 8, 2014 at 8:46 AM, Roman Podoliaka <rpodolyaka at mirantis.com> wrote:
>> Hi Mike,
>>>>> However, when testing an application that uses a fixed set of tables, as should be the case for the majority if not all Openstack apps, there’s no reason that these tables need to be recreated for every test.
>> This is a very good point. I tried to use the recipe from SQLAlchemy
>> docs to run Nova DB API tests (yeah, I know, this might sound
>> confusing, but these are actually methods that access the database in
>> Nova) on production backends (MySQL and PostgreSQL). The abandoned
>> patch is here . Julia Varlamova has been working on rebasing this
>> on master and should upload a new patch set soon.
>> Overall, the approach with executing a test within a transaction and
>> then emitting ROLLBACK worked quite well. The only problem I ran into
>> were tests doing ROLLBACK on purpose. But you've updated the recipe
>> since then and this can probably be solved by using of save points. I
>> used a separate DB per a test running process to prevent race
>> conditions, but we should definitely give READ COMMITTED approach a
>> try. If it works, that will awesome.
>> With a few tweaks of PostgreSQL config I was able to run Nova DB API
>> tests in 13-15 seconds, while SQLite in memory took about 7s.
>> Action items for me and Julia probably:  needs a spec with 
>> updated accordingly. Using of this 'test in a transaction' approach
>> seems to be a way to go for running all db related tests except the
>> ones using DDL statements (as any DDL statement commits the current
>> transaction implicitly on MySQL and SQLite AFAIK).
>>  https://review.openstack.org/#/c/33236/
>>  https://blueprints.launchpad.net/nova/+spec/db-api-tests-on-all-backends
>> On Sat, Jun 7, 2014 at 10:27 PM, Mike Bayer <mbayer at redhat.com> wrote:
>>> On Jun 6, 2014, at 8:12 PM, Devananda van der Veen <devananda.vdv at gmail.com>
>>> I think some things are broken in the oslo-incubator db migration code.
>>> Ironic moved to this when Juno opened and things seemed fine, until recently
>>> when Lucas tried to add a DB migration and noticed that it didn't run... So
>>> I looked into it a bit today. Below are my findings.
>>> Firstly, I filed this bug and proposed a fix, because I think that tests
>>> that don't run any code should not report that they passed -- they should
>>> report that they were skipped.
>>> "No notice given when db migrations are not run due to missing engine"
>>> Then, I edited the test_migrations.conf file appropriately for my local
>>> mysql service, ran the tests again, and verified that migration tests ran --
>>> and they passed. Great!
>>> Now, a little background... Ironic's TestMigrations class inherits from
>>> oslo's BaseMigrationTestCase, then "opportunistically" checks each back-end,
>>> if it's available. This opportunistic checking was inherited from Nova so
>>> that tests could pass on developer workstations where not all backends are
>>> present (eg, I have mysql installed, but not postgres), and still
>>> transparently run on all backends in the gate. I couldn't find such
>>> opportunistic testing in the oslo db migration test code, unfortunately -
>>> but maybe it's well hidden.
>>> Anyhow. When I stopped the local mysql service (leaving the configuration
>>> unchanged), I expected the tests to be skipped, but instead I got two
>>> surprise failures:
>>> - test_mysql_opportunistically() failed because setUp() raises an exception
>>> before the test code could call calling _have_mysql()
>>> - test_mysql_connect_fail() actually failed! Again, because setUp() raises
>>> an exception before running the test itself
>>> Unfortunately, there's one more problem... when I run the tests in parallel,
>>> they fail randomly because sometimes two test threads run different
>>> migration tests, and the setUp() for one thread (remember, it calls
>>> _reset_databases) blows up the other test.
>>> Out of 10 runs, it failed three times, each with different errors:
>>> NoSuchTableError: `chassis`
>>> ERROR 1007 (HY000) at line 1: Can't create database 'test_migrations';
>>> database exists
>>> ProgrammingError: (ProgrammingError) (1146, "Table
>>> 'test_migrations.alembic_version' doesn't exist")
>>> As far as I can tell, this is all coming from:
>>> Hello -
>>> Just an introduction, I’m Mike Bayer, the creator of SQLAlchemy and Alembic
>>> migrations. I’ve just joined on as a full time Openstack contributor,
>>> and trying to help improve processes such as these is my primary
>>> I’ve had several conversations already about how migrations are run within
>>> test suites in various openstack projects. I’m kind of surprised by this
>>> approach of dropping and recreating the whole database for individual tests.
>>> Running tests in parallel is obviously made very difficult by this style,
>>> but even beyond that, a lot of databases don’t respond well to lots of
>>> dropping/rebuilding of tables and/or databases in any case; while SQLite and
>>> MySQL are probably the most forgiving of this, a backend like Postgresql is
>>> going to lock tables from being dropped more aggressively, if any open
>>> transactions or result cursors against those tables remain, and on a backend
>>> like Oracle, the speed of schema operations starts to become prohibitively
>>> slow. Dropping and creating tables is in general not a very speedy task on
>>> any backend, and on a test suite that runs many tests against a fixed
>>> schema, I don’t see why a full drop is necessary.
>>> If you look at SQLAlchemy’s own tests, they do in fact create tables on each
>>> test, or just as often for a specific suite of tests. However, this is due
>>> to the fact that SQLAlchemy tests are testing SQLAlchemy itself, so the
>>> database schemas used for these tests are typically built explicitly for
>>> small groups or individual tests, and there are ultimately thousands of
>>> small “mini schemas” built up and torn down for these tests. A lot of
>>> framework code is involved within the test suite to keep more picky
>>> databases like Postgresql and Oracle happy when building up and dropping
>>> tables so frequently.
>>> However, when testing an application that uses a fixed set of tables, as
>>> should be the case for the majority if not all Openstack apps, there’s no
>>> reason that these tables need to be recreated for every test. Typically,
>>> the way I recommend is that the test suite includes a “per suite” activity
>>> which creates the test schema just once (with or without using CREATE
>>> DATABASE; I’m not a fan of tests running CREATE DATABASE as this is not a
>>> command so easily available in some environments). The tests themselves
>>> then run within a transactional container, such that each test performs all
>>> of its work within a context that doesn’t actually commit any data to the
>>> database; a test that actually states “session.commit()” runs within a
>>> container that doesn’t actually emit the COMMIT, and if support is needed
>>> for tests that also emit “session.rollback()”, the container can be written
>>> to support this paradigm as well (I helped some Dropbox devs with such an
>>> approach recently).
>>> In this way, the database migrations are exercised, but only once at the
>>> beginning in order to build up the schema; the tests can then run with very
>>> low complexity/performance overhead as far as database-level setup/teardown,
>>> and parallel testing is also much easier. When the test suite completes is
>>> when a drop of the entire set of tables can proceed. Because tests are run
>>> within transactions, assuming READ COMMITTED isolation is established, the
>>> tests don’t even see the data being incurred by other tests running in
>>> It remains to be seen what aspects of Openstack I’m going to get involved
>>> with first, though this migration and testing issue seems to be a big one.
>>> I’d love to get comments from the community here as to how this process
>>> might be improved and if a rearrangement of the fixture system in the way I
>>> describe might be helpful and feasible.
>>> - mike
>>> OpenStack-dev mailing list
>>> OpenStack-dev at lists.openstack.org
>> OpenStack-dev mailing list
>> OpenStack-dev at lists.openstack.org
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
More information about the OpenStack-dev