[openstack-dev] [Oslo] [Ironic] DB migration woes
Jay Pipes
jaypipes at gmail.com
Mon Jun 9 19:05:45 UTC 2014
On 06/09/2014 02:57 PM, Devananda van der Veen wrote:
> On Mon, Jun 9, 2014 at 10:49 AM, Jay Pipes <jaypipes at gmail.com> wrote:
>> On 06/09/2014 12:50 PM, Devananda van der Veen wrote:
>>>
>>> There may be some problems with MySQL when testing parallel writes in
>>> different non-committing transactions, even in READ COMMITTED mode,
>>> due to InnoDB locking, if the queries use non-unique secondary indexes
>>> for UPDATE or SELECT..FOR UPDATE queries. This is done by the
>>> "with_lockmode('update')" SQLAlchemy phrase, and is used in ~10 places
>>> in Nova. So I would not recommend this approach, even though, in
>>> principle, I agree it would be a much more efficient way of testing
>>> database reads/writes.
>>>
>>> More details here:
>>> http://dev.mysql.com/doc/refman/5.5/en/innodb-locks-set.html and
>>> http://dev.mysql.com/doc/refman/5.5/en/innodb-record-level-locks.html
>>
>>
>> Hi Deva,
>>
>> MySQL/InnoDB's default isolation mode is REPEATABLE_READ, not
>> READ_COMMITTED... are you saying that somewhere in the Ironic codebase we
>> are setting the isolation mode manually to READ_COMMITTED for some reason?
>>
>> Best,
>> -jay
>>
>
> Jay,
>
> Not saying that at all. I was responding to Mike's suggested approach
> for testing DB changes (which was actually off topic from my original
> post), in which he suggested using READ_COMMITTED.
Apologies, thx for the clarification, Deva,
-jay
> -Deva
>
>>
>>> On Sun, Jun 8, 2014 at 8:46 AM, Roman Podoliaka <rpodolyaka at mirantis.com>
>>> wrote:
>>>>
>>>> Hi Mike,
>>>>
>>>>>>> However, when testing an application that uses a fixed set of tables,
>>>>>>> as should be the case for the majority if not all Openstack apps, there’s no
>>>>>>> reason that these tables need to be recreated for every test.
>>>>
>>>>
>>>> This is a very good point. I tried to use the recipe from SQLAlchemy
>>>> docs to run Nova DB API tests (yeah, I know, this might sound
>>>> confusing, but these are actually methods that access the database in
>>>> Nova) on production backends (MySQL and PostgreSQL). The abandoned
>>>> patch is here [1]. Julia Varlamova has been working on rebasing this
>>>> on master and should upload a new patch set soon.
>>>>
>>>> Overall, the approach with executing a test within a transaction and
>>>> then emitting ROLLBACK worked quite well. The only problem I ran into
>>>> were tests doing ROLLBACK on purpose. But you've updated the recipe
>>>> since then and this can probably be solved by using of save points. I
>>>> used a separate DB per a test running process to prevent race
>>>> conditions, but we should definitely give READ COMMITTED approach a
>>>> try. If it works, that will awesome.
>>>>
>>>> With a few tweaks of PostgreSQL config I was able to run Nova DB API
>>>> tests in 13-15 seconds, while SQLite in memory took about 7s.
>>>>
>>>> Action items for me and Julia probably: [2] needs a spec with [1]
>>>> updated accordingly. Using of this 'test in a transaction' approach
>>>> seems to be a way to go for running all db related tests except the
>>>> ones using DDL statements (as any DDL statement commits the current
>>>> transaction implicitly on MySQL and SQLite AFAIK).
>>>>
>>>> Thanks,
>>>> Roman
>>>>
>>>> [1] https://review.openstack.org/#/c/33236/
>>>> [2]
>>>> https://blueprints.launchpad.net/nova/+spec/db-api-tests-on-all-backends
>>>>
>>>> On Sat, Jun 7, 2014 at 10:27 PM, Mike Bayer <mbayer at redhat.com> wrote:
>>>>>
>>>>>
>>>>> On Jun 6, 2014, at 8:12 PM, Devananda van der Veen
>>>>> <devananda.vdv at gmail.com>
>>>>> wrote:
>>>>>
>>>>> I think some things are broken in the oslo-incubator db migration code.
>>>>>
>>>>> Ironic moved to this when Juno opened and things seemed fine, until
>>>>> recently
>>>>> when Lucas tried to add a DB migration and noticed that it didn't run...
>>>>> So
>>>>> I looked into it a bit today. Below are my findings.
>>>>>
>>>>> Firstly, I filed this bug and proposed a fix, because I think that tests
>>>>> that don't run any code should not report that they passed -- they
>>>>> should
>>>>> report that they were skipped.
>>>>> https://bugs.launchpad.net/oslo/+bug/1327397
>>>>> "No notice given when db migrations are not run due to missing
>>>>> engine"
>>>>>
>>>>> Then, I edited the test_migrations.conf file appropriately for my local
>>>>> mysql service, ran the tests again, and verified that migration tests
>>>>> ran --
>>>>> and they passed. Great!
>>>>>
>>>>> Now, a little background... Ironic's TestMigrations class inherits from
>>>>> oslo's BaseMigrationTestCase, then "opportunistically" checks each
>>>>> back-end,
>>>>> if it's available. This opportunistic checking was inherited from Nova
>>>>> so
>>>>> that tests could pass on developer workstations where not all backends
>>>>> are
>>>>> present (eg, I have mysql installed, but not postgres), and still
>>>>> transparently run on all backends in the gate. I couldn't find such
>>>>> opportunistic testing in the oslo db migration test code, unfortunately
>>>>> -
>>>>> but maybe it's well hidden.
>>>>>
>>>>> Anyhow. When I stopped the local mysql service (leaving the
>>>>> configuration
>>>>> unchanged), I expected the tests to be skipped, but instead I got two
>>>>> surprise failures:
>>>>> - test_mysql_opportunistically() failed because setUp() raises an
>>>>> exception
>>>>> before the test code could call calling _have_mysql()
>>>>> - test_mysql_connect_fail() actually failed! Again, because setUp()
>>>>> raises
>>>>> an exception before running the test itself
>>>>>
>>>>> Unfortunately, there's one more problem... when I run the tests in
>>>>> parallel,
>>>>> they fail randomly because sometimes two test threads run different
>>>>> migration tests, and the setUp() for one thread (remember, it calls
>>>>> _reset_databases) blows up the other test.
>>>>>
>>>>> Out of 10 runs, it failed three times, each with different errors:
>>>>> NoSuchTableError: `chassis`
>>>>> ERROR 1007 (HY000) at line 1: Can't create database
>>>>> 'test_migrations';
>>>>> database exists
>>>>> ProgrammingError: (ProgrammingError) (1146, "Table
>>>>> 'test_migrations.alembic_version' doesn't exist")
>>>>>
>>>>> As far as I can tell, this is all coming from:
>>>>>
>>>>>
>>>>> https://github.com/openstack/oslo-incubator/blob/master/openstack/common/db/sqlalchemy/test_migrations.py#L86;L111
>>>>>
>>>>>
>>>>> Hello -
>>>>>
>>>>> Just an introduction, I’m Mike Bayer, the creator of SQLAlchemy and
>>>>> Alembic
>>>>> migrations. I’ve just joined on as a full time Openstack
>>>>> contributor,
>>>>> and trying to help improve processes such as these is my primary
>>>>> responsibility.
>>>>>
>>>>> I’ve had several conversations already about how migrations are run
>>>>> within
>>>>> test suites in various openstack projects. I’m kind of surprised by
>>>>> this
>>>>> approach of dropping and recreating the whole database for individual
>>>>> tests.
>>>>> Running tests in parallel is obviously made very difficult by this
>>>>> style,
>>>>> but even beyond that, a lot of databases don’t respond well to lots of
>>>>> dropping/rebuilding of tables and/or databases in any case; while SQLite
>>>>> and
>>>>> MySQL are probably the most forgiving of this, a backend like Postgresql
>>>>> is
>>>>> going to lock tables from being dropped more aggressively, if any open
>>>>> transactions or result cursors against those tables remain, and on a
>>>>> backend
>>>>> like Oracle, the speed of schema operations starts to become
>>>>> prohibitively
>>>>> slow. Dropping and creating tables is in general not a very speedy
>>>>> task on
>>>>> any backend, and on a test suite that runs many tests against a fixed
>>>>> schema, I don’t see why a full drop is necessary.
>>>>>
>>>>> If you look at SQLAlchemy’s own tests, they do in fact create tables on
>>>>> each
>>>>> test, or just as often for a specific suite of tests. However, this is
>>>>> due
>>>>> to the fact that SQLAlchemy tests are testing SQLAlchemy itself, so the
>>>>> database schemas used for these tests are typically built explicitly for
>>>>> small groups or individual tests, and there are ultimately thousands of
>>>>> small “mini schemas” built up and torn down for these tests. A lot of
>>>>> framework code is involved within the test suite to keep more picky
>>>>> databases like Postgresql and Oracle happy when building up and dropping
>>>>> tables so frequently.
>>>>>
>>>>> However, when testing an application that uses a fixed set of tables, as
>>>>> should be the case for the majority if not all Openstack apps, there’s
>>>>> no
>>>>> reason that these tables need to be recreated for every test.
>>>>> Typically,
>>>>> the way I recommend is that the test suite includes a “per suite”
>>>>> activity
>>>>> which creates the test schema just once (with or without using CREATE
>>>>> DATABASE; I’m not a fan of tests running CREATE DATABASE as this is not
>>>>> a
>>>>> command so easily available in some environments). The tests
>>>>> themselves
>>>>> then run within a transactional container, such that each test performs
>>>>> all
>>>>> of its work within a context that doesn’t actually commit any data to
>>>>> the
>>>>> database; a test that actually states “session.commit()” runs within a
>>>>> container that doesn’t actually emit the COMMIT, and if support is
>>>>> needed
>>>>> for tests that also emit “session.rollback()”, the container can be
>>>>> written
>>>>> to support this paradigm as well (I helped some Dropbox devs with such
>>>>> an
>>>>> approach recently).
>>>>>
>>>>> In this way, the database migrations are exercised, but only once at the
>>>>> beginning in order to build up the schema; the tests can then run with
>>>>> very
>>>>> low complexity/performance overhead as far as database-level
>>>>> setup/teardown,
>>>>> and parallel testing is also much easier. When the test suite
>>>>> completes is
>>>>> when a drop of the entire set of tables can proceed. Because tests are
>>>>> run
>>>>> within transactions, assuming READ COMMITTED isolation is established,
>>>>> the
>>>>> tests don’t even see the data being incurred by other tests running in
>>>>> parallel.
>>>>>
>>>>> It remains to be seen what aspects of Openstack I’m going to get
>>>>> involved
>>>>> with first, though this migration and testing issue seems to be a big
>>>>> one.
>>>>> I’d love to get comments from the community here as to how this process
>>>>> might be improved and if a rearrangement of the fixture system in the
>>>>> way I
>>>>> describe might be helpful and feasible.
>>>>>
>>>>> - mike
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> OpenStack-dev mailing list
>>>>> OpenStack-dev at lists.openstack.org
>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>>>
>>>>
>>>> _______________________________________________
>>>> OpenStack-dev mailing list
>>>> OpenStack-dev at lists.openstack.org
>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>
>>>
>>> _______________________________________________
>>> OpenStack-dev mailing list
>>> OpenStack-dev at lists.openstack.org
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>
>>
>>
>> _______________________________________________
>> OpenStack-dev mailing list
>> OpenStack-dev at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
More information about the OpenStack-dev
mailing list