Open Stack

Fri Jul 5 12:26:36 UTC 2013

Hi all,

I would like to explain the very high-level steps of our work:
1) Sync the DB code across all projects (we have what we have; let it be
in one place)
2) Refactor the DB code in one place (not independently in each project)

I understand that our DB code is not ideal, but let's get it into one
place first.

----------
About DB archiving.
----------
Let me describe how it works for contributors who are not familiar with it:

For each table (which has columns, indexes, unique constraints, foreign
keys, etc.) we have a shadow table that has only the columns (without
indexes, unique constraints, foreign keys, ...).

We also have a utility that does the following: it moves records that are
marked as "deleted" from the original table to the shadow table.

This was done by David Ripton in Nova in Grizzly.
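
As a rough illustration of the archiving step described above, here is a
minimal sketch using SQLite. It is not the Nova utility itself. The table
name `instances`, the `shadow_` prefix, the `id` column, and the rule that
a nonzero `deleted` value marks a deleted row are all assumptions made for
the example.

```python
import sqlite3


def archive_deleted_rows(conn, table, max_rows=100):
    """Move up to max_rows soft-deleted rows from `table` to `shadow_<table>`.

    Assumes (for this sketch only) an `id` column and that a nonzero
    `deleted` value marks a row as deleted.
    """
    shadow = "shadow_" + table
    with conn:  # copy and delete are committed together, or not at all
        ids = [row[0] for row in conn.execute(
            f"SELECT id FROM {table} WHERE deleted != 0 ORDER BY id LIMIT ?",
            (max_rows,))]
        if not ids:
            return 0
        marks = ",".join("?" * len(ids))
        conn.execute(
            f"INSERT INTO {shadow} SELECT * FROM {table} WHERE id IN ({marks})",
            ids)
        conn.execute(f"DELETE FROM {table} WHERE id IN ({marks})", ids)
    return len(ids)


conn = sqlite3.connect(":memory:")
# The original table carries constraints; the shadow table has only columns.
conn.execute("CREATE TABLE instances "
             "(id INTEGER PRIMARY KEY, name TEXT UNIQUE, deleted INTEGER DEFAULT 0)")
conn.execute("CREATE TABLE shadow_instances (id INTEGER, name TEXT, deleted INTEGER)")
conn.executemany("INSERT INTO instances (id, name, deleted) VALUES (?, ?, ?)",
                 [(1, "a", 0), (2, "b", 2), (3, "c", 3)])
moved = archive_deleted_rows(conn, "instances")
```

The copy relies on `INSERT ... SELECT *`, which only works when both tables
have the same columns in the same order.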

-----

After a few months I found that there were tons of migrations for the
"original" tables and no migrations for the "shadow" tables.
So I implemented this BP
https://blueprints.launchpad.net/nova/+spec/db-improve-archiving which does
the following:
a) syncs the shadow tables with the originals
b) adds a test that checks that:
  1) each "original" table has a shadow table
  2) there are no extra shadow tables
  3) each shadow table has the same columns as its "original"
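
The three checks listed above could look roughly like this. This is only a
sketch against SQLite, not the actual Nova test; the `shadow_` prefix, the
helper names, and the demo tables are assumptions for the example.

```python
import sqlite3

SHADOW_PREFIX = "shadow_"  # assumed naming convention for this sketch


def columns(conn, table):
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk);
    # only the column name and declared type are compared here.
    return [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]


def find_sync_problems(conn):
    """Return a list of messages describing out-of-sync shadow tables."""
    names = {row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    shadows = {n for n in names if n.startswith(SHADOW_PREFIX)}
    originals = names - shadows
    problems = []
    for table in sorted(originals):
        if SHADOW_PREFIX + table not in shadows:
            problems.append(f"missing shadow table for {table}")
        elif columns(conn, table) != columns(conn, SHADOW_PREFIX + table):
            problems.append(f"columns differ for {table}")
    for shadow in sorted(shadows):
        if shadow[len(SHADOW_PREFIX):] not in originals:
            problems.append(f"extra shadow table {shadow}")
    return problems


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (id INTEGER PRIMARY KEY, host TEXT)")
conn.execute("CREATE TABLE shadow_instances (id INTEGER)")  # lacks `host`
conn.execute("CREATE TABLE volumes (id INTEGER PRIMARY KEY)")  # no shadow
conn.execute("CREATE TABLE shadow_old (id INTEGER)")  # no original
problems = find_sync_problems(conn)
```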

Why is this so important?
If the "shadow" and "original" tables are not in sync, there are two
possible outcomes when the archiving utility is run:
  a) it fails
  b) (worse) it corrupts data in the shadow table

------

Also, there is no exponential growth of JOINs when we use shadow
tables:

In migrations we should:
a) Apply the same column changes (drop, alter) to the main and shadow tables
b) Apply the same table changes (create/drop/rename)
c) Apply the same changes to the data in the tables

So you act on the main tables and the shadow tables separately, but after
the migration the tables should be in sync.

It is easier to perform the same actions twice, on the "main" and the
"shadow" table, in one migration than in separate migrations.
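
To illustrate the point about one migration touching both tables, here is a
minimal sketch in plain SQLite. A real Nova migration would be written
against the project's migration tooling instead; the helper names and the
`instances` / `shadow_instances` tables are hypothetical.

```python
import sqlite3


def add_column_everywhere(conn, table, column, sql_type):
    """Apply one column change to the main table and to its shadow table."""
    for name in (table, "shadow_" + table):  # assumed `shadow_` prefix
        conn.execute(f"ALTER TABLE {name} ADD COLUMN {column} {sql_type}")


def column_names(conn, table):
    # Second field of each PRAGMA table_info row is the column name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (id INTEGER PRIMARY KEY, deleted INTEGER)")
conn.execute("CREATE TABLE shadow_instances (id INTEGER, deleted INTEGER)")
add_column_everywhere(conn, "instances", "host", "TEXT")
```

Because the change goes through a single helper, the two tables cannot drift
apart within this migration.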

-----

About the db_sync "downtime" (upgrading from one DB version to another)
(IRC)

DB archiving just helps us reduce this time. One possible approach
(high level):
1) Move our "deleted" rows to the shadow tables
2) Copy the shadow tables from schema to tmp_schema
3) Drop the data from the shadow tables
4) Run the migrations on schema:
a) As the shadow tables are empty, all migrations on them will be really fast
b) As our original tables have only non-"deleted" rows, migrations on them
will also be much faster.
5) Run Nova
6) Run the migrations on tmp_schema
7) Copy from tmp_schema to schema (if it is required for some reason)
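
Steps 2 and 3 above might be sketched like this. It is only an illustration:
an attached in-memory SQLite database stands in for tmp_schema, and the
helper name and table are hypothetical. On MySQL or PostgreSQL the copy
would be done with that server's own tooling.

```python
import sqlite3


def park_shadow_tables(conn, shadow_tables):
    """Copy each shadow table to tmp_schema (step 2), then empty it (step 3)."""
    for name in shadow_tables:
        conn.execute(f"CREATE TABLE tmp_schema.{name} AS SELECT * FROM main.{name}")
        conn.execute(f"DELETE FROM main.{name}")


# isolation_level=None puts the connection in autocommit mode, so ATTACH is
# never issued inside an open transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("ATTACH DATABASE ':memory:' AS tmp_schema")  # stand-in for tmp_schema
conn.execute("CREATE TABLE shadow_instances (id INTEGER, deleted INTEGER)")
conn.executemany("INSERT INTO shadow_instances VALUES (?, ?)", [(2, 2), (3, 3)])
park_shadow_tables(conn, ["shadow_instances"])
```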

So, for example, writing utilities that can do this would be very
useful.
------

So here is what I think about DB archiving.
It is a great thing that helps us:
1) reduce migration downtime
2) reduce the number of rows in the original tables and improve performance

And I think that tests that check that the "original" and "shadow" tables
are in sync are required here.

Best regards,
Boris Pavlovic

On Fri, Jul 5, 2013 at 3:41 PM, Nikola Đipanov <ndipanov at redhat.com> wrote:

> On 02/07/13 19:50, Boris Pavlovic wrote:
> >
> >   *) DB Archiving
> >      a) create shadow tables
> >      b) add tests that check that shadow and main tables are synced.
> >      c) add code that works with shadow tables.
> >
>
> Hi Boris & all,
>
> I have a few points regarding db archiving work that I am growing more
> concerned about, so I thought I might mention them on this thread. I
> pointed them out ad-hoc on a recent review
> https://review.openstack.org/#/c/34643/ and there is some discussion
> there already, although it was not very fruitful.
>
> I feel that there were a few design oversights and as a result it has a
> couple of rough edges I noticed.
>
> First issue is about the fact that shadow tables do not present a "view
> of the world" themselves but are just unconstrained rows copied from
> live tables.
>
> This is understandably done for performance reasons while archiving
> (with current design ideas in place), but also causes issues when
> migrations affect more than one table. Especially if data migrations
> need to look at more tables at once, the actual number of table joins
> needed in order to consider everything grows exponentially. It could be
> argued that these are not that common, but is something that will make
> development more difficult and migrations painful once it comes up.
>
> To put it briefly - this property generally makes it harder to reason
> about data.
>
> Second point (and it ties in with the first one since it makes it
> difficult to fix) - Maybe shadow table migrations should be kept
> separate, and made optional? Currently there is a check that will fail
> the tests unless the migration is done on both tables, which I think
> should be removed in favour of separate migrations. Developers should
> still migrate both of course - but deployers should be able to choose
> not to do it according to their needs/scale. I am sure there are people
> on this list that can chip in more on this subject (I've had a brief
> discussion with lifeless on this topic on IRC).
>
> I'm afraid that if you agree that these are in fact problems - you might
> also agree that we might want to go back on some of the design decisions
> made around db archiving (like having unconstrained tables in the same
> db for example).
>
> I'd be happy to hear some of the angles that I may have missed,
>
> Cheers,
>
> Nikola
>

[openstack-dev] Work around DB in OpenStack (Oslo, Nova, Cinder, Glance)