<div dir="ltr">Hi all, <div><br></div><div style>I would like to explain very high level steps of our work: </div><div style>1) Sync work with DB in all projects (We have what we have, let it be in one place)<br></div><div style>
2) Refactor work with DB in one place (not independently in all projects) </div><div style><br></div><div style>So I understand that our code around DB is not ideal, but let it be in one place at first.</div><div style><br>
</div><div style>----------</div><div style>About DB archiving. <br></div><div style>----------</div><div style>Let me describe how it works for non familiar contributors:<br></div><div style><br></div><div style>For each table (that have columns, indexes, unique constraints, fk and etc) we have shadow table that have only columns (without indexes, unique constraints, fk..)</div>
<div style><br></div><div style>And then we have utility that makes next things: </div><div style>"move from original table records (that are marked as "deleted") to shadow"</div><div style><br></div><div style>
This was done by David Ripton in Nova in Grizzly. </div><div style><br></div><div style>-----</div><div style><br></div><div style>After a few months I found that there are tons of migrations for "original" table and there is no migration for "shadow table". </div>
<div style>And implement this BP <a href="https://blueprints.launchpad.net/nova/+spec/db-improve-archiving">https://blueprints.launchpad.net/nova/+spec/db-improve-archiving</a> that makes next:</div><div style>a) sync shadow tables with original</div>
<div style>b) add test that checks that:</div><div style> 1) for each "original" table we have shadow</div><div style> 2) we don't have extra shadow tables</div><div style> 3) shadow tables have same columns as "original"</div>
<div style><br></div><div style>Why is this so important: <br></div><div style>1) If "shadow" and "original" table are not synced there could be 2 results after shadow util was ran:</div><div style> a) it will fail</div>
<div style> b) (worst) it will break data in shadow table</div><div style><br></div><div style>------</div><div style><br></div><div style>Also there is no exponential growth of JOINs when we are using shadow tables: </div>
<div style><br></div><div style>In migrations we should:<br></div><div style>a) Do the same actions on columns (drop, alter) in main and shadow</div><div style>b) Do the same actions on tables (create/drop/rename)</div><div style>
c) Do the same actions on data in Tables </div><div style><br></div><div style>So you are doing separately actions on Main tables and Shadow tables, but after migration our tables should be synced.</div><div style><br></div>
<div style>And it is easier to make the same actions 2 times on "main" and "shadow" table in one migration then in separated migrations. </div><div style><br></div><div style>-----</div><div style><br>
</div><div style>About the db_sync "downtime" (upgrading from one to another DB version) (IRC)<br></div><div style><br></div><div style>DB Archiving just help us to reduce this time. One of possible variant (high level): </div>
<div style>1) Move to shadow_tables our "deleted" rows</div><div style>2) Copy shadow_tables from schema -> to tmp_schema</div><div style>3) Drop data from shadow_tables</div><div style>4) Make migrations on schema: </div>
<div style>a) As shadow tables are empty all migrations will be done really fast </div><div style>b) As our original tables (have) only non "deleted" rows migration will be done also much faster.</div><div style>
5) Run Nova</div><div style>6) Make migration on tmp_schema</div><div style>7) Copy from tmp_schema to shcema (if it is required for some reasons) </div><div style><br></div><div style>So for example writing utitlites that will be able to do this will be very useful. <br>
</div><div style>------</div><div style><br></div><div style><br></div><div style>So what I think about DB archiving. </div><div style>It is great things that helps us: </div><div style>1) to reduce migrations downtime</div>
<div style>2) to reduce count of rows in original table and improve performance</div><div style><br></div><div style>And I think that tests that checks that "original" and "shadow" tables are synces is required here. </div>
<div style><br></div><div style><br></div><div style>Best regards,</div><div style>Boris Pavlovic</div><div style><br></div><div style><br></div><div style><br></div><div class="gmail_extra"><br><br><div class="gmail_quote">
On Fri, Jul 5, 2013 at 3:41 PM, Nikola Đipanov <span dir="ltr"><<a href="mailto:ndipanov@redhat.com" target="_blank">ndipanov@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div class="im">On 02/07/13 19:50, Boris Pavlovic wrote:<br>
><br>
> *) DB Archiving<br>
> a) create shadow tables<br>
> b) add tests that checks that shadow and main table are synced.<br>
> c) add code that work with shadow tables.<br>
><br>
<br>
</div>Hi Boris & all,<br>
<br>
I have a few points regarding db archiving work that I am growing more<br>
concerned about, so I though I might mention them on this thread. I<br>
pointed them out ad-hoc on a recent review<br>
<a href="https://review.openstack.org/#/c/34643/" target="_blank">https://review.openstack.org/#/c/34643/</a> and there is some discussion<br>
there already, although was not very fruitful.<br>
<br>
I feel that there were a few design oversights and as a result it has a<br>
couple of rough edges I noticed.<br>
<br>
First issue is about the fact that shadow tables do not present a "view<br>
of the world" themselves but are just unconstrained rows copied from<br>
live tables.<br>
<br>
This is understandably done for performance reasons while archiving<br>
(with current design ideas in place), but also causes issues when<br>
migrations affect more than one table. Especially if data migrations<br>
need to look at more tables at once, the actual number of table joins<br>
needed in order to consider everything grows exponentially. It could be<br>
argued that these are not that common, but is something that will make<br>
development more difficult and migrations painful once it comes up.<br>
<br>
To put it shortly - this property generally makes it harder to reason<br>
about data.<br>
<br>
Second point (and it ties in with the first one since it makes it<br>
difficult to fix) - Maybe shadow table migrations should be kept<br>
separate, and made optional? Currently there is a check that will fail<br>
the tests unless the migration is done on both tables, which I think<br>
should be removed in favour of separate migrations. Developers should<br>
still migrate both of course - but deployers should be able to choose<br>
not to do it according to their needs/scale. I am sure there are people<br>
on this list that can chip in more on this subject (I've had a brief<br>
discussion with lifeless on this topic on IRC).<br>
<br>
I'm afraid that if you agree that these are in fact problems - you might<br>
also agree that we might want to go back on some of the design decisions<br>
made around db archiving (like having unconstrained tables in the same<br>
db for example).<br>
<br>
I'd be happy to hear some of the angles that I may have missed,<br>
<br>
Cheers,<br>
<br>
Nikola<br>
</blockquote></div><br></div></div>