[openstack-dev] [nova] if by "archived" you mean, "wipes out your tables completely", then sure, it works fine

Mike Bayer mbayer at redhat.com
Thu Mar 12 23:29:55 UTC 2015


Hello Nova -

Not sure if I’m just staring at this for too long, or if
archive_deleted_rows_for_table() is just not something we ever use.  
Because it looks like it’s really, really broken very disastrously, and I’m 
wondering if I’m just missing something in front of me.

Let’s look at what it does!

First, archive_deleted_rows() calls it with a table name. These names are
taken by collecting every single table name from nova.db.sqlalchemy.models.

Then, the function uses table reflection (that is, doesn’t look in the model
at all, just goes right to the database) to load the table definitions:

    table = Table(tablename, metadata, autoload=True)
    shadow_tablename = _SHADOW_TABLE_PREFIX + tablename
    rows_archived = 0
    try:
        shadow_table = Table(shadow_tablename, metadata, autoload=True)
    except NoSuchTableError:
        # No corresponding shadow table; skip it.
        return rows_archived

this is pretty heavy handed and wasteful from an efficiency point of view,
and I’d like to fix this too, but let’s go with it. Now we have the two
tables.

Then we do this:

    deleted_column = table.c.deleted
    query_insert = sql.select([table],
                          deleted_column != deleted_column.default).\
                          order_by(column).limit(max_rows)
    query_delete = sql.select([column],
                          deleted_column != deleted_column.default).\
                          order_by(column).limit(max_rows)

We make some SELECT statements that we’re going to use to find “soft
deleted” rows, and these will be embedded into an INSERT 
and a DELETE. It is trying to make a statement like “SELECT .. FROM
table WHERE deleted != <deleted_default>”, so that it finds rows where
“deleted” has been changed to something, e.g. the row was 
soft deleted.

But what’s the value of “deleted_default” ?   Remember, all this
table knows is what the database just told us about it, because it only
uses reflection.  Let’s see what the “deleted” column in a table like 
instance_types looks like:

MariaDB [nova]> show create table instance_types;
| instance_types | CREATE TABLE `instance_types` (
  `created_at` datetime DEFAULT NULL,

  …  [omitted] ...

  `deleted` int(11) DEFAULT NULL,
)

The default that we get for this column is NULL. That is very interesting!
Because, if we look at the *Python-side value of deleted*, we see something
that is quite the opposite of NULL, e.g. a thing that is most certainly not
null:

class SoftDeleteMixin(object):
    deleted_at = Column(DateTime)
    deleted = Column(Integer, default=0)

See that zero there? That’s a ***Python-side default***. It is **not the
server default**!! You will **not** get it from reflection, the database has
no clue about it (oddly enough, this entire subject matter is fully
documented in SQLAlchemy’s documentation, and guess what, the docs are free!
Read them all you like, I won’t ask for a dime, no questions asked!).

So, all of our INSERTS **will** put a zero, not NULL, into that column.
Let’s look in instance_types and see:

MariaDB [nova]> select id, name, deleted from instance_types;
+----+-----------+---------+
| id | name      | deleted |
+----+-----------+---------+
|  3 | m1.large  |       0 |
|  1 | m1.medium |       0 |
|  7 | m1.micro  |       0 |
|  6 | m1.nano   |       0 |
|  5 | m1.small  |       0 |
|  2 | m1.tiny   |       0 |
|  4 | m1.xlarge |       0 |
+----+-----------+---------+
7 rows in set (0.00 sec)

No NULLs.  The value of non-deleted rows is zero.

What does this all mean?

It means, when this archival routine runs, it runs queries like this:

INSERT INTO shadow_quota_usages SELECT quota_usages.created_at, quota_usages.updated_at, quota_usages.deleted_at, quota_usages.id, quota_usages.project_id, quota_usages.resource, quota_usages.in_use, quota_usages.reserved, quota_usages.until_refresh, quota_usages.deleted, quota_usages.user_id
FROM quota_usages
WHERE quota_usages.deleted IS NOT NULL ORDER BY quota_usages.id
 LIMIT ? OFFSET ?
2015-03-12 17:01:01,218 INFO [sqlalchemy.engine.base.Engine] (7, 0)
2015-03-12 17:01:01,219 INFO [sqlalchemy.engine.base.Engine] DELETE FROM quota_usages WHERE quota_usages.id in (SELECT T1.id FROM (SELECT quota_usages.id
FROM quota_usages
WHERE quota_usages.deleted IS NOT NULL ORDER BY quota_usages.id
 LIMIT ? OFFSET ?) as T1)

The second query is to DELETE rows from a table like quota_usages based on
looking at rows where the “deleted” column is “NOT NULL”. Which means, all
of them! They are all zeros, not NULL!

If we assume that all of our tables are filled up with zeroes for those
deleted columns, because that’s the default, this **wipes the whole table
clean**.

How do the tests pass? Well the tests are in test_db_api->ArchiveTestCase,
and actually, they don’t. But they don’t fail every time, because the test
suite here runs with a database that is almost completely empty anyway, so
the broken archival routine doesn’t find too many rows to blow away except
for the rows in “instance_types”, which it only finds sometimes because the
tests are only running it with a small number of things to delete and the
order of the tables is non-deterministic.

I’ve posted the bug report at https://bugs.launchpad.net/nova/+bug/1431571
where I started out not knowing much about how this worked except that my
tests were failing, and slowly stumbled my way to come to this conclusion. A
patch is at https://review.openstack.org/#/c/164009/, where we look at the
actual Python-side default. However I’d recommend that we just hardcode the
zero here, since that’s how our soft-delete columns work.




More information about the OpenStack-dev mailing list