[openstack-dev] [nova] if by "archived" you mean, "wipes out your tables completely", then sure, it works fine

Mike Bayer mbayer at redhat.com
Fri Mar 13 16:04:21 UTC 2015



Attila Fazekas <afazekas at redhat.com> wrote:

> The archiving has had issues for a very long time [1];
> something like [2] is expected to replace it.


yeah I was thinking of just rewriting the archive routine in Nova to be
reasonable, but I could build this routine into oslo.db as well, as a generic
“move rows matching criteria X into corresponding archive tables” feature.
Archiving as it stands is mostly useless if it doesn’t consider dependencies
between tables (https://bugs.launchpad.net/nova/+bug/1183523), so the correct
approach would need to consider tables, and potentially rows, in terms of
foreign key dependency. This is what the unit of work was built to handle.
Though I’m not sure I can make this a generic ORM play, since we want to be
able to delete “only N” rows, and it would be nice for the system not to
spend its time reading in the entire DB when it is only tasked with a few
dozen rows. It might need to implement its own mini-unit-of-work system that
works against the same paradigm, but specific to this use case.
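
To illustrate the idea, here’s a rough sketch only; the criterion-as-callable
and the “shadow_” prefix are assumptions for illustration, not a worked-out
oslo.db API:

    from sqlalchemy import MetaData, Table, select

    def move_rows(engine, tablename, criterion, max_rows, prefix="shadow_"):
        """Move up to max_rows rows matching criterion into a shadow table."""
        metadata = MetaData()
        source = Table(tablename, metadata, autoload=True,
                       autoload_with=engine)
        target = Table(prefix + tablename, metadata, autoload=True,
                       autoload_with=engine)

        with engine.begin() as conn:
            # Read only the candidate primary keys, bounded by max_rows,
            # rather than pulling the whole table into memory.
            ids = [row[0] for row in conn.execute(
                select([source.c.id]).where(criterion(source))
                .order_by(source.c.id).limit(max_rows))]
            if not ids:
                return 0
            # Copy the matching rows, then delete them from the source.
            conn.execute(target.insert().from_select(
                [c.name for c in source.c],
                select([source]).where(source.c.id.in_(ids))))
            conn.execute(source.delete().where(source.c.id.in_(ids)))
            return len(ids)

Usage would be something like move_rows(engine, 'instance_types',
lambda t: t.c.deleted != 0, 1000).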

The simplest approach is to archive the tables in order of foreign key
dependency. However, that has two issues in the “generic” sense. One is that
there can be cycles between tables, and a table that refers to itself forms
a cycle of its own. In those cases the archival has to be broken down from a
“sort the tables” basis into a “sort the rows” basis. This is what
SQLAlchemy’s unit of work does, and I’d adapt that here.
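
For the simple, acyclic case, SQLAlchemy already exposes the topological
sort; a sketch, assuming an Engine named “engine” is in scope:

    from sqlalchemy import MetaData
    from sqlalchemy.schema import sort_tables

    metadata = MetaData()
    metadata.reflect(bind=engine)

    # sort_tables() yields parents before children (CREATE TABLE order);
    # walk it in reverse so that referring rows are archived before the
    # rows they refer to.
    for table in reversed(sort_tables(metadata.tables.values())):
        ...  # archive up to N rows from this table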

The other possible, but probably unlikely, issue with addressing this
“generically” is that if “Table A row 1” is referred to by “Table B row 2”,
it may not be safe to assume we can remove “Table B row 2” and *not*
“Table A row 1”. The application may rely on both of these rows being
present, and the SQLAlchemy pattern where this is the case is the so-called
“joined table inheritance” pattern. But joined table inheritance is actually
not very easy to adapt to the “shadow” model, so I doubt anyone is doing
that.
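
For anyone unfamiliar with the pattern, a minimal illustration (the names
here are made up): the parent and child rows together make up a single
object, so neither row can be archived without the other.

    from sqlalchemy import Column, ForeignKey, Integer, String
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class Instance(Base):
        __tablename__ = 'instances'
        id = Column(Integer, primary_key=True)
        kind = Column(String(20))
        __mapper_args__ = {'polymorphic_on': kind,
                           'polymorphic_identity': 'instance'}

    class SpotInstance(Instance):
        # One object is one row here *plus* one row in "instances";
        # archiving either row alone corrupts the object.
        __tablename__ = 'spot_instances'
        id = Column(Integer, ForeignKey('instances.id'), primary_key=True)
        __mapper_args__ = {'polymorphic_identity': 'spot'}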

> The archiving just moves the trash to the other side of the desk;
> usually, permanently deleting everything that has been deleted
> for more than 7 days is better for everyone.
> 
> For now, maybe just wiping out the shadow tables with the existing
> nova-manage functionality is a better choice. [3]
> 
> [1] https://bugs.launchpad.net/nova/+bug/1305892
> [2] https://blueprints.launchpad.net/nova/+spec/db-purge-engine
> [3]  
> 
> ----- Original Message -----
>> From: "Mike Bayer" <mbayer at redhat.com>
>> To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
>> Sent: Friday, March 13, 2015 12:29:55 AM
>> Subject: [openstack-dev] [nova] if by "archived" you mean, "wipes out your tables completely", then sure, it works fine
>> 
>> Hello Nova -
>> 
>> Not sure if I’ve just been staring at this for too long, or if
>> archive_deleted_rows_for_table() is just not something we ever use,
>> because it looks like it’s really, really disastrously broken, and I’m
>> wondering if I’m just missing something right in front of me.
>> 
>> Let’s look at what it does!
>> 
>> First, archive_deleted_rows() calls it with a table name. These names are
>> taken by collecting every single table name from nova.db.sqlalchemy.models.
>> 
>> Then, the function uses table reflection (that is, doesn’t look in the model
>> at all, just goes right to the database) to load the table definitions:
>> 
>>    table = Table(tablename, metadata, autoload=True)
>>    shadow_tablename = _SHADOW_TABLE_PREFIX + tablename
>>    rows_archived = 0
>>    try:
>>        shadow_table = Table(shadow_tablename, metadata, autoload=True)
>>    except NoSuchTableError:
>>        # No corresponding shadow table; skip it.
>>        return rows_archived
>> 
>> this is pretty heavy-handed and wasteful from an efficiency point of view,
>> since both tables are re-reflected from the database on every call; I’d
>> like to fix this too, but let’s go with it. Now we have the two tables.
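>> 
>> A sketch of one way to avoid the repeated reflection (reflect the schema
>> once into a module-level MetaData, then reuse it):
>> 
>>    from sqlalchemy import MetaData
>> 
>>    _metadata = MetaData()
>> 
>>    def get_table(engine, tablename):
>>        # Reflect the full schema only on first use; later calls hit
>>        # the MetaData cache instead of the database.
>>        if not _metadata.tables:
>>            _metadata.reflect(bind=engine)
>>        return _metadata.tables.get(tablename)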
>> 
>> Then we do this:
>> 
>>    column = table.c.id  # the table's "id" column, used for ordering
>>    deleted_column = table.c.deleted
>>    query_insert = sql.select([table],
>>                          deleted_column != deleted_column.default).\
>>                          order_by(column).limit(max_rows)
>>    query_delete = sql.select([column],
>>                          deleted_column != deleted_column.default).\
>>                          order_by(column).limit(max_rows)
>> 
>> We make some SELECT statements that we’re going to use to find “soft
>> deleted” rows; these will be embedded into an INSERT and a DELETE. The
>> intent is a statement like “SELECT .. FROM table WHERE deleted !=
>> <deleted_default>”, so that it finds rows where “deleted” has been
>> changed to something else, i.e. the row was soft deleted.
>> 
>> But what’s the value of “deleted_default” ?   Remember, all this
>> table knows is what the database just told us about it, because it only
>> uses reflection.  Let’s see what the “deleted” column in a table like
>> instance_types looks like:
>> 
>> MariaDB [nova]> show create table instance_types;
>> | instance_types | CREATE TABLE `instance_types` (
>>  `created_at` datetime DEFAULT NULL,
>> 
>>  …  [omitted] ...
>> 
>>  `deleted` int(11) DEFAULT NULL,
>> )
>> 
>> The default that we get for this column is NULL. That is very interesting!
>> Because, if we look at the *Python-side default for deleted*, we see
>> something that is quite the opposite of NULL, i.e. a thing that is most
>> certainly not null:
>> 
>> class SoftDeleteMixin(object):
>>    deleted_at = Column(DateTime)
>>    deleted = Column(Integer, default=0)
>> 
>> See that zero there? That’s a ***Python-side default***. It is **not the
>> server default**!! You will **not** get it from reflection; the database
>> has no clue about it (oddly enough, this entire subject matter is fully
>> documented in SQLAlchemy’s documentation, and guess what, the docs are
>> free! Read them all you like, I won’t ask for a dime, no questions asked!).
>> 
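>> To see the distinction concretely, a self-contained sketch using an
>> in-memory SQLite database:
>> 
>>    from sqlalchemy import (Column, Integer, MetaData, Table,
>>                            create_engine)
>> 
>>    engine = create_engine('sqlite://')
>>    model_md = MetaData()
>>    t = Table('demo', model_md,
>>              Column('id', Integer, primary_key=True),
>>              Column('deleted', Integer, default=0))  # Python-side only
>>    model_md.create_all(engine)
>> 
>>    reflected = Table('demo', MetaData(), autoload=True,
>>                      autoload_with=engine)
>>    print(t.c.deleted.default)          # ColumnDefault(0): the model knows
>>    print(reflected.c.deleted.default)  # None: the database never saw it
>> 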
>> So, all of our INSERTS **will** put a zero, not NULL, into that column.
>> Let’s look in instance_types and see:
>> 
>> MariaDB [nova]> select id, name, deleted from instance_types;
>> +----+-----------+---------+
>> | id | name      | deleted |
>> +----+-----------+---------+
>> |  3 | m1.large  |       0 |
>> |  1 | m1.medium |       0 |
>> |  7 | m1.micro  |       0 |
>> |  6 | m1.nano   |       0 |
>> |  5 | m1.small  |       0 |
>> |  2 | m1.tiny   |       0 |
>> |  4 | m1.xlarge |       0 |
>> +----+-----------+---------+
>> 7 rows in set (0.00 sec)
>> 
>> No NULLs.  The value of non-deleted rows is zero.
>> 
>> What does this all mean?
>> 
>> It means the reflected default is None, so “deleted_column !=
>> deleted_column.default” compiles to “deleted IS NOT NULL”, and when this
>> archival routine runs, it runs queries like this:
>> 
>> INSERT INTO shadow_quota_usages SELECT quota_usages.created_at,
>> quota_usages.updated_at, quota_usages.deleted_at, quota_usages.id,
>> quota_usages.project_id, quota_usages.resource, quota_usages.in_use,
>> quota_usages.reserved, quota_usages.until_refresh, quota_usages.deleted,
>> quota_usages.user_id
>> FROM quota_usages
>> WHERE quota_usages.deleted IS NOT NULL ORDER BY quota_usages.id
>> LIMIT ? OFFSET ?
>> 2015-03-12 17:01:01,218 INFO [sqlalchemy.engine.base.Engine] (7, 0)
>> 2015-03-12 17:01:01,219 INFO [sqlalchemy.engine.base.Engine] DELETE FROM
>> quota_usages WHERE quota_usages.id in (SELECT T1.id FROM (SELECT
>> quota_usages.id
>> FROM quota_usages
>> WHERE quota_usages.deleted IS NOT NULL ORDER BY quota_usages.id
>> LIMIT ? OFFSET ?) as T1)
>> 
>> The second query DELETEs rows from a table like quota_usages by looking
>> for rows where the “deleted” column is NOT NULL. Which means all of them!
>> They are all zeros, not NULL!
>> 
>> If we assume that all of our tables are filled up with zeroes for those
>> deleted columns, because that’s the default, this **wipes the whole table
>> clean**.
>> 
>> How do the tests pass? Well, the tests are in test_db_api->ArchiveTestCase,
>> and actually, they don’t. But they don’t fail every time, because the test
>> suite here runs with a database that is almost completely empty anyway, so
>> the broken archival routine doesn’t find many rows to blow away except for
>> the rows in “instance_types”, and it only finds those sometimes, because
>> the tests run it with a small number of things to delete and the order of
>> the tables is non-deterministic.
>> 
>> I’ve posted the bug report at https://bugs.launchpad.net/nova/+bug/1431571,
>> where I started out not knowing much about how this worked except that my
>> tests were failing, and slowly stumbled my way to this conclusion. A patch
>> is at https://review.openstack.org/#/c/164009/, where we look at the actual
>> Python-side default. However, I’d recommend that we just hardcode the zero
>> here, since that’s how our soft-delete columns work.
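>> 
>> Hardcoding the zero would make the criterion look something like this:
>> 
>>    deleted_column = table.c.deleted
>>    # Non-deleted rows hold 0 (SoftDeleteMixin's Python-side default),
>>    # so this renders "WHERE deleted != 0" and matches only soft-deleted
>>    # rows, rather than "deleted IS NOT NULL", which matches everything.
>>    query_insert = sql.select([table], deleted_column != 0).\
>>                       order_by(table.c.id).limit(max_rows)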
>> 
>> 
