[openstack-dev] [Ceilometer] Experiments with Metadata table for Meters ... initial results.

Jay Pipes jaypipes at gmail.com
Tue Mar 26 22:24:58 UTC 2013


On 03/26/2013 02:41 PM, Sandy Walsh wrote:
> In a previous thread we were discussing the possibility of using a
> "metadata" schema model to keep the complexity of the database down.
> 
> http://lists.openstack.org/pipermail/openstack-dev/2013-January/005132.html
> 
> This is a very attractive approach because it's so simple, but would it
> be able to handle the volume of data we're contemplating?
> 
> (Disclaimer: I'm not much of db guy so I could have missed something
> obvious here)
> 
> So ... I created a simple set of tables like this:
> 
> mysql> show tables;
> +-------------------+
> | Tables_in_cm_meta |
> +-------------------+
> | rawdata           |
> | traits            |
> +-------------------+
> 2 rows in set (0.00 sec)
> 
> 
> mysql> show columns from traits;
> +------------+-------------+------+-----+---------+----------------+
> | Field      | Type        | Null | Key | Default | Extra          |
> +------------+-------------+------+-----+---------+----------------+
> | id         | int(11)     | NO   | PRI | NULL    | auto_increment |
> | key        | varchar(32) | YES  | MUL | NULL    |                |
> | value      | varchar(32) | YES  | MUL | NULL    |                |
> | rawdata_id | int(11)     | YES  | MUL | NULL    |                |
> +------------+-------------+------+-----+---------+----------------+
> 4 rows in set (0.00 sec)
> 
> 
> mysql> show columns from rawdata;
> +-------+---------+------+-----+---------+----------------+
> | Field | Type    | Null | Key | Default | Extra          |
> +-------+---------+------+-----+---------+----------------+
> | id    | int(11) | NO   | PRI | NULL    | auto_increment |
> | raw   | text    | YES  |     | NULL    |                |
> +-------+---------+------+-----+---------+----------------+
> 2 rows in set (0.00 sec)
> 
> 
> All the traits columns are indexed, and rawdata_id is a foreign key to
> the rawdata.id primary key.
> 
> Note: Yes, the traits table can be normalized further, with the labels
> in another table and only the IDs used for the joins, but this is a
> close enough test to the stacktach baseline. (Although if someone
> thinks this will result in a *radically* better performance time,
> please let me know. Include the time to check for duplicates for each
> unique trait key.)
> 
> I populated the rawdata table with 10000 rows ... each row had 15 traits
> associated with it.
> 
> So about 150k rows in the traits table for the 10k rawdata rows.
> 
> Next I created a basic query to return all the UUIDs for events within
> a small time range.
> 
> mysql> SELECT DISTINCT traits.value AS traits_value
>        FROM traits
>        WHERE traits.`key` = 'instance' AND
>              traits.rawdata_id IN (
>                  SELECT DISTINCT
>                      traits.rawdata_id AS traits_rawdata_id
>                  FROM traits
>                  WHERE traits.`key` = 'when' AND
>                        traits.value >= '2013-03-13 13:51:22.761889' AND
>                        traits.value <= '2013-03-13 13:53:15.508622');
> 
> +----------------------------------+
> | traits_value                     |
> +----------------------------------+
> | ins-f501e7ca-df38-4fb0-97a4-470e |
>                ...
> | ins-ca4b7acb-04b0-4d32-9a8f-9e0e |
> +----------------------------------+
> 50 rows in set (4 min 27.78 sec)
> 
> OVER FOUR MINUTES. Ugh.
> (mysql 5.1.67 on Ubuntu 10.04, 4G ram, single core)
> 
> This is a pretty darn simple query.

There are a number of issues that degrade the performance of the above query.

1) Datetime data should not be in a VARCHAR(32) column. It should be in
either a DATETIME or a TIMESTAMP column (in MySQL at least... PostgreSQL
has similar temporal types). The difference in space per index record is
significant... 8 or 4 bytes vs. 26 bytes. Over millions of rows, that
makes a difference when performing index seeks and scans.
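
For illustration, a sketch of that change against the test schema above
(the column and index names here are mine, not from Sandy's test):

ALTER TABLE traits
  ADD COLUMN value_dt DATETIME NULL
, ADD KEY (value_dt);

-- Backfill from the string values; note that DATETIME in MySQL 5.1
-- truncates the fractional seconds.
UPDATE traits
SET value_dt = STR_TO_DATE(value, '%Y-%m-%d %H:%i:%s.%f')
WHERE `key` = 'when';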

2) Because of the denormalized schema, there needs to be a subquery to
get the range of record IDs for a given time period. Although you could
convert the dependent subquery into a standard self-join, you could
accomplish the same query with a single pass over a normalized schema.

For instance, you could rewrite the above subquery as a standard join on
a derived table like so:

SELECT t.value
FROM traits t
JOIN (
 SELECT rawdata_id FROM traits
 WHERE traits.`key` = 'when'
 AND traits.value >= '2013-03-13 13:51:22.761889'
 AND traits.value <= '2013-03-13 13:53:15.508622'
) AS t2
ON t.rawdata_id = t2.rawdata_id
WHERE t.`key` = 'instance'
GROUP BY t.value;

That may be better-performing under MySQL 5.1. I suspect later versions
of MySQL's optimizer will rewrite the above automatically, but it's
worth checking it out by doing an EXPLAIN SELECT of the above two queries.
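
For example:

EXPLAIN SELECT DISTINCT traits.value ... \G   -- the original IN () form
EXPLAIN SELECT t.value FROM traits t JOIN ... \G   -- the join form

In 5.1 the IN () form typically shows a DEPENDENT SUBQUERY step that is
re-executed for each outer row, while the join form shows a DERIVED
table that is materialized once.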

Regardless, if you had a normalized schema, you would only need a single
pass over the table, instead of two passes:

Assuming a normalized schema similar to this, with admittedly simplistic
example values:

CREATE TABLE event (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
, event_type INT UNSIGNED NOT NULL
, value_int INT NULL
, value_float FLOAT NULL
, value_text TEXT NULL
, KEY (created_at, event_type)
);

CREATE TABLE event_type (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, description VARCHAR(100) NOT NULL
);

INSERT INTO event_type (id, description)
VALUES (1, 'cpu_usage'), (2, 'volume_attached');

CREATE TABLE trait (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, description VARCHAR(100) NOT NULL
);

INSERT INTO trait (id, description)
VALUES (1, 'Instance UUID'), (2, 'Tenant UUID');

CREATE TABLE event_trait (
  event BIGINT UNSIGNED NOT NULL,
  trait INT UNSIGNED NOT NULL,
  value VARCHAR(200) NOT NULL,
  KEY (event, trait)  -- needed for the joins below
);
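
For concreteness, a couple of illustrative rows (the UUIDs are made up):

INSERT INTO event (event_type, value_float)
VALUES (1, 0.75);

INSERT INTO event_trait (event, trait, value)
VALUES (LAST_INSERT_ID(), 1, '6f1b3a42-8c0d-4e7a-9b2f-1c5d8e9a0b3c')
     , (LAST_INSERT_ID(), 2, 'a7e2c910-5d4b-4f3a-8e6c-2b9d0f1a7c44');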

With the above schema, you could accomplish the same query for the
instance UUIDs of events in a certain temporal range with a single-pass
query:

SELECT et.value
FROM event e
JOIN event_trait et ON e.id = et.event
WHERE e.created_at BETWEEN '2013-03-13 13:51:22'
                       AND '2013-03-13 13:53:15'
AND et.trait = 1
GROUP BY et.value;

Further, you could query for the tenant UUIDs involved only in
volume_attached events in the same temporal range:

SELECT et.value
FROM event e
JOIN event_trait et ON e.id = et.event
WHERE e.created_at BETWEEN '2013-03-13 13:51:22'
                       AND '2013-03-13 13:53:15'
AND e.event_type = 2
AND et.trait = 2
GROUP BY et.value;

I believe you will find the above schema much more efficient as well as
much more flexible for a greater number of query types.
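
As one more example of that flexibility (my own illustrative query, not
from StackTach), per-type event counts by hour fall out of the same
(created_at, event_type) index:

SELECT e.event_type
     , DATE_FORMAT(e.created_at, '%Y-%m-%d %H:00') AS hr
     , COUNT(*) AS events
FROM event e
WHERE e.created_at BETWEEN '2013-03-13 00:00:00'
                       AND '2013-03-14 00:00:00'
GROUP BY e.event_type, hr;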

> In StackTach we're already dealing with nearly 17M rows for about 5
> months of data (full event data in a fully denormalized table) and doing
> things a lot more complicated.

17M rows is not important on its own. If you have 17M rows in a
denormalized table with wide rows (a high average of bytes per row in
the data and index pages), queries may be much slower than against 500M
rows in a normalized schema, where many more index records fit per page
in memory.

When you show performance numbers or query plans, you should indicate
whether the data set is larger than the buffer pool (for InnoDB) or the
key buffer (for MyISAM). It makes a big difference, and the benefits of
a normalized schema will be even greater once the data set exceeds the
memory available to the database.
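
For example (standard MySQL commands; the sizes you get back are your
own):

SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'key_buffer_size';

-- Compare against the table's footprint (Data_length + Index_length):
SHOW TABLE STATUS LIKE 'traits'\G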

Best,
-jay

> Comparable queries on the same machine against the stacktach rawdata
> table are < 1s.
> 
> show columns from stacktach_rawdata;
> +---------------+--------------+------+-----+---------+----------------+
> | Field         | Type         | Null | Key | Default | Extra          |
> +---------------+--------------+------+-----+---------+----------------+
> | id            | int(11)      | NO   | PRI | NULL    | auto_increment |
> | deployment_id | int(11)      | NO   | MUL | NULL    |                |
> | tenant        | varchar(50)  | YES  | MUL | NULL    |                |
> | json          | longtext     | NO   |     | NULL    |                |
> | routing_key   | varchar(50)  | YES  | MUL | NULL    |                |
> | state         | varchar(20)  | YES  | MUL | NULL    |                |
> | old_state     | varchar(20)  | YES  | MUL | NULL    |                |
> | old_task      | varchar(30)  | YES  | MUL | NULL    |                |
> | when          | datetime     | NO   | MUL | NULL    |                |
> | microseconds  | int(11)      | NO   | MUL | NULL    |                |
> | publisher     | varchar(100) | YES  | MUL | NULL    |                |
> | event         | varchar(50)  | YES  | MUL | NULL    |                |
> | service       | varchar(50)  | YES  | MUL | NULL    |                |
> | host          | varchar(100) | YES  | MUL | NULL    |                |
> | instance      | varchar(50)  | YES  | MUL | NULL    |                |
> | request_id    | varchar(50)  | YES  | MUL | NULL    |                |
> +---------------+--------------+------+-----+---------+----------------+
> 16 rows in set (0.00 sec)
> 
> I think we need to pass on this normalized metadata approach unless
> someone can spot something blatant I'm doing wrong here?
> 
> We may have to build a system where we can have user-defined data types
> via plugins and each plugin will manage the tables it needs (with
> migration). More complicated, but certainly optimal.
> 
> The source for the test is here:
> https://github.com/SandyWalsh/twobillion/blob/master/sql_schema.py
> and
> https://github.com/SandyWalsh/twobillion/blob/master/schema.py
> 
> (uncomment the create_data() call on the first run to get a dataset
> https://github.com/SandyWalsh/twobillion/blob/master/sql_schema.py#L169 )
> 
> Look forward to your feedback db-heads :)
> 
> -S


