[openstack-dev] [Solum] Solum database schema modification proposal

Jay Pipes jaypipes at gmail.com
Mon Feb 3 17:39:04 UTC 2014


On Mon, 2014-02-03 at 16:22 +0000, Paul Montgomery wrote:
> Solum community,
> 
> I notice that we are using String(36) UUID values in the database
> schema as primary key for many new tables that we are creating.  For
> example:
> https://review.openstack.org/#/c/68328/10/solum/objects/sqlalchemy/application.py
> 
> Proposal: Add an int or bigint ID as the primary key, instead of UUID
> (the UUID field remains if needed), to improve database efficiency.

-1 for this particular case.

Using auto-incrementing primary keys can be beneficial in many use cases
-- particularly when trying to create a "hot spot" on disk for tables
with very high write to read ratios, like logging-type tables. 

However, autoincrementing primary keys come with some baggage when used
in large distributed database systems that UUIDs don't come with.
Namely, if you run Solum in multiple deployment zones or cells, you will
have primary key collision if you try to aggregate those databases into,
say, a data warehouse. With UUID primary keys, you won't have that
trouble.

In addition, for InnoDB tables in MySQL (as well as PostgreSQL and MS
SQL Server), the choice of primary key is critical, as it determines the
order by which the clustered index-organized tables are written to disk.
If the data you are looking up is accessed by the primary key, it will
be faster to store the records on disk in that order. Since you are not
advocating exposing the autoincrementing primary key to the user, the
database query for a record would need to do one non-clustered index
lookup into the index on UUID to find the autoincrementing primary key
value of the record in question, and then retrieve the record from the
clustered index on disk (or in the InnoDB buffer pool, which is also
ordered by primary key [1]). Two seek operations, versus only one if the
UUID is used as a primary key.

Again, autoincrementing are useful in many scenarios. But in this
particular case, I don't believe there would be a whole lot of value.

Best,
-jay

[1] Technically the InnoDB buffer pool contain unordered records within
each 16KB page, and a small ordered PK to slot number catalog at the
tail end of each data page, but the effect is the same.




More information about the OpenStack-dev mailing list