[openstack-dev] [nova] FKs in the DB

Mike Bayer mbayer at redhat.com
Sun Nov 22 19:59:06 UTC 2015



On 11/20/2015 04:33 PM, Clint Byrum wrote:
> Excerpts from Mike Bayer's message of 2015-11-20 11:29:31 -0800:
>>
>> On 11/20/2015 11:19 AM, Alexis Lee wrote:
>>> We just had a fun discussion in IRC about whether foreign keys are evil.
>>> Initially I thought this was crazy but mordred made some good points. To
>>> paraphrase, that if you have a scale-out app already it's easier to
>>> manage integrity in your app than scale-out your persistence layer.
>>
>> I've had this argument with mordred before, and it seems again there's
>> the same misunderstanding going on:
>>
>> 1. Your application can have **conceptual** foreign keys in it, without
>> actually having foreign keys **for real** in the database.  This means
>> your SQLAlchemy code still does ForeignKey, ForeignKeyConstraint, and
>> most importantly your **database still uses normal form**, that is, any
>> row that refers to another does it based on a set of columns that
>> exactly match to the primary key of a single table elsewhere (not to
>> multiple tables, not to a function of the columns cast from int to
>> string and concatenated to the value in the other table etc, an *exact
>> match*).   I'm sure that mordred agrees with all of these practices,
>> however when one says "we aren't going to use foreign keys anymore",
>> typically it is all these critical schema design practices that go out
>> the window.  Put another way, the foreign key concept not only
>> constrains data in a real database, just the concept of them constraints
>> the **developer** to use correct normal form.
>>
> 
> Mike, thanks for making that clarification. I agree, that conceptual
> FK's are not the same as FK constraints in the DB. Joins are not demons.
> :)
> 
> To be clear, while what you say above is all true, normal form isn't
> actually a goal of any system. It's a tactic one can use with a well known
> efficiency profile. But there are times when it costs more than other
> more brutal, less civilized methods of database design and usage. If we
> don't measure our efficiency, we won't actually know if this is one of
> those times or not.


Hi Clint -

To be clear to be clear (  :)  ), I am absolutely fine with:

1. disabling particular foreign key constraints on the server that are
shown to be a performance bottleneck

2. using denormalized forms where appropriate


My only point is, the *starting position* for any new database construct
should be favoring reasonable normalized forms and foreign key
constraints, with disabled constraints and denormalized forms the
exceptions, when shown to be appropriate.

What I don't want is, "we don't use foreign keys, they suck", and you
then see the growth of all the wacky unmaintainable schema designs we
see all the time in the PHP world and to some degree the ROR world -
straight up incorrect designs, which as I mentioned earlier, include
things like a series of columns related to another one only via some
series of functions like string concatenations and casts, so-called
"polymorphic foreign keys" where a particular column matches to that of
a row in any one of several other tables, etc.   My userbase in the
SQLAlchemy world is always coming to me with, "I have this terrible
legacy schema design where...." and then they have something like this
going on which they're struggling to work around.

Perhaps my point is, if we make a blanket "we don't use foreign keys"
statement, then you just lift one of the last beachheads we have against
folks who just don't know what they're doing implementing unmaintainable
database designs.   Hence I think the FK concept should remain important
and central when we work with relational schema design in Openstack,
subject to exceptions as appropriate on a case-by-case basis.



> 
>> 2. Here's the part mordred doesn't like - the FK is actually in the
>> database for real.   This is because they slow down inserts, updates,
>> and deletes, because they must be checked.   To which I say, no such
>> performance issue has been observed or documented in Openstack, we
>> aren't a 1 million TPS financial system, so this is vastly premature
>> optimization.
>>
> 
> I agree with you that this is unmeasured. I don't agree that we are not a
> 1 million TPS financial system, because the goal of running a cloud for
> many is, in fact, to make money. So while we may not have an example of
> a cloud running at 1 million TPS, it's not something we should dismiss
> too quickly.
> 
> That said, the measurement should come first. What I'd like to show
> is how many TPS we do actually do on boots, deletes, etc. etc. I'm
> working on it now, and I'd encourage people to join the effort on the
> counter-inspection QA spec if they want to get started measuring things.
> We're taking baby steps right now, but eventually I see us producing a
> lot of data that should be helpful in answering some of these questions.
> 
>> Also as far as #2, customers and operators *regularly* run scripts and
>> queries to modify openstack databases, particularly to delete soft
>> deleted rows.  These get blocked *all the time* by foreign key
>> constraints.  They are doing their job, and they are needed as a final
>> guard against data integrity issues.  We of course handle referential
>> integrity in the application layer as well via SQLAlchemy ORM constructs.
>>
> 
> Nobody ever doubts that there are times where database-side FK constraints
> help prevent costly mistakes. The question is always: at what cost? Right
> now, I'd say we don't really know because we're not measuring. That's
> fine, if you want to mitigate risk, one strategy is to buy insurance,
> and that's what the FK's in the DB are: insurance.
> 
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 



More information about the OpenStack-dev mailing list