[openstack-dev] [Keystone] ON DELETE RESTRICT VS ON DELETE CASCADE
Clint Byrum
clint at fewbar.com
Mon Mar 9 23:10:14 UTC 2015
Excerpts from Mike Bayer's message of 2015-03-09 10:26:37 -0700:
>
> Clint Byrum <clint at fewbar.com> wrote:
>
> > Excerpts from David Stanek's message of 2015-03-08 11:18:05 -0700:
> >> On Sun, Mar 8, 2015 at 1:37 PM, Mike Bayer <mbayer at redhat.com> wrote:
> >>
> >>> can you elaborate on your reasoning that FK constraints should be used less
> >>> overall? or do you just mean that the client side should be mirroring the
> >>> same
> >>> rules that would be enforced by the FKs?
> >>
> >> I don't think he means that we will use them less. Our SQL backends are
> >> full of them. What Keystone can't do is rely on them because not all
> >> implementations of our backends support FKs.
> >
> > Note that they're also a huge waste of SQL performance. It's _far_ cheaper
> > to scale out application servers and garbage-collect using background jobs
> > like pt-archiver than it will ever be to scale out a consistent data-store
> > and do every single little bit of house keeping in real time. So even
> > on SQL backends, I'd recommend just disabling and dropping FK constraints
> > if you expect any more than the bare minimum usage of Keystone.
>
> I'm about -1000 on disabling foreign key constraints. Any decision based on
> “performance” IMHO has to be proven with benchmarks. Foreign keys on modern
> databases like MySQL and Postgresql do not add overhead to any significant
> degree compared to just the workings of the Python code itself (which means,
> a benchmark here should be illustrating a tangible impact on the python
> application itself). OTOH, the prospect of a database with failed
> referential integrity is a recipe for disaster.
>
So I think I didn't speak clearly enough here. The benchmarks are of
course needed, but there's a tipping point, once write activity reaches
a certain level, where it's cheaper to let the data get a little skewed
and correct it asynchronously. This is not unique to SQL; it applies to
all large scale distributed systems. There's probably a super cool
formula for it too, but roughly it is:
(num_trans_per_s * cost_of_fk_check_per_trans)
versus
(error_cost * error_rate) + (cost_find_all_errors / seconds_to_find_all_errors)
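To make that concrete, here's a back-of-the-envelope sketch in Python.
Every number in it is invented purely for illustration (none of them are
measurements of Keystone or anything else), and it's only one reading of
the units; the point is just that the comparison flips as the write rate
grows:

# Invented placeholder numbers -- substitute measurements from your own
# deployment before drawing any conclusion.
num_trans_per_s = 2000                 # write transactions per second
cost_of_fk_check_per_trans = 0.0005    # extra seconds of work per transaction
fk_cost_per_s = num_trans_per_s * cost_of_fk_check_per_trans

error_rate = 0.5                  # skewed/orphaned rows created per second
error_cost = 0.0002               # seconds of impact per skewed row
cost_find_all_errors = 600.0      # seconds one full consistency sweep takes
seconds_to_find_all_errors = 86400.0   # the sweep runs once a day
async_cost_per_s = (error_cost * error_rate) + (
    cost_find_all_errors / seconds_to_find_all_errors)

print("synchronous FK checking: %.4f s of work per second" % fk_cost_per_s)
print("async cleanup instead:   %.4f s of work per second" % async_cost_per_s)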
So it's not really something I think one can blindly accept as "better",
but rather something one needs to calculate for their own deployment. You
say cost_of_fk_check_per_trans is negligible, but that has been measured
and shown not to be true in the past:
http://www.percona.com/blog/2010/09/20/instrumentation-and-the-cost-of-foreign-keys/
That article demonstrates that FKs add lock contention in
InnoDB. There's more: with NDB (MySQL Cluster) it's an 18% performance
hit on raw throughput:
http://johanandersson.blogspot.com/2013/06/benchmarking-performance-impact-of.html
Though that figure could be artificially inflated because it's a raw
benchmark.
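If you want a feel for cost_of_fk_check_per_trans on your own schema,
something like the sketch below is the shape of the measurement. It uses
SQLite and a made-up two-table schema purely because that's runnable with
nothing but the Python stdlib; the absolute numbers say nothing about
InnoDB or NDB, so treat it as a template to port to your real backend,
not as evidence either way:

import sqlite3
import time

def bench(enforce_fk, rows=20000):
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = %s" % ("ON" if enforce_fk else "OFF"))
    conn.execute("CREATE TABLE project (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE assignment ("
                 " id INTEGER PRIMARY KEY,"
                 " project_id INTEGER REFERENCES project(id))")
    conn.execute("INSERT INTO project (id) VALUES (1)")
    conn.commit()
    start = time.time()
    for i in range(rows):
        conn.execute(
            "INSERT INTO assignment (id, project_id) VALUES (?, 1)", (i,))
    conn.commit()
    return time.time() - start

print("FK checks on : %.3fs" % bench(True))
print("FK checks off: %.3fs" % bench(False))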
Now, where that tipping point is for Keystone I don't know. The point
is, if you write the code relying on the existence of FK constraints,
Keystone becomes a vertically scaling app that can never scale out
beyond whatever that limit is.
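And for completeness, the asynchronous-correction side doesn't have to be
anything exotic; pt-archiver is one option, but even a dumb periodic job
of roughly this shape (sketched against the same made-up two-table schema
as above, not Keystone's real one) covers the common orphan-row case:

def purge_orphans(conn, batch=1000):
    # Delete child rows whose parent has disappeared, a batch at a time,
    # so the cleanup never holds long locks on a busy table.
    while True:
        cur = conn.execute(
            "DELETE FROM assignment WHERE id IN ("
            " SELECT a.id FROM assignment a"
            " LEFT JOIN project p ON p.id = a.project_id"
            " WHERE p.id IS NULL LIMIT ?)", (batch,))
        conn.commit()
        if cur.rowcount < batch:
            break

A real job would also throttle itself and watch replication lag, which is
exactly the sort of thing pt-archiver handles for you.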