Clint Byrum clint at fewbar.com
Mon Mar 9 23:10:14 UTC 2015

Excerpts from Mike Bayer's message of 2015-03-09 10:26:37 -0700:
> Clint Byrum <clint at fewbar.com> wrote:
> > Excerpts from David Stanek's message of 2015-03-08 11:18:05 -0700:
> >> On Sun, Mar 8, 2015 at 1:37 PM, Mike Bayer <mbayer at redhat.com> wrote:
> >> 
> >>> can you elaborate on your reasoning that FK constraints should be used less
> >>> overall?  or do you just mean that the client side should be mirroring the
> >>> same
> >>> rules that would be enforced by the FKs?
> >> 
> >> I don't think he means that we will use them less.  Our SQL backends are
> >> full of them.  What Keystone can't do is rely on them because not all
> >> implementations of our backends support FKs.
> > 
> > Note that they're also a huge waste of SQL performance. It's _far_ cheaper
> > to scale out application servers and garbage-collect using background jobs
> > like pt-archiver than it will ever be to scale out a consistent data-store
> > and do every single little bit of house keeping in real time.  So even
> > on SQL backends, I'd recommend just disabling and dropping FK constraints
> > if you expect any more than the bare minimum usage of Keystone.
> Im about -1000 on disabling foreign key constraints. Any decision based on
> “performance” IMHO has to be proven with benchmarks. Foreign keys on modern
> databases like MySQL and Postgresql do not add overhead to any significant
> degree compared to just the workings of the Python code itself (which means,
> a benchmark here should be illustrating a tangible impact on the python
> application itself). OTOH, the prospect of a database with failed
> referential integrity is a recipe for disaster.   

So I think I didn't speak clearly enough here. The benchmarks are of
course needed, but there's a tipping point when write activity gets to
a certain level where it's cheaper to let it get a little skewed and
correct asynchronously. This is not unique to SQL, this is all large
scale distributed systems. There's probably a super cool formula for it
too, but roughly it is

(num_trans_per_s * cost_of_fk_check_per_trans)


(error_cost * error_rate)+(cost_find_all_errors/seconds_to_find_all_errors)

So it's not really something I think one can blindly accept as "better",
but rather something that one needs to calculate for themselves. You say
cost_of_fk_check_per_trans is negligible, but that has been measured as
not true in the past:


That article demonstrates that the FK adds lock contention in
InnoDB. There's more. With NDB (MySQL cluster) it's an 18% performance
hit on raw throughput:


Though that could be artificially inflated due to being a raw benchmark.

Now, where that point is with Keystone I don't know. The point is, if you
write the code relying on the existence, Keystone becomes a vertically
scaling app that cannot ever scale out beyond whatever that limit is.

More information about the OpenStack-dev mailing list