[openstack-dev] [Keystone] ON DELETE RESTRICT VS ON DELETE CASCADE
Clint Byrum
clint at fewbar.com
Mon Mar 9 23:10:14 UTC 2015
Excerpts from Mike Bayer's message of 2015-03-09 10:26:37 -0700:
>
> Clint Byrum <clint at fewbar.com> wrote:
>
> > Excerpts from David Stanek's message of 2015-03-08 11:18:05 -0700:
> >> On Sun, Mar 8, 2015 at 1:37 PM, Mike Bayer <mbayer at redhat.com> wrote:
> >>
> >>> can you elaborate on your reasoning that FK constraints should be used less
> >>> overall? or do you just mean that the client side should be mirroring the
> >>> same
> >>> rules that would be enforced by the FKs?
> >>
> >> I don't think he means that we will use them less. Our SQL backends are
> >> full of them. What Keystone can't do is rely on them because not all
> >> implementations of our backends support FKs.
> >
> > Note that they're also a huge waste of SQL performance. It's _far_ cheaper
> > to scale out application servers and garbage-collect using background jobs
> > like pt-archiver than it will ever be to scale out a consistent data-store
> > and do every single little bit of house keeping in real time. So even
> > on SQL backends, I'd recommend just disabling and dropping FK constraints
> > if you expect any more than the bare minimum usage of Keystone.
>
> I'm about -1000 on disabling foreign key constraints. Any decision based on
> “performance” IMHO has to be proven with benchmarks. Foreign keys on modern
> databases like MySQL and Postgresql do not add overhead to any significant
> degree compared to just the workings of the Python code itself (which means,
> a benchmark here should be illustrating a tangible impact on the python
> application itself). OTOH, the prospect of a database with failed
> referential integrity is a recipe for disaster.
>
So I think I didn't speak clearly enough here. The benchmarks are of
course needed, but there's a tipping point, once write activity reaches
a certain level, where it's cheaper to let the data get a little skewed
and correct it asynchronously. This is not unique to SQL; it applies to
all large scale distributed systems. There's probably a super cool
formula for it too, but roughly it is:
(num_trans_per_s * cost_of_fk_check_per_trans)
versus
(error_cost * error_rate) + (cost_find_all_errors / seconds_to_find_all_errors)
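To make that concrete, here's a back-of-the-envelope sketch in Python.
Every number in it is invented purely for illustration (none of them are
measurements of Keystone or anything else), and it's only one reading of
the units; the point is just that the comparison flips as the write rate
grows:

# Invented placeholder numbers -- substitute measurements from your own
# deployment before drawing any conclusion.
num_trans_per_s = 2000                 # write transactions per second
cost_of_fk_check_per_trans = 0.0005    # extra seconds of work per transaction
fk_cost_per_s = num_trans_per_s * cost_of_fk_check_per_trans

error_rate = 0.5                  # skewed/orphaned rows created per second
error_cost = 0.0002               # seconds of impact per skewed row
cost_find_all_errors = 600.0      # seconds one full consistency sweep takes
seconds_to_find_all_errors = 86400.0   # the sweep runs once a day
async_cost_per_s = (error_cost * error_rate) + (
    cost_find_all_errors / seconds_to_find_all_errors)

print("synchronous FK checking: %.4f s of work per second" % fk_cost_per_s)
print("async cleanup instead:   %.4f s of work per second" % async_cost_per_s)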
So it's not really something I think one can blindly accept as "better",
but rather something one needs to calculate for their own deployment. You
say cost_of_fk_check_per_trans is negligible, but that has been measured
and shown not to be true in the past:
http://www.percona.com/blog/2010/09/20/instrumentation-and-the-cost-of-foreign-keys/
That article demonstrates that FKs add lock contention in
InnoDB. There's more: with NDB (MySQL Cluster) it's an 18% performance
hit on raw throughput:
http://johanandersson.blogspot.com/2013/06/benchmarking-performance-impact-of.html
Though that figure could be artificially inflated because it's a raw
benchmark.
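If you want a feel for cost_of_fk_check_per_trans on your own schema,
something like the sketch below is the shape of the measurement. It uses
SQLite and a made-up two-table schema purely because that's runnable with
nothing but the Python stdlib; the absolute numbers say nothing about
InnoDB or NDB, so treat it as a template to port to your real backend,
not as evidence either way:

import sqlite3
import time

def bench(enforce_fk, rows=20000):
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = %s" % ("ON" if enforce_fk else "OFF"))
    conn.execute("CREATE TABLE project (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE assignment ("
                 " id INTEGER PRIMARY KEY,"
                 " project_id INTEGER REFERENCES project(id))")
    conn.execute("INSERT INTO project (id) VALUES (1)")
    conn.commit()
    start = time.time()
    for i in range(rows):
        conn.execute(
            "INSERT INTO assignment (id, project_id) VALUES (?, 1)", (i,))
    conn.commit()
    return time.time() - start

print("FK checks on : %.3fs" % bench(True))
print("FK checks off: %.3fs" % bench(False))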
Now, where that tipping point is for Keystone I don't know. The point
is, if you write the code relying on the existence of FK constraints,
Keystone becomes a vertically scaling app that can never scale out
beyond whatever that limit is.
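And for completeness, the asynchronous-correction side doesn't have to be
anything exotic; pt-archiver is one option, but even a dumb periodic job
of roughly this shape (sketched against the same made-up two-table schema
as above, not Keystone's real one) covers the common orphan-row case:

def purge_orphans(conn, batch=1000):
    # Delete child rows whose parent has disappeared, a batch at a time,
    # so the cleanup never holds long locks on a busy table.
    while True:
        cur = conn.execute(
            "DELETE FROM assignment WHERE id IN ("
            " SELECT a.id FROM assignment a"
            " LEFT JOIN project p ON p.id = a.project_id"
            " WHERE p.id IS NULL LIMIT ?)", (batch,))
        conn.commit()
        if cur.rowcount < batch:
            break

A real job would also throttle itself and watch replication lag, which is
exactly the sort of thing pt-archiver handles for you.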