[openstack-dev] [Neutron] db-level locks, non-blocking algorithms, active/active DB clusters and IPAM

Robert Collins robertc at robertcollins.net
Wed Feb 25 01:35:27 UTC 2015

On 24 February 2015 at 01:07, Salvatore Orlando <sorlando at nicira.com> wrote:
> Lazy-Stacker summary:
> In the medium term, there are a few things we might consider for Neutron's
> "built-in IPAM".
> 1) Move the allocation logic out of the driver, thus making IPAM an
> independent service. The API workers will then communicate with the IPAM
> service through a message bus, where IP allocation requests will be
> "naturally serialized"
> 2) Use third-party software such as dogpile, zookeeper, or even memcached to
> implement distributed coordination. I have nothing against it, and I reckon
> Neutron can only benefit from it (in case you're considering arguing that
> "it does not scale", please also provide solid arguments to support your
> claim!). Nevertheless, I do believe API request processing should proceed
> undisturbed as much as possible. If processing an API request requires
> distributed coordination among several components, then it probably means
> that an asynchronous paradigm is more suitable for that API request.

So data is great. It sounds like, as long as we have an appropriate
retry decorator in place, write locks are better here, at least
for up to 30 threads. But can we trust the data?
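(For concreteness, a minimal sketch of the sort of retry decorator I have
in mind - the exception class, attempt budget and backoff numbers here are
illustrative, not anything Neutron actually ships:)

```python
import functools
import random
import time


class DBDeadlockError(Exception):
    """Stand-in for the DB driver's deadlock / lock-wait-timeout error."""


def retry_on_deadlock(max_attempts=10):
    """Retry a DB transaction that aborts due to write-lock contention."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except DBDeadlockError:
                    if attempt == max_attempts:
                        raise
                    # Jittered backoff so colliding workers desynchronise
                    # instead of deadlocking again in lock-step.
                    time.sleep(random.uniform(0, 0.01 * attempt))
        return wrapper
    return decorator
```

The whole allocation transaction goes inside the decorated function, so a
retry re-runs it from the top rather than replaying half a transaction.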

One thing I'm not clear on is the SQL statement count. You say 100
queries for A-1, with a time on Galera of 0.06*1.2=0.072 seconds per
allocation? So is that 2 queries per allocation, over 50 allocations
and 20 threads?

I'm not clear on what the request parameter in the test json files
does, and AFAICT your threads each do one request. As such I
suspect that you may be seeing less concurrency - and thus contention
- than real-world setups where APIs are deployed to run workers
in separate processes and requests are coming in
willy-nilly. The size of each algorithm's workload is so small that it's
feasible to imagine a thread completing before the GIL's bytecode
check interval triggers a switch (see
https://docs.python.org/2/library/sys.html#sys.setcheckinterval), and
the GIL's lack of fairness would exacerbate that.
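For illustration, lowering the check interval looks roughly like this.
sys.setcheckinterval is the Python 2 knob; Python 3.2+ replaced it with
the time-based sys.setswitchinterval, so a portable snippet has to branch
(the specific values are just examples):

```python
import sys

# On Python 2 the interpreter only considers a thread switch every N
# bytecodes; a tiny workload can run to completion before the first check,
# hiding the contention the benchmark is meant to provoke.
if hasattr(sys, "setcheckinterval"):
    sys.setcheckinterval(20)       # consider switching every 20 bytecodes
else:
    sys.setswitchinterval(0.0005)  # Python 3: switch roughly every 0.5 ms
```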

If I may suggest:
 - use multiprocessing or some other worker-pool approach rather than threads
 - or set sys.setcheckinterval() down low (e.g. to 20 or so)
 - do multiple units of work (in separate transactions) within each
worker; aim for e.g. 10 seconds of work or some such
 - log with enough detail that we can report on the actual concurrency
achieved. E.g. log the time in microseconds when each transaction starts
and finishes; then we can assess how many concurrent requests were
actually running.
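To make those suggestions concrete, here is a rough sketch of the harness
shape I mean - fake_allocation stands in for the real per-transaction work
and every name is illustrative:

```python
import multiprocessing
import time


def fake_allocation():
    """Placeholder for one real IP-allocation transaction."""
    time.sleep(0.001)


def worker(args):
    """Run transactions for ~duration seconds, recording each start and
    finish time in microseconds so concurrency can be reconstructed."""
    worker_id, duration = args
    records = []
    deadline = time.time() + duration
    while time.time() < deadline:
        start_us = int(time.time() * 1e6)
        fake_allocation()  # one unit of work in its own transaction
        end_us = int(time.time() * 1e6)
        records.append((worker_id, start_us, end_us))
    return records


def run(num_workers=20, duration=10.0):
    """Worker-pool version: separate processes, so the GIL can't mask
    contention. (Spawn-based platforms need an `if __name__ ==
    "__main__"` guard around the caller.)"""
    with multiprocessing.Pool(num_workers) as pool:
        per_worker = pool.map(worker,
                              [(i, duration) for i in range(num_workers)])
    return [rec for recs in per_worker for rec in recs]


def max_concurrency(records):
    """Sweep start/finish events to report the peak number of
    transactions actually in flight at once."""
    events = []
    for _, start_us, end_us in records:
        events.append((start_us, 1))
        events.append((end_us, -1))
    peak = current = 0
    for _, delta in sorted(events):
        current += delta
        peak = max(peak, current)
    return peak
```

Running max_concurrency over the logged intervals then tells you whether
20 nominal workers actually produced 20-way contention or something much
lower.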

If the results are still the same - great, full steam ahead. If not,
well, let's revisit :)


Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud
