[openstack-dev] [Neutron] db-level locks, non-blocking algorithms, active/active DB clusters and IPAM
enikanorov at mirantis.com
Wed Feb 25 12:50:12 UTC 2015
Thanks for putting this all together, Salvatore.
I just want to comment on this suggestion:
> 1) Move the allocation logic out of the driver, thus making IPAM an
> independent service. The API workers will then communicate with the IPAM
> service through a message bus, where IP allocation requests will be
> "naturally serialized"
Right now port creation is already a distributed process involving several
Adding one more actor outside Neutron, one that is reached over the
message bus just to serialize requests, makes me think of how terrible
troubleshooting could be under load, when communication over the
mq slows down or is interrupted.
Not to mention that such a service would be a SPoF and a contention point.
So, this of course could be an option, but personally I'd not like to see
it as a default.
On Wed, Feb 25, 2015 at 4:35 AM, Robert Collins <robertc at robertcollins.net>
wrote:
> On 24 February 2015 at 01:07, Salvatore Orlando <sorlando at nicira.com>
> wrote:
> > Lazy-Stacker summary:
> > In the medium term, there are a few things we might consider for
> > "built-in IPAM".
> > 1) Move the allocation logic out of the driver, thus making IPAM an
> > independent service. The API workers will then communicate with the IPAM
> > service through a message bus, where IP allocation requests will be
> > "naturally serialized"
> > 2) Use third-party software such as dogpile, zookeeper, or even memcached to
> > implement distributed coordination. I have nothing against it, and I believe
> > Neutron can only benefit from it (in case you're considering arguing
> > "it does not scale", please also provide solid arguments to support your
> > claim!). Nevertheless, I do believe API request processing should proceed
> > undisturbed as much as possible. If processing an API request requires
> > distributed coordination among several components then it probably means
> > that an asynchronous paradigm is more suitable for that API request.
> So data is great. It sounds like as long as we have an appropriate
> retry decorator in place, write locks are better here, at least
> for up to 30 threads. But can we trust the data?
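The "retry decorator" mentioned above could look roughly like the following. This is only a minimal sketch: `DBDeadlock`, `retry_on_deadlock` and `allocate_ip` are hypothetical names made up for illustration, not the Neutron or oslo.db API, and the backoff policy is an assumption.

```python
import functools
import random
import time


class DBDeadlock(Exception):
    """Hypothetical stand-in for the error a DB layer raises on a lock conflict."""


def retry_on_deadlock(max_retries=5, base_delay=0.01):
    """Re-run the wrapped callable when it raises DBDeadlock."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except DBDeadlock:
                    if attempt == max_retries:
                        # Out of retries: let the caller see the failure.
                        raise
                    # Exponential backoff with jitter so that competing
                    # workers do not retry in lock-step.
                    time.sleep(base_delay * (2 ** attempt) * random.random())
        return wrapper
    return decorator


# Simulated allocation that hits a lock conflict twice, then succeeds.
attempts = {"count": 0}


@retry_on_deadlock(max_retries=3, base_delay=0.001)
def allocate_ip():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise DBDeadlock("simulated lock conflict")
    return "10.0.0.%d" % attempts["count"]
```

Each retry should run the whole transaction again from the start, which is why the decorator wraps the complete allocation call rather than a single statement.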
> One thing I'm not clear on is the SQL statement count. You say 100
> queries for A-1 with a time on Galera of 0.06*1.2=0.072 seconds per
> allocation? So is that 2 queries over 50 allocations over 20 threads?
> I'm not clear on what the request parameter in the test json files
> does, and AFAICT your threads each do one request each. As such I
> suspect that you may be seeing less concurrency - and thus contention
> - than real-world setups where APIs are deployed to run worker
> processes in separate processes and requests are coming in
> willy-nilly. The size of each algorithm's workload is so small that it's
> feasible to imagine the thread completing before the GIL bytecount
> code triggers (see
> https://docs.python.org/2/library/sys.html#sys.setcheckinterval) and
> the GIL's lack of fairness would exacerbate that.
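As a side note on the knob referenced above: `sys.setcheckinterval` is the Python 2 API. On Python 3.2 and later the equivalent control is time-based, `sys.setswitchinterval` (default 0.005 seconds). A small sketch, assuming a Python 3 interpreter; `with_switch_interval` is a hypothetical helper name:

```python
import sys


def with_switch_interval(seconds, func, *args):
    """Run func with a temporary thread switch interval, then restore it."""
    previous = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        return func(*args)
    finally:
        # Always put the interpreter-wide setting back.
        sys.setswitchinterval(previous)
```

A test harness could wrap its threaded run in such a helper with a small value to force more frequent switching between threads.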
> If I may suggest:
> - use multiprocessing or some other worker-pool approach rather than
> - or set setcheckinterval down low (e.g. to 20 or something)
> - do multiple units of work (in separate transactions) within each
> worker, aim for e.g. 10 seconds of work or some such.
> - log with enough detail that we can report on the actual concurrency
> achieved. E.g. log the time in us when each transaction starts and
> finishes, then we can assess how many concurrent requests were
> actually running.
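The suggestions above can be sketched as follows. This is an illustration under stated assumptions, not the actual test harness: `fake_transaction` is a placeholder for one allocation transaction, and the worker and unit counts are arbitrary. Each worker process runs several transactions, records start and finish times in microseconds, and the peak number of overlapping transactions is computed afterwards.

```python
import multiprocessing
import time


def fake_transaction():
    """Placeholder for one allocation transaction (an assumption)."""
    time.sleep(0.01)


def run_worker(units):
    """Run several units of work; return (start_us, end_us) per transaction."""
    spans = []
    for _ in range(units):
        start_us = int(time.time() * 1e6)
        fake_transaction()
        end_us = int(time.time() * 1e6)
        spans.append((start_us, end_us))
    return spans


def max_concurrency(spans):
    """Return the peak number of transactions running at the same time."""
    events = []
    for start_us, end_us in spans:
        events.append((start_us, 1))
        events.append((end_us, -1))
    # Tuples sort by time first; on a tie the -1 (finish) comes before the
    # +1 (start), so back-to-back transactions are not counted as overlapping.
    events.sort()
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


def main(workers=4, units=5):
    # Separate processes, so the GIL does not serialize the workers.
    with multiprocessing.Pool(workers) as pool:
        per_worker = pool.map(run_worker, [units] * workers)
    spans = [span for worker_spans in per_worker for span in worker_spans]
    print("transactions: %d, peak concurrency: %d"
          % (len(spans), max_concurrency(spans)))


if __name__ == "__main__":
    main()
```

If the reported peak stays well below the number of workers, the run did not exercise the contention it was meant to measure.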
> If the results are still the same - great, full steam ahead. If not,
> well, let's revisit :)
> Robert Collins <rbtcollins at hp.com>
> Distinguished Technologist
> HP Converged Cloud