<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On 25 February 2015 at 13:50, Eugene Nikanorov <span dir="ltr"><<a href="mailto:enikanorov@mirantis.com" target="_blank">enikanorov@mirantis.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Thanks for putting this all together, Salvatore.<div><br></div><div>I just want to comment on this suggestion:</div><span class=""><div>> <span style="font-size:12.8000001907349px">1) Move the allocation logic out of the driver, thus making IPAM an independent service. The API workers will then communicate with the IPAM service through a message bus, where IP allocation requests will be "naturally serialized"</span></div><div><span style="font-size:12.8000001907349px"><br></span></div></span><div><span style="font-size:12.8000001907349px">Right now port creation is already a distributed process involving several parties. </span></div><div><span style="font-size:12.8000001907349px">Adding one more actor outside Neutron, reachable only over the message bus, just to serialize requests makes me think of how terrible troubleshooting could be under load, when communication over the MQ slows down or is interrupted.</span></div><div><span style="font-size:12.8000001907349px">Not to mention that such a service would be a SPoF and a contention point.</span></div><div><span style="font-size:12.8000001907349px">So, this of course could be an option, but personally I'd not like to see it as a default.</span></div></div></blockquote><div><br></div><div>Basically here I'm just braindumping. I have no idea whether this could be scalable, reliable or maintainable (see reply to Clint's post). I wish I could prototype code for this, but I'm terribly slow. 
The days when I was able to produce thousands of working LOCs per day are long gone.</div><div><br></div><div>Anyway, it is true that port creation is already a fairly complex workflow. However, IPAM will in any case be a synchronous operation within this workflow: if the IPAM process does not complete, port wiring and securing in the agents cannot occur. So I do not expect it to add significant difficulties in troubleshooting. I might add that the troubleshooting problem is not really due to complex communication patterns, but to the fact that Neutron still does not have a decent mechanism to correlate events occurring on the server and in the agents, thus forcing developers and operators to read logs as if they were hieroglyphics. </div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><span style="font-size:12.8000001907349px"><br></span></div><div><span style="font-size:12.8000001907349px">Thanks,</span></div><div><span style="font-size:12.8000001907349px">Eugene.</span></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Feb 25, 2015 at 4:35 AM, Robert Collins <span dir="ltr"><<a href="mailto:robertc@robertcollins.net" target="_blank">robertc@robertcollins.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 24 February 2015 at 01:07, Salvatore Orlando <<a href="mailto:sorlando@nicira.com" target="_blank">sorlando@nicira.com</a>> wrote:<br>

> Lazy-Stacker summary:<br>

...<br>

<span>> In the medium term, there are a few things we might consider for Neutron's<br>

> "built-in IPAM".<br>

> 1) Move the allocation logic out of the driver, thus making IPAM an<br>

> independent service. The API workers will then communicate with the IPAM<br>

> service through a message bus, where IP allocation requests will be<br>

> "naturally serialized"<br>

> 2) Use third-party software such as dogpile, zookeeper or even memcached to<br>

> implement distributed coordination. I have nothing against it, and I reckon<br>

> Neutron can only benefit from it (in case you're considering arguing that<br>

> "it does not scale", please also provide solid arguments to support your<br>

> claim!). Nevertheless, I do believe API request processing should proceed<br>

> undisturbed as much as possible. If processing an API request requires<br>

> distributed coordination among several components then it probably means<br>

> that an asynchronous paradigm is more suitable for that API request.<br>

<br>

</span>So data is great. It sounds like as long as we have an appropriate<br>

retry decorator in place, write locks are better here, at least<br>

for up to 30 threads. But can we trust the data?<br>

<br>
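For concreteness, here is a minimal sketch of the kind of retry decorator meant above. The names LockContentionError, retry_on_contention and FakeAllocator are made up for illustration and are not Neutron code; a real version would catch whatever deadlock or lock-timeout exception the DB layer actually raises.<br>

<pre>
```python
import functools
import random
import time


class LockContentionError(Exception):
    """Hypothetical stand-in for a deadlock or lock-wait-timeout DB error."""


def retry_on_contention(max_attempts=5, base_delay=0.01):
    """Re-run the wrapped callable when it raises LockContentionError."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except LockContentionError:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the error
                    # Randomized exponential backoff so that competing
                    # workers do not all retry at the same instant.
                    time.sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
        return wrapper
    return decorator


class FakeAllocator:
    """Toy allocator that fails a fixed number of times, then succeeds."""

    def __init__(self, failures):
        self.failures_left = failures
        self.calls = 0

    @retry_on_contention(max_attempts=4, base_delay=0.001)
    def allocate(self):
        self.calls += 1
        if self.failures_left:
            self.failures_left -= 1
            raise LockContentionError("simulated write-lock conflict")
        return "10.0.0.%d" % self.calls


if __name__ == "__main__":
    print(FakeAllocator(failures=2).allocate())
```
</pre>

The whole allocation transaction would sit inside the decorated function, so each retry starts from a fresh transaction rather than reusing state from the failed one.<br>

<br>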

One thing I'm not clear on is the SQL statement count.  You say 100<br>

queries for A-1 with a time on Galera of 0.06*1.2=0.072 seconds per<br>

allocation? So is that 2 queries over 50 allocations over 20 threads?<br>

<br>

I'm not clear on what the request parameter in the test json files<br>

does, and AFAICT your threads do one request each. As such I<br>

suspect that you may be seeing less concurrency - and thus contention<br>

- than real-world setups where API servers are deployed with workers<br>

running in separate processes and requests are coming in<br>

willy-nilly. The size of each algorithm's workload is so small that it's<br>

feasible to imagine the thread completing before the GIL bytecode-count<br>

check triggers (see<br>

<a href="https://docs.python.org/2/library/sys.html#sys.setcheckinterval" target="_blank">https://docs.python.org/2/library/sys.html#sys.setcheckinterval</a>) and<br>

the GIL's lack of fairness would exacerbate that.<br>

<br>

If I may suggest:<br>

 - use multiprocessing or some other worker-pool approach rather than threads<br>

 - or set setcheckinterval down low (e.g. to 20 or something)<br>

 - do multiple units of work (in separate transactions) within each<br>

worker, aim for e.g. 10 seconds of work or some such.<br>

 - log with enough detail that we can report on the actual concurrency<br>

achieved. E.g. log the time in us when each transaction starts and<br>

finishes, then we can assess how many concurrent requests were<br>

actually running.<br>

<br>
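The suggestions above could be sketched roughly as follows. This is only an illustration under stated assumptions: a short sleep stands in for one allocation transaction, and now_us, worker, run and max_concurrency are hypothetical names, not part of the test harness being discussed.<br>

<pre>
```python
import multiprocessing
import time


def now_us():
    """Wall-clock time in microseconds (comparable across processes)."""
    return int(time.time() * 1e6)


def worker(units):
    """Run several units of work, recording (start_us, end_us) for each."""
    spans = []
    for _ in range(units):
        start = now_us()
        time.sleep(0.01)  # placeholder for one allocation transaction
        spans.append((start, now_us()))
    return spans


def run(workers=4, units=5):
    """Run `workers` processes, each doing `units` transactions."""
    with multiprocessing.Pool(workers) as pool:
        results = pool.map(worker, [units] * workers)
    # Flatten the per-worker lists into one list of spans.
    return [span for spans in results for span in spans]


def max_concurrency(spans):
    """Return the peak number of spans overlapping at any instant."""
    events = []
    for start, end in spans:
        events.append((start, 1))
        events.append((end, -1))
    # At equal timestamps an end (-1) sorts before a start (+1), so spans
    # that merely touch are not counted as concurrent.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak
```
</pre>

Calling max_concurrency(run()) from under an if __name__ == "__main__" guard reports the peak number of transactions that overlapped in time. If that number is well below the worker count, the test is not exercising the contention it is meant to measure.<br>

<br>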

If the results are still the same - great, full steam ahead. If not,<br>

well, let's revisit :)<br>

<br>

-Rob<br>

<span><font color="#888888"><br>

<br>

--<br>

Robert Collins <<a href="mailto:rbtcollins@hp.com" target="_blank">rbtcollins@hp.com</a>><br>

Distinguished Technologist<br>

HP Converged Cloud<br>

<br>

__________________________________________________________________________<br>

OpenStack Development Mailing List (not for usage questions)<br>

Unsubscribe: <a href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe" target="_blank">OpenStack-dev-request@lists.openstack.org?subject:unsubscribe</a><br>

<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>

</font></span></blockquote></div><br></div>

</div></div><br>

<br></blockquote></div><br></div></div>