<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On 12 October 2015 at 21:18, Clint Byrum <span dir="ltr"><<a href="mailto:clint@fewbar.com" target="_blank">clint@fewbar.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">We _would_ keep a local cache of the information in the schedulers. The<br>
centralized copy of it is to free the schedulers from the complexity of<br>
having to keep track of it as state, rather than as a cache. We also don't<br>
have to provide a way for on-demand stat fetching to seed scheduler 0.<br></blockquote><div><br></div><div>I'm not sure that actually changes. On restart, a scheduler wouldn't have enough knowledge to schedule, but the other schedulers are not restarting and can service requests while it waits for data. With ZK that window is shorter, because the restarted scheduler can fetch a full dump of the state, but in either case the system runs at (n-1)/n capacity during the window, assuming queries are only answered from memory.<br><br></div><div>Also, you seemed to be saying that the ZK option would take less memory, but it looks to me as if it would take more. You can't schedule without a relatively complete set of information or some relatively intricate query language, which I don't think ZK is up to (but I'm open to correction there, certainly). That implies that when you notify a scheduler of a change to the data model, it's going to grab the fresh data and keep it locally, so you end up with a full copy in every scheduler plus the one in ZK.<br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<span class="">
> Also, the notification path here is that the compute host notifies ZK and<br>
> ZK notifies many schedulers, assuming they're all capable of handling all<br>
> queries. That is in fact N * (M+1) messages, which is slightly more than<br>
> if there's no central node, as it happens. There are fewer *channels*, but<br>
> more messages. (I feel like I'm overlooking something here, but I can't<br>
> pick out the flaw...) Yes, RMQ will suck at this - but then let's talk<br>
> about better messaging rather than another DB type.<br>
><br>
<br>
</span>You're calling transactions messages, and that's not really fair to<br>
messaging or transactions. :)<br></blockquote><div><br></div>I was actually talking about the number of messages crossing the network. Your point, I think, is that the transaction with ZK is more heavyweight than the update processing at the schedulers. But removing ZK as a nexus removes that transaction too, so both the number of messages and the number of transactions go down.<br></div><div class="gmail_quote"><div></div><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> However, it's important to note that in<br>
this situation, compute nodes do not have to send anything anywhere if<br>
nothing has changed, which is very likely the case for "full" compute<br>
nodes, and certainly will save many, many redundant messages.</blockquote><div><br></div><div>Now that's a fair comment, certainly, and it would drastically reduce the number of messages in the system if we could stop nodes from sending updates just because their free memory has changed by a couple of pages.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Forgive me<br>
if nova already makes this optimization somehow; it didn't seem to when<br>
I was tinkering a year ago.<br></blockquote><div><br></div><div>Not as far as I know, it doesn't.<br><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
There is also the complexity of designing a scheduler which is fault<br>
tolerant and scales economically. What we have now will overtax the<br>
message bus and the database as the number of compute nodes increases.<br>
We want to get O(1) complexity out of that, but we're getting O(N)<br>
right now.<br></blockquote><div><br></div><div>O(N) will work providing O is small. ;)<br><br></div><div>I think our cost currently lies in doing one MySQL DB update per node per minute, and one really quite mad query per schedule. I agree that ZK would be less costly in both respects, but that is really about lowering the constant hidden in the O rather than changing the dependence on N. I'm wondering if we can do better still, that's all, but we both agree that this approach would work.<br></div>-- <br><div>Ian.<br></div></div></div></div>
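<div>To make the arithmetic in this thread concrete, here is a rough Python sketch of the cost model. It assumes every scheduler needs every update, that each compute node sends one update per reporting interval, and that queries are answered only from memory. The function names and the example numbers are hypothetical and are not measurements from Nova.</div>

```python
from fractions import Fraction


def messages_via_central_store(updates: int, schedulers: int) -> int:
    """Compute node -> ZK is 1 message; ZK -> each of M schedulers is M more."""
    return updates * (schedulers + 1)


def messages_direct(updates: int, schedulers: int) -> int:
    """With no central node, each update goes straight to each of M schedulers."""
    return updates * schedulers


def db_writes_per_second(nodes: int, interval_s: float = 60.0) -> float:
    """Periodic reporting: one DB update per compute node per interval."""
    return nodes / interval_s


def capacity_while_one_restarts(schedulers: int) -> Fraction:
    """Fraction of scheduling capacity left while one of n schedulers re-seeds
    its in-memory cache, assuming queries are only answered from memory."""
    return Fraction(schedulers - 1, schedulers)


if __name__ == "__main__":
    N, M = 1000, 3  # illustrative numbers only, not measurements
    print(messages_via_central_store(N, M))  # 4000
    print(messages_direct(N, M))             # 3000
    print(db_writes_per_second(N))
    print(capacity_while_one_restarts(M))    # 2/3
```

<div>With N = 1000 and M = 3, routing through a central store costs 4000 messages against 3000 sent directly, and one restarting scheduler out of three leaves 2/3 of the capacity. If only nodes whose resources actually changed report, <code>updates</code> drops below N in both models.</div>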
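<div>Clint's suggestion that compute nodes stay quiet when nothing has changed might look something like the following. This is a hypothetical sketch and not existing Nova code. The class, the resource keys (free_ram_mb, free_vcpus) and the thresholds are all assumptions. A node compares each reading against the last update it actually sent and stays silent unless some resource has moved by at least its threshold. As a result, a "full" node whose free memory wobbles by a couple of pages sends nothing.</div>

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ChangeSuppressingReporter:
    """Hypothetical compute-node reporter: it emits an update only when some
    resource has moved by at least its threshold since the last update sent."""

    threshold: Dict[str, int]  # e.g. {"free_ram_mb": 512}; unlisted keys: any change
    last_sent: Optional[Dict[str, int]] = None

    def maybe_report(self, current: Dict[str, int]) -> Optional[Dict[str, int]]:
        """Return the snapshot to send, or None if nothing significant changed."""
        if self.last_sent is None:
            # First observation: always report so schedulers can seed their cache.
            self.last_sent = dict(current)
            return dict(current)
        for key, value in current.items():
            old = self.last_sent.get(key)
            if old is None:
                changed = True  # a resource the schedulers have not seen yet
            else:
                delta = abs(value - old)
                changed = delta > 0 and delta >= self.threshold.get(key, 1)
            if changed:
                # Compare future readings against what was actually sent, so
                # many small drifts still add up to a report eventually.
                self.last_sent = dict(current)
                return dict(current)
        return None


if __name__ == "__main__":
    r = ChangeSuppressingReporter(threshold={"free_ram_mb": 512, "free_vcpus": 1})
    print(r.maybe_report({"free_ram_mb": 8192, "free_vcpus": 4}))  # first report is sent
    print(r.maybe_report({"free_ram_mb": 8180, "free_vcpus": 4}))  # None: 12 MB is below 512
    print(r.maybe_report({"free_ram_mb": 7000, "free_vcpus": 4}))  # sent: 1192 MB change
```

<div>The design choice here is to compare against the last value sent rather than the last value seen. That way slow drift is eventually reported instead of being hidden behind many small steps.</div>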