<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On 12 October 2015 at 21:18, Clint Byrum <span dir="ltr"><<a href="mailto:clint@fewbar.com" target="_blank">clint@fewbar.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">We _would_ keep a local cache of the information in the schedulers. The<br>

centralized copy of it is to free the schedulers from the complexity of<br>

having to keep track of it as state, rather than as a cache. We also don't<br>

have to provide a way for on-demand stat fetching to seed scheduler 0.<br></blockquote><div><br></div><div>I'm not sure that actually changes.  A restarted scheduler wouldn't have enough knowledge to schedule, but the other schedulers are unaffected and can service requests while it waits for data.  Using ZK, that takes fewer seconds because it can get a braindump, but during that window, in either case, the system works at (n-1)/n capacity, assuming queries are only done in memory.<br><br></div><div>Also, you seemed to be suggesting that the ZK option would take less memory, but it seems to me it would take more.  You can't schedule without a relatively complete set of information or some relatively intricate query language, which I didn't think ZK was up to (but I'm open to correction there, certainly).  That implies that when you notify a scheduler of a change to the data model, it's going to grab the fresh data and keep it locally.<br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<span class="">

> Also, the notification path here is that the compute host notifies ZK and<br>

> ZK notifies many schedulers, assuming they're all capable of handling all<br>

> queries.  That is in fact N * (M+1) messages, which is slightly more than<br>

> if there's no central node, as it happens.  There are fewer *channels*, but<br>

> more messages.  (I feel like I'm overlooking something here, but I can't<br>

> pick out the flaw...)  Yes, RMQ will suck at this - but then let's talk<br>

> about better messaging rather than another DB type.<br>

><br>

<br>

</span>You're calling transactions messages, and that's not really fair to<br>

messaging or transactions. :)<br></blockquote><div><br></div>I was actually talking about the number of messages crossing the network.  Your point, I think, is that the transaction with ZK is heavier weight than the update processing at the schedulers.  But then removing ZK as a nexus removes that transaction, so both the number of messages and the number of transactions go down.<br></div><div class="gmail_quote"><div></div><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> However, it's important to note that in<br>

this situation, compute nodes do not have to send anything anywhere if<br>

nothing has changed, which is very likely the case for "full" compute<br>

nodes, and certainly will save many many redundant messages.</blockquote><div><br></div><div>Now that's a fair comment, certainly, and would drastically reduce the number of messages in the system if we can keep the nodes from updating just because their free memory has changed by a couple of pages.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Forgive me<br>

if nova already makes this optimization somehow, it didn't seem to when<br>

I was tinkering a year ago.<br></blockquote><div><br></div><div>Not as far as I know, it doesn't.<br><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

There is also the complexity of designing a scheduler which is fault<br>

tolerant and scales economically. What we have now will overtax the<br>

message bus and the database as the number of compute nodes increases.<br>

We want to get O(1) complexity out of that, but we're getting O(N)<br>

right now.<br></blockquote><div><br></div><div>O(N) will work provided O is small. ;)<br><br></div><div>I think our cost currently lies in doing one MySQL DB update per node per minute, and one really quite mad query per schedule.  I agree that ZK would be less costly in both respects, which is really more about lowering O than N.  I'm wondering if we can do better still, that's all, but we both agree that this approach would work.<br></div>-- <br><div>Ian.<br></div></div></div></div>
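<div dir="ltr">To make the point about suppressing trivial updates concrete, here is a minimal sketch of a compute node that only reports when its free memory has moved by at least some threshold since the last value it actually sent.  Everything in it is hypothetical: the NodeReporter class, the send callback and the 256 MB threshold are illustrative assumptions, not anything nova or ZK provides today.<br></div>
<pre>
```python
class NodeReporter:
    """Hypothetical compute-node reporter that drops insignificant updates."""

    def __init__(self, send, threshold_mb=256):
        self.send = send                  # callback that delivers an update (assumed)
        self.threshold_mb = threshold_mb  # assumed granularity, not a nova option
        self.last_sent_mb = None          # last value actually sent, not last observed

    def observe(self, free_mb):
        """Send free_mb only if it differs enough from the last sent value.

        Returns True if an update was sent, False if it was suppressed.
        """
        changed_enough = (
            self.last_sent_mb is None
            or abs(free_mb - self.last_sent_mb) >= self.threshold_mb
        )
        if changed_enough:
            self.send(free_mb)
            self.last_sent_mb = free_mb
        return changed_enough


if __name__ == "__main__":
    sent = []
    reporter = NodeReporter(sent.append, threshold_mb=256)
    for sample in (4096, 4095, 4000, 3840, 3700):
        reporter.observe(sample)
    print(sent)
```
</pre>
<div dir="ltr">Comparing against the last value sent, rather than the last value observed, means a slow drift still produces an update once it adds up to the threshold, so the schedulers' view is never stale by more than that amount.<br></div>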