<div dir="ltr"><br><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Nov 18, 2013 at 12:14 PM, Jay Pipes <span dir="ltr"><<a href="mailto:jaypipes@gmail.com" target="_blank">jaypipes@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class="im">On 11/18/2013 02:35 PM, Mike Spreitzer wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
There were some concerns expressed at the summit about scheduler<br>
scalability in Nova, and a little recollection of Boris' proposal to<br>
keep the needed state in memory.<br>
</blockquote>
<br></div>
While it could be possible to keep all of the scheduler state in memory, I<br>
think a better (or at least, less cumbersome initially) approach would<br>
be to add some layers of in-memory caching to any existing parts where<br>
the scheduler currently makes a database query. The problem with this is<br></blockquote><div><br></div><div>Phil Day discussed this at the summit, and I have finally gotten around to posting a proof of concept (POC) for it.</div><div><br>
</div><div><a href="https://review.openstack.org/#/c/57053/1/nova/scheduler/host_manager.py">https://review.openstack.org/#/c/57053/</a><br></div><div><br></div><div>It is very, very rough, but it gives the general idea. Small-scale testing in devstack showed promising initial results.</div>
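<div>For readers who have not opened the review, here is a minimal, hypothetical sketch of the general idea: a time-to-live (TTL) cache placed in front of the host-state database query. It is not the code in the review; the class name <code>CachedHostStateSource</code>, the <code>fetch</code> callable, and the 10-second TTL are assumptions for illustration only.</div>

```python
import time
from typing import Callable, Dict, List, Optional


class CachedHostStateSource:
    """Hypothetical TTL cache in front of a host-state database query.

    A sketch of the general idea only, not the code in the review.
    """

    def __init__(self, fetch: Callable[[], List[Dict]], ttl: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch        # assumed: a callable that runs the DB query
        self._ttl = ttl            # seconds a cached result is reused
        self._clock = clock        # injectable so the behaviour can be tested
        self._cached: Optional[List[Dict]] = None
        self._fetched_at = 0.0
        self.db_queries = 0        # how many times the database was actually hit

    def get_all_host_states(self) -> List[Dict]:
        now = self._clock()
        # Refresh from the database only when empty or older than the TTL.
        if self._cached is None or now - self._fetched_at >= self._ttl:
            self._cached = self._fetch()
            self._fetched_at = now
            self.db_queries += 1
        return self._cached

    def invalidate(self) -> None:
        """Force the next call to go back to the database."""
        self._cached = None


# Usage with a fake clock and a fake query (both hypothetical).
fake_now = [0.0]
source = CachedHostStateSource(lambda: [{"host": "node1", "free_ram_mb": 2048}],
                               ttl=10.0, clock=lambda: fake_now[0])
source.get_all_host_states()
source.get_all_host_states()   # served from memory
fake_now[0] = 11.0
source.get_all_host_states()   # TTL expired, so the query runs again
print(source.db_queries)       # 2
```

<div>The trade-off in this sketch is staleness: between refreshes, scheduling decisions are made on host data that may be up to <code>ttl</code> seconds old, in exchange for one database query per TTL window instead of one per scheduling request.</div>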
<div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
that you won't be able to scale out the design, since the scheduler's<br>
cached pieces cannot easily be shared across distributed nodes. This is<br>
where cells and a hierarchical "sieve scheduling" pattern come in:<br>
higher-level cell schedulers can quickly send a scheduling request to<br>
another cell's scheduler based on a small amount of information that<br>
can generally be compared against in-memory data (like region,<br>
availability zone, type of hypervisor, etc.)<div class="im"><br>
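The sieve pattern can be sketched roughly as follows, assuming each cell advertises a few coarse attributes (region, availability zones, hypervisor types) that a top-level scheduler keeps in memory, while per-host detail stays inside the cell. The <code>Cell</code> class, the function names, and the "most free RAM" host choice are hypothetical; this is not Nova's cells implementation.<br>

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Cell:
    """Hypothetical cell description (not Nova's cells API)."""
    name: str
    # Coarse attributes, small enough to keep in memory at the top level.
    region: str
    availability_zones: frozenset
    hypervisor_types: frozenset
    # Detailed per-host state, consulted only by the cell's own scheduler.
    hosts: Dict[str, int] = field(default_factory=dict)  # host -> free RAM (MB)


def top_level_sieve(cells: List[Cell], request: Dict) -> List[Cell]:
    """Cheap in-memory filter: keep cells whose coarse attributes match."""
    return [
        c for c in cells
        if c.region == request["region"]
        and request["availability_zone"] in c.availability_zones
        and request["hypervisor_type"] in c.hypervisor_types
    ]


def cell_schedule(cell: Cell, request: Dict) -> Optional[str]:
    """Detailed placement inside one cell: host with the most free RAM."""
    fits = {h: ram for h, ram in cell.hosts.items() if ram >= request["ram_mb"]}
    return max(fits, key=fits.get) if fits else None


def schedule(cells: List[Cell], request: Dict) -> Optional[Tuple[str, str]]:
    """Forward the request only to cells that survive the sieve."""
    for cell in top_level_sieve(cells, request):
        host = cell_schedule(cell, request)
        if host is not None:
            return cell.name, host
    return None


cells = [
    Cell("cell-a", "us-east", frozenset({"az1"}), frozenset({"kvm"}),
         {"a1": 1024, "a2": 4096}),
    Cell("cell-b", "us-west", frozenset({"az2"}), frozenset({"xen"}),
         {"b1": 8192}),
]
req = {"region": "us-east", "availability_zone": "az1",
       "hypervisor_type": "kvm", "ram_mb": 2048}
print(schedule(cells, req))  # ('cell-a', 'a2')
```

Only the cells that pass the cheap in-memory sieve are asked to do the expensive per-host work, which is what lets the top level stay small as the number of hosts grows.<br>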
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
I also heard one guy say that he thinks Nova does not really need a<br>
general SQL database, that a NoSQL database with a bit of<br>
denormalization and/or client-maintained secondary indices could<br>
suffice. Has that sort of thing been considered before? What is the<br>
community's level of interest in exploring that?<br>
</blockquote>
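To make "client-maintained secondary indices" concrete: in a key-value store without server-side indexes, the application has to update each index itself on every write. A toy sketch, with plain dicts standing in for the store; the class and field names are hypothetical.<br>

```python
from typing import Dict, List, Set


class InstanceStore:
    """Toy key-value store with a client-maintained secondary index.

    Plain dicts stand in for a NoSQL backend (hypothetical, for illustration).
    """

    def __init__(self) -> None:
        self.instances: Dict[str, Dict] = {}     # primary: uuid -> record
        self.by_host: Dict[str, Set[str]] = {}   # secondary: host -> uuids

    def put(self, uuid: str, record: Dict) -> None:
        old = self.instances.get(uuid)
        if old is not None:
            # The client, not the database, must remove the stale index entry.
            self.by_host.get(old["host"], set()).discard(uuid)
        self.instances[uuid] = record
        self.by_host.setdefault(record["host"], set()).add(uuid)

    def delete(self, uuid: str) -> None:
        old = self.instances.pop(uuid, None)
        if old is not None:
            self.by_host.get(old["host"], set()).discard(uuid)

    def instances_on_host(self, host: str) -> List[Dict]:
        """Index lookup instead of a full scan of all instances."""
        return [self.instances[u] for u in sorted(self.by_host.get(host, ()))]


store = InstanceStore()
store.put("i-1", {"host": "node1", "vcpus": 2})
store.put("i-2", {"host": "node1", "vcpus": 4})
store.put("i-1", {"host": "node2", "vcpus": 2})  # i-1 moves to node2
print([r["vcpus"] for r in store.instances_on_host("node1")])  # [4]
```

Note that <code>put</code> performs several separate writes; against a real distributed store, a failure between them would leave the index out of step with the primary records, a consistency burden that a relational database's indexes handle for you.<br>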
<br></div>
Good luck. :) I don't think that whoever suggested that a NoSQL<br>
database with a "bit of denormalization" would suffice for Nova realized<br>
the extent to which the sets of data within Nova's database are highly<br>
relational. You will just end up implementing JOIN algorithms in Python<br>
code and making some of the more advanced search queries much slower, IMO.<br>
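As an illustration of what "implementing JOIN algorithms in Python" would mean, here is a minimal in-memory hash join of two record lists, e.g. instances and compute nodes joined on <code>host</code>. The record fields are hypothetical, and a real database also brings indexes, query planning, and multi-way joins that this sketch lacks.<br>

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def hash_join(left: Iterable[Dict], right: Iterable[Dict],
              left_key: str, right_key: str) -> List[Dict]:
    """Inner equi-join of two record lists, done in application code."""
    # Build phase: index the right-hand records by their join key.
    index: Dict[object, List[Dict]] = defaultdict(list)
    for rec in right:
        index[rec[right_key]].append(rec)
    # Probe phase: look up each left-hand record's key in the index.
    joined = []
    for row in left:
        for match in index.get(row[left_key], []):
            # On a field-name collision, the left-hand value wins.
            joined.append({**match, **row})
    return joined


# Hypothetical records for illustration.
instances = [{"uuid": "i-1", "host": "node1"}, {"uuid": "i-2", "host": "node3"}]
nodes = [{"host": "node1", "hypervisor_type": "kvm"},
         {"host": "node2", "hypervisor_type": "xen"}]
rows = hash_join(instances, nodes, "host", "host")
print([(r["uuid"], r["hypervisor_type"]) for r in rows])  # [('i-1', 'kvm')]
```

Every such query in the application needs hand-written code like this, and anything beyond a simple equi-join (outer joins, joins across three or more tables, filtering on indexed columns) needs more of it.<br>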
<br>
Oh, and BTW, Nova's "database" was originally Redis [1] :)<br>
<br>
Best,<br>
-jay<br>
<br>
[1]<br>
<a href="https://github.com/openstack/nova/blob/bf6e6e718cdc7488e2da87b21e258ccc065fe499/nova/datastore.py" target="_blank">https://github.com/openstack/nova/blob/bf6e6e718cdc7488e2da87b21e258ccc065fe499/nova/datastore.py</a><div class="">
<div class="h5"><br>
<br>
_______________________________________________<br>
OpenStack-dev mailing list<br>
<a href="mailto:OpenStack-dev@lists.openstack.org" target="_blank">OpenStack-dev@lists.openstack.org</a><br>
<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>
</div></div></blockquote></div><br></div></div>