<div dir="ltr"><br><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Nov 18, 2013 at 12:14 PM, Jay Pipes <span dir="ltr">&lt;<a href="mailto:jaypipes@gmail.com" target="_blank">jaypipes@gmail.com</a>&gt;</span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class="im">On 11/18/2013 02:35 PM, Mike Spreitzer wrote:<br>


<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

There were some concerns expressed at the summit about scheduler<br>

scalability in Nova, and a little recollection of Boris' proposal to<br>

 keep the needed state in memory.<br>

</blockquote>

<br></div>

While it could be possible to do all of the scheduler state in memory, I<br>

think a better (or at least, less cumbersome initially) approach would<br>

be to add some layers of in-memory caching to any existing parts where<br>

the scheduler currently makes a database query. The problem with this is<br></blockquote><div><br></div><div>Phil Day discussed this at the summit, and I have finally gotten around to posting a proof of concept (POC) of it.</div><div><br>


</div><div><a href="https://review.openstack.org/#/c/57053/1/nova/scheduler/host_manager.py">https://review.openstack.org/#/c/57053/</a><br></div><div><br></div><div>It is very rough, but it gives the general idea. Small-scale testing in devstack showed promising initial results.</div>
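<div><br></div><div>To illustrate the general idea of putting an in-memory cache in front of a scheduler database query, here is a minimal sketch. It is not the code from the review above; the class name, the <code>fetch_from_db</code> callable, the <code>free_ram_mb</code> field, and the TTL-based refresh policy are all hypothetical choices made only for illustration.</div>
<pre>
```python
import time


class CachedHostState:
    """Serve host state from memory; hit the database only when the cache is stale.

    Hypothetical sketch: fetch_from_db stands in for whatever query the
    scheduler would otherwise run on every request.
    """

    def __init__(self, fetch_from_db, ttl_seconds=10.0, clock=time.monotonic):
        self._fetch_from_db = fetch_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._hosts = None
        self._loaded_at = None
        self.db_queries = 0  # counter kept only to make the effect visible

    def get_all(self):
        """Return the cached host map, reloading it once the TTL has expired."""
        now = self._clock()
        stale = self._loaded_at is None or now - self._loaded_at >= self._ttl
        if stale:
            self._hosts = dict(self._fetch_from_db())
            self._loaded_at = now
            self.db_queries += 1
        return self._hosts

    def consume(self, host, ram_mb):
        """Adjust the cached copy after a placement so later requests see it."""
        self.get_all()[host]["free_ram_mb"] -= ram_mb


def fake_db_query():
    # Stand-in for a real database call; returns fresh data on every call.
    return {"node1": {"free_ram_mb": 4096}, "node2": {"free_ram_mb": 8192}}


if __name__ == "__main__":
    cache = CachedHostState(fake_db_query, ttl_seconds=10.0)
    cache.get_all()
    cache.consume("node1", 1024)
    cache.get_all()
    print("database queries issued:", cache.db_queries)
```
</pre>
<div>Note that each scheduler process would hold its own copy of this cache, so two schedulers can disagree about free capacity until the next refresh.</div>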


<div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

that you won't be able to scale out the design -- since the scheduler's<br>

cached pieces cannot be shared easily across distributed nodes. This is<br>

where cells and a hierarchical "sieve scheduling"<br>

pattern come in: higher-level cell schedulers can quickly send a<br>

scheduling request to another cell's scheduler based on a small amount<br>

of information that can generally be compared against in-memory things<br>

(like region, availability zone, type of hypervisor, etc...)<div class="im"><br>

<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

I also heard one guy say that he thinks Nova does not really need a<br>

general SQL database, that a NOSQL database with a bit of<br>

denormalization and/or client-maintained secondary indices could<br>

suffice.  Has that sort of thing been considered before?  What is the<br>

community's level of interest in exploring that?<br>

</blockquote>

<br></div>

Good luck. :)  I don't think that whoever suggested that a NoSQL<br>

database with a "bit of denormalization" would suffice for Nova realized<br>

the extent to which the sets of data within Nova's database are highly<br>

relational. You will just end up implementing JOIN algorithms in Python<br>

code and making some of the more advanced search queries much slower, IMO.<br>

<br>

Oh, and BTW, Nova's "database" was originally Redis [1] :)<br>

<br>

Best,<br>

-jay<br>

<br>

[1]<br>

<a href="https://github.com/openstack/nova/blob/bf6e6e718cdc7488e2da87b21e258ccc065fe499/nova/datastore.py" target="_blank">https://github.com/openstack/nova/blob/bf6e6e718cdc7488e2da87b21e258ccc065fe499/nova/datastore.py</a><div class="">


<div class="h5"><br>

<br>

_______________________________________________<br>

OpenStack-dev mailing list<br>

<a href="mailto:OpenStack-dev@lists.openstack.org" target="_blank">OpenStack-dev@lists.openstack.org</a><br>

<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>

</div></div></blockquote></div><br></div></div>