[openstack-dev] Scheduler proposal

Alec Hothan (ahothan) ahothan at cisco.com
Fri Oct 9 21:34:25 UTC 2015

There are several ways to make Python code that deals with a lot of data faster, especially when it operates on DB fields from SQL tables (and that is not limited to the nova scheduler).
Pulling data from large SQL tables and operating on it through regular Python code (explicit Python loops) is extremely inefficient due to the overhead of the Python interpreter. If this is what the nova scheduler code is doing today, the good news is that there is potentially huge room for improvement.

The scale-out approach in practice means a few instances (3 instances is common), so the gain would be on the order of 3x (well under an order of magnitude), but with sharply increased complexity to deal with concurrent schedulers and potentially conflicting results (with the use of tools like ZooKeeper or Consul...). In essence we would just be running the same unoptimized code concurrently to achieve better throughput.
On the other hand, optimizing code that is not very optimized to start with can yield a much better return than 3x, with the advantage of simplicity (one active scheduler, which could be backed by a standby for HA).

Python is actually one of the better languages for *fast* in-memory big-data processing, thanks to open source scientific and data analysis libraries that provide native speed through Cythonized code and powerful high-level abstractions for complex filters and vectorized operations. Not only is it fast, it also yields much smaller code.

I have used libraries such as numpy and pandas to operate on very large data sets (the equivalent of SQL tables with hundreds of thousands of rows), and there is easily two orders of magnitude of difference between operating on that data in memory with plain Python loops and with these libraries (that is without any DB access in either case).
Ordering the filters for the kind of reduction you describe below certainly helps, but it becomes a second-order concern once you use pandas filters, because they are extremely fast even on very large datasets.
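To make the loop-versus-vectorized difference concrete, here is a minimal sketch (not nova code): filtering a hypothetical table of 10,000 compute hosts. The column names (free_ram_mb, free_vcpus) and thresholds are invented for illustration; the point is that the pandas version is one boolean-mask expression instead of a per-row Python loop.

```python
import numpy as np
import pandas as pd

# Build a synthetic host table; columns are hypothetical, not nova's schema.
rng = np.random.default_rng(42)
n_hosts = 10_000
hosts = pd.DataFrame({
    "host": [f"compute-{i}" for i in range(n_hosts)],
    "free_ram_mb": rng.integers(0, 256_000, n_hosts),
    "free_vcpus": rng.integers(0, 64, n_hosts),
})

def loop_filter(df, ram, vcpus):
    # Plain-Python style: iterate row by row (slow in the interpreter).
    out = []
    for _, row in df.iterrows():
        if row["free_ram_mb"] >= ram and row["free_vcpus"] >= vcpus:
            out.append(row["host"])
    return out

def vector_filter(df, ram, vcpus):
    # Vectorized style: one boolean mask evaluated in native code.
    mask = (df["free_ram_mb"] >= ram) & (df["free_vcpus"] >= vcpus)
    return df.loc[mask, "host"].tolist()

# Both produce the same candidate list; the vectorized one is far faster.
assert loop_filter(hosts, 64_000, 8) == vector_filter(hosts, 64_000, 8)
```

Timing the two functions with timeit on a table this size is an easy way to see the gap for yourself.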

I'm curious to know why this path was not explored more before embarking full speed on concurrency/scale-out options, which is a very complex and treacherous path, as we can see in this discussion. Working with all these complex distributed frameworks is clearly very attractive intellectually, but the cost of complexity is often overlooked.

Is there any data showing the performance of the current nova scheduler? How many scheduling decisions per second can nova make at scale with worst-case filters?
When you think about it, 10,000 nodes and their associated properties is not such a big number if you use the right libraries.
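As a rough, hypothetical benchmark sketch (not a nova measurement), one can time how many vectorized filter passes per second pandas manages over 10,000 hosts; the column names and thresholds below are again invented for illustration.

```python
import timeit
import numpy as np
import pandas as pd

# Synthetic table of 10,000 hosts with made-up capacity columns.
rng = np.random.default_rng(0)
hosts = pd.DataFrame({
    "free_ram_mb": rng.integers(0, 256_000, 10_000),
    "free_vcpus": rng.integers(0, 64, 10_000),
    "num_instances": rng.integers(0, 40, 10_000),
})

def one_pass():
    # Three stacked filter criteria evaluated as one boolean expression.
    mask = ((hosts["free_ram_mb"] >= 4_096)
            & (hosts["free_vcpus"] >= 2)
            & (hosts["num_instances"] < 30))
    return int(mask.sum())

# Time many passes and report the rate; exact numbers depend on hardware.
runs = 200
elapsed = timeit.timeit(one_pass, number=runs)
print(f"~{runs / elapsed:.0f} filter passes/second over 10,000 hosts")
```

The absolute rate is machine-dependent, but it illustrates that a full filter pass over a 10,000-row in-memory table is a sub-millisecond operation with these libraries.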

On 10/9/15, 1:10 PM, "Joshua Harlow" <harlowja at fastmail.com> wrote:

>And also we should probably deprecate/not recommend:
>That filter IMHO basically disallows optimizations like forming SQL 
>statements for each filter (and then letting the DB do the heavy 
>lifting) or say having each filter say 'oh my logic can be performed by 
>a prepared statement ABC and u should just use that instead' (and then 
>letting the DB do the heavy lifting).
>Chris Friesen wrote:
>> On 10/09/2015 12:25 PM, Alec Hothan (ahothan) wrote:
>>> Still the point from Chris is valid. I guess the main reason openstack is
>>> going with multiple concurrent schedulers is to scale out by
>>> distributing the
>>> load between multiple instances of schedulers because 1 instance is too
>>> slow. This discussion is about coordinating the many instances of
>>> schedulers
>>> in a way that works, and this is actually a difficult problem that will get
>>> worse as the number of variables for instance placement increases (for
>>> example NFV is going to require a lot more than just cpu pinning, huge
>>> pages
>>> and numa).
>>> Has anybody looked at why 1 instance is too slow and what it would
>>> take to
>>> make 1 scheduler instance work fast enough? This does not preclude the
>>> use of
>>> concurrency for finer grain tasks in the background.
>> Currently we pull data on all (!) of the compute nodes out of the
>> database via a series of RPC calls, then evaluate the various filters in
>> python code.
>> I suspect it'd be a lot quicker if each filter was a DB query.
>> Also, ideally we'd want to query for the most "strict" criteria first,
>> to reduce the total number of comparisons. For example, if you want to
>> implement the "affinity" server group policy, you only need to test a
>> single host. If you're matching against host aggregate metadata, you
>> only need to test against hosts in matching aggregates.
>> Chris
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
