[openstack-dev] Scheduler proposal

Joshua Harlow harlowja at fastmail.com
Fri Oct 16 15:47:58 UTC 2015

Clint Byrum wrote:
> Excerpts from Ed Leafe's message of 2015-10-15 11:56:24 -0700:
>> Wow, I seem to have unleashed a bunch of pent-up frustration in the
>> community! It's great to see everyone coming forward with their
>> ideas and insights for improving the way Nova (and, by extension,
>> all of OpenStack) can potentially scale.
>> I do have a few comments on the discussion:
>> 1) This isn't a proposal to simply add some sort of DLM to Nova as
>> a magic cure-all. The concerns about Nova's ability to scale have
>> a lot more to do with the overall internal communication design.
> In this, we agree.
>> 2) I really liked the comment about "made-up numbers". It's so
>> true: we are all impressed by such examples of speed that we
>> sometimes forget whether speeding up X will improve the overall
>> process to any significant degree. The purpose of my original email
>> back in July, and the question I asked at the Nova midcycle, was
>> whether we could get some numbers that would be a target to shoot for with
>> any of these experiments. Sure, I could come up with a test that
>> shows a zillion transactions per second, but if that doesn't result
>> in a cloud being able to schedule more efficiently, what's the
>> point?
> Speed is only one dimension. Efficiency and simplicity are two others
> that I think are harder to quantify, but are also equally important
> in any component of OpenStack.
>> 3) I like the idea of something like ZooKeeper, but my concern is
>> how to efficiently query the data. If, for example, we had records
>> for 100K compute nodes, would it be possible to do the equivalent
>> of "SELECT * FROM resources WHERE resource_type = 'compute' AND
>> free_ram_mb >= 2048 AND …" - well, you get the idea. Are complex
>> data queries possible in ZK? I haven't been able to find that
>> information anywhere.
> You don't do complex queries, because you have all of the data in
> RAM, in an efficient in-RAM format. Even if each record is 50KB, we
> can do 100,000 of them in 5GB. That's a drop in the bucket.
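A rough sketch of what that in-RAM filtering looks like in Python (the record layout and the numbers here are made up for illustration):

```python
# In-RAM equivalent of:
#   SELECT * FROM resources
#   WHERE resource_type = 'compute' AND free_ram_mb >= 2048
# The dict layout is an assumption for illustration only.
records = [
    {"id": n, "resource_type": "compute", "free_ram_mb": 1024 * (n % 8)}
    for n in range(100_000)
]

# A plain comprehension over 100K records; no query engine needed.
matches = [
    r for r in records
    if r["resource_type"] == "compute" and r["free_ram_mb"] >= 2048
]
```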
>> 4) It is true that even in a very large deployment, it is possible
>> to keep all the relevant data needed for scheduling in memory. My
>> concern is how to efficiently search that data, much like in the ZK
>> scenario.
> There are a bunch of ways to do this. My favorite is to have filter
> plugins in the scheduler define what they need to index, and then
> build a B-tree for each filter as each record arrives in the main
> data structure. When scheduling requests come in, they simply walk
> through each B-tree and turn that into a set. Then read each piece of
> the set out of the main structure and sort based on whichever you
> want (less full for load balancing, most full for efficient
> stacking).
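That per-filter index idea, sketched with a sorted list plus bisect standing in for a real B-tree (the record layout, the single RAM filter, and the hosts are all made up):

```python
import bisect

# One "B-tree" per filter; here a single RAM filter, so a single
# sorted (value, host) list stands in for it.
ram_index = []

def add_record(host, free_ram_mb):
    """Index the record as it arrives in the main data structure."""
    bisect.insort(ram_index, (free_ram_mb, host))

def hosts_with_ram(min_mb):
    """Walk the index and turn the matching range into a set."""
    start = bisect.bisect_left(ram_index, (min_mb, ""))
    return {host for _, host in ram_index[start:]}

for i, mb in enumerate([512, 2048, 4096, 8192]):
    add_record("node%d" % i, mb)

# With several filters, intersect each filter's set, then read the
# survivors out of the main structure and sort by fullness
# (least full to spread, most full to stack).
candidates = hosts_with_ram(2048)
```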

Another idea is to use numpy and start representing filters as linear 
equations, then use something like numpy.linalg to solve those linear 
equations given some data.
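For example, each filter could be encoded as one row of a coefficient matrix and evaluated vectorized over all hosts at once (the feature layout and numbers are made up, and this sketch checks linear inequalities rather than solving equalities):

```python
import numpy as np

# Hosts as rows of a feature matrix: [free_ram_mb, free_disk_gb, free_vcpus].
# The matrix and the request below are made-up illustration data.
hosts = np.array([
    [8192, 100, 16],
    [1024, 500,  4],
    [4096,  50,  8],
], dtype=float)

# Each filter becomes one linear constraint row: hosts @ A.T >= b.
# Here three trivial "enough RAM / disk / vcpus" filters.
A = np.eye(3)                    # one coefficient row per filter
b = np.array([2048, 40, 8])      # the request's requirements

# One vectorized pass instead of per-host, per-filter Python loops.
passes = (hosts @ A.T >= b).all(axis=1)
```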

Another idea, turn each filter into a constraint equation (which it 
sorta is anyway) and use a known fast constraint solver on that data...

Lots of possible ideas here, likely endless :)

>> 5) Concerns about Cassandra running with OpenJDK instead of the
>> Oracle JVM are troubling. I sent an email about this to one of the
>> people I know at DataStax, but so far have not received a response.
>> And while it would be great to have people contribute to OpenJDK to
>> make it compatible, keep in mind that that would be an ongoing
>> commitment, not just a one-time effort.
> There are a few avenues to success with Cassandra but I don't think
> any of them pass very close to OpenStack's current neighborhood.
>> 6) I remember discussions back in the Austin-Bexar time frame about
>> what Thierry referred to as 'flavor-based schedulers', and they
>> were immediately discounted as not sophisticated enough to handle
>> the sort of complex scheduling requests that were expected. I'd be
>> interested in finding out from the big cloud providers what
>> percentage of their requests would fall into this simple structure,
>> and what percent are more complicated than that. Having hosts
>> listening to queues that they know they can satisfy removes the
>> raciness from the process, although it would require some
>> additional handling for the situation where no host accepts the
>> request. Still, it has the advantage of being dead simple.
>> Unfortunately, this would probably require a bigger architectural
>> change than integrating Cassandra into the Scheduler would.
> No host accepting the request means your cloud is, more or less,
> full. If you have flavors that aren't proper factors of smaller
> flavors, this will indeed happen even when it isn't 100% utilized. If
> you have other constraints that you allow your users to specify, then
> you are letting them dictate how your hardware is utilized, which I
> think is a foolhardy business decision. This is no different than any
> other manufacturing batch size problem: sometimes parts of your
> process are underutilized, and you have to make choices about
> rejecting certain workloads if they will end up costing you more than
> you're willing to pay for the happy customer.
> Note that the "efficient stacking" model I talked about can't really
> work in the queue-based approach. If you want to fill up the most
> full hosts before filling more, you need some awareness of what host
> is most full and the compute nodes can't really know that.
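For what it's worth, the queue-per-flavor idea can be sketched in a few lines, here with in-process queues (flavor names, hosts, and their subscriptions are made up):

```python
import queue

# One work queue per flavor; a host subscribes only to flavors it can hold,
# so it never races for requests it can't satisfy.
flavor_queues = {"m1.small": queue.Queue(), "m1.large": queue.Queue()}

host_subscriptions = {
    "node0": ["m1.small", "m1.large"],
    "node1": ["m1.small"],          # too small for m1.large
}

def submit(flavor, instance_id):
    """The scheduler just drops the request on the flavor's queue."""
    flavor_queues[flavor].put(instance_id)

def claim(host):
    """A host grabs work from any queue it subscribes to, if any."""
    for flavor in host_subscriptions[host]:
        try:
            return flavor, flavor_queues[flavor].get_nowait()
        except queue.Empty:
            continue
    return None  # nothing pending that this host can satisfy

submit("m1.large", "inst-1")
claimed = claim("node1")   # node1 never even sees the m1.large request
```

Note this picks whichever capable host claims first; as the reply above says, a fullness-aware placement like "most full first" needs global state the individual hosts don't have.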
>> I hope that those of us who will be at the Tokyo Summit and are
>> interested in these ideas can get together for an informal
>> discussion, and come up with some ideas for grand experiments and
>> reality checks. ;-)
>> BTW, I started playing around with some ideas, and thought that if
>> anyone wanted to also try Cassandra, I'd write up a quick how-to
>> for setting up a small cluster:
>> http://blog.leafe.com/small-scale-cassandra/. Using docker images
>> makes it a breeze!
> Really cool Ed. I agree, we need a barcamp just for scheduler ideas.
> :)
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
