[openstack-dev] Scheduler proposal

Chris Friesen chris.friesen at windriver.com
Wed Oct 7 23:00:10 UTC 2015

On 10/07/2015 11:36 AM, Ed Leafe wrote:

> I've finally gotten around to finishing writing up that proposal [1], and I'd
> like to hope that it would be the basis for future discussions about
> addressing some of the underlying issues that exist in OpenStack for
> historical reasons, and how we might rethink these choices today. I'd prefer
> comments and discussion here on the dev list, so that all can see your ideas,
> but I will be in Tokyo for the summit, and would also welcome some informal
> discussion there, too.
> -- Ed Leafe
>  [1] http://blog.leafe.com/reimagining_scheduler/

I've wondered for a while (ever since I looked at the scheduler code, really) 
why we couldn't implement more of the scheduler as database transactions.

I haven't used Cassandra, so maybe you can clarify something about updates 
across a distributed DB.  I just read up on lightweight transactions, and the 
documentation says that they're restricted to a single partition.  Is that an 
acceptable limitation for this usage?

Some points that might warrant further discussion:

1) Some resources (e.g. RAM) only require tracking amounts.  Other resources 
(CPUs, PCI devices) require tracking the allocation of specific individual host 
resources (for CPU pinning, PCI device allocation, etc.).  Presumably for the 
latter we would have to perform the actual allocation at the time of the 
scheduling operation in order to update the database with the claimed resources 
in a race-free way.
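To make the distinction in point 1 concrete, here is a rough Python sketch.  All 
the names (HostState, ClaimFailed) and the PCI addresses are made up for 
illustration; they are not nova's actual classes.  RAM is claimed by just 
decrementing a counter, but pinned CPUs and PCI devices are claimed by picking 
specific IDs, which would then have to be written to the DB as part of the claim.

```python
from dataclasses import dataclass, field


class ClaimFailed(Exception):
    """Raised when a host cannot satisfy a request (hypothetical name)."""


@dataclass
class HostState:
    # Countable resource: only the remaining amount matters.
    free_ram_mb: int
    # Individually tracked resources: we must know *which* ones are free.
    free_cpus: set[int] = field(default_factory=set)
    free_pci: set[str] = field(default_factory=set)

    def claim(self, ram_mb: int, pinned_cpus: int, pci_devices: int) -> dict:
        """Check and claim resources; return the specific items allocated."""
        if (ram_mb > self.free_ram_mb
                or pinned_cpus > len(self.free_cpus)
                or pci_devices > len(self.free_pci)):
            # Nothing is modified if the request doesn't fit.
            raise ClaimFailed("insufficient resources")
        # RAM: a simple decrement is enough.
        self.free_ram_mb -= ram_mb
        # CPUs/PCI: pick concrete IDs now so they can be recorded with the claim.
        cpus = sorted(self.free_cpus)[:pinned_cpus]
        pci = sorted(self.free_pci)[:pci_devices]
        self.free_cpus.difference_update(cpus)
        self.free_pci.difference_update(pci)
        return {"ram_mb": ram_mb, "cpus": cpus, "pci": pci}


# Hypothetical host with 4 pinnable CPUs and 2 passthrough devices.
host = HostState(free_ram_mb=8192, free_cpus={0, 1, 2, 3},
                 free_pci={"0000:81:00.1", "0000:81:00.2"})
print(host.claim(ram_mb=2048, pinned_cpus=2, pci_devices=1))
# {'ram_mb': 2048, 'cpus': [0, 1], 'pci': ['0000:81:00.1']}
```

The point is that for pinned CPUs and PCI devices the result of the claim is a 
concrete list of IDs rather than just a count, so whatever stores the claim has 
to store that list too.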

2) Are you suggesting that all of nova switch to Cassandra, or just the 
scheduler and resource tracking portions?  If the latter, how would we handle 
things like pinned CPUs and PCI devices that are currently associated with 
specific instances in the nova DB?

3) The concept of the compute node updating the DB when things change is really 
orthogonal to the new scheduling model.  The current scheduling model would 
benefit from that as well.

4) It seems to me that to avoid races we need to do one of the following.  Which 
are you proposing?
a) Serialize the entire scheduling operation so that only one instance can 
schedule at once.
b) Make the evaluation of filters and claiming of resources a single atomic DB 
transaction.
c) Do a loop where we evaluate the filters, pick a destination, try to claim the 
resources in the DB, and retry the whole thing if the resources have already 
been claimed.
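Here is a rough sketch of what (b) and (c) might look like, using Python's 
built-in sqlite3 module as a stand-in for the real database.  This is not 
Cassandra, and the compute_nodes table, its generation column, and the single 
RAM "filter" are all hypothetical simplifications.  Note that in SQLite 
BEGIN IMMEDIATE serializes all writers, so (b) here behaves much like (a); a DB 
with row-level locking would allow more concurrency.  The conditional UPDATE in 
(c) only touches one host's row, so if each host's resources lived in a single 
partition it seems like the sort of compare-and-set a single-partition 
lightweight transaction could express, though I haven't verified that.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # isolation_level=None: autocommit mode, so we control transactions explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE compute_nodes ("
        " host TEXT PRIMARY KEY,"
        " free_ram_mb INTEGER NOT NULL,"
        " generation INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO compute_nodes (host, free_ram_mb) VALUES (?, ?)",
        [("node1", 4096), ("node2", 2048)],
    )
    return conn


def schedule_atomic(conn, ram_mb):
    """Option (b): evaluate the filter and claim inside one DB transaction."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    try:
        row = conn.execute(
            "SELECT host FROM compute_nodes WHERE free_ram_mb >= ?"
            " ORDER BY free_ram_mb DESC LIMIT 1",
            (ram_mb,),
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE compute_nodes SET free_ram_mb = free_ram_mb - ?,"
                " generation = generation + 1 WHERE host = ?",
                (ram_mb, row[0]),
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return row[0] if row else None


def schedule_optimistic(conn, ram_mb, max_retries=5):
    """Option (c): filter without locks, then compare-and-set the claim."""
    for _ in range(max_retries):
        # 1. Evaluate the "filter" against a possibly stale snapshot.
        candidates = conn.execute(
            "SELECT host, free_ram_mb, generation FROM compute_nodes"
        ).fetchall()
        fits = [c for c in candidates if c[1] >= ram_mb]
        if not fits:
            return None  # no host fits right now; give up rather than retry
        host, _, gen = max(fits, key=lambda c: c[1])
        # 2. Claim only if nobody has changed the host since we read it.
        cur = conn.execute(
            "UPDATE compute_nodes SET free_ram_mb = free_ram_mb - ?,"
            " generation = generation + 1"
            " WHERE host = ? AND generation = ?",
            (ram_mb, host, gen),
        )
        if cur.rowcount == 1:
            return host
        # 3. Lost the race: another scheduler claimed first, so start over.
    raise RuntimeError("too many scheduling conflicts")


conn = make_db()
print(schedule_optimistic(conn, 1024))  # node1
print(schedule_atomic(conn, 1024))      # node1
```

The tradeoff is the usual one: (b) holds a lock for the duration of filtering, 
while (c) never blocks but may redo the filtering work under contention.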

