[openstack-dev] Scheduler proposal
chris.friesen at windriver.com
Wed Oct 7 23:00:10 UTC 2015
On 10/07/2015 11:36 AM, Ed Leafe wrote:
> I've finally gotten around to finishing writing up that proposal, and I'd
> like to hope that it would be the basis for future discussions about
> addressing some of the underlying issues that exist in OpenStack for
> historical reasons, and how we might rethink these choices today. I'd prefer
> comments and discussion here on the dev list, so that all can see your ideas,
> but I will be in Tokyo for the summit, and would also welcome some informal
> discussion there, too.
> -- Ed Leafe
>  http://blog.leafe.com/reimagining_scheduler/
I've wondered for a while (ever since I looked at the scheduler code, really)
why we couldn't implement more of the scheduler as database transactions.
I haven't used Cassandra, so maybe you can clarify something about updates
across a distributed DB. I just read up on lightweight transactions, and it
says that they're restricted to a single partition. Is that an acceptable
limitation for this usage?
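
To make the single-partition question concrete, here is a rough sketch of the
semantics I have in mind. It is a toy in-memory model, not the Cassandra driver
API, and it assumes (my assumption, not something the proposal states) that each
compute host's resource record lives in its own partition keyed by host name.
The names PartitionStore, update_if and claim_ram are made up for illustration.
In CQL the conditional write would be an UPDATE with an IF clause.

```python
import threading


class PartitionStore:
    """Toy stand-in for a partitioned store: one row per partition key."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def insert(self, key, row):
        with self._lock:
            self._rows[key] = dict(row)

    def get(self, key):
        with self._lock:
            return dict(self._rows[key])

    def update_if(self, key, expected, changes):
        """Apply changes only if every expected column still matches.

        This models a compare-and-set that is atomic within ONE partition.
        """
        with self._lock:
            row = self._rows[key]
            if any(row.get(col) != val for col, val in expected.items()):
                return False  # someone else changed the row first
            row.update(changes)
            return True


def claim_ram(store, host, ram_mb):
    """Try to claim ram_mb on a single host; True if the claim was applied."""
    row = store.get(host)
    free = row["free_ram_mb"]
    if free < ram_mb:
        return False
    # Conditional write: only succeeds if free_ram_mb is unchanged since the read.
    return store.update_if(
        host, {"free_ram_mb": free}, {"free_ram_mb": free - ram_mb}
    )
```

Under that assumption a claim against one host fits in a single conditional
write. Anything that has to touch two hosts' records would need two separate
update_if calls, and those would not be atomic with respect to each other.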
Some points that might warrant further discussion:
1) Some resources (RAM) only require tracking amounts. Other resources (CPUs,
PCI devices) require tracking allocation of specific individual host resources
(for CPU pinning, PCI device allocation, etc.). Presumably for the latter we
would have to actually do the allocation of resources at the time of the
scheduling operation in order to update the database with the claimed resources
in a race-free way.
2) Are you suggesting that all of nova switch to Cassandra, or just the
scheduler and resource tracking portions? If the latter, how would we handle
things like pinned CPUs and PCI devices that are currently associated with
specific instances in the nova DB?
3) The concept of the compute node updating the DB when things change is really
orthogonal to the new scheduling model. The current scheduling model would
benefit from that as well.
4) It seems to me that to avoid races we need to do one of the following. Which
are you proposing?
a) Serialize the entire scheduling operation so that only one instance can
schedule at once.
b) Make the evaluation of filters and claiming of resources a single atomic DB
transaction.
c) Do a loop where we evaluate the filters, pick a destination, try to claim the
resources in the DB, and retry the whole thing if the resources have already
been claimed.
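
For what it's worth, here is roughly how I picture option (c), combined with the
per-CPU allocation from point 1. This is only a sketch against an in-memory
SQLite table, not nova code: the hosts table, its version column, and the
functions make_db, parse_cpus and schedule are all hypothetical, and the
"filter" and "weigher" are deliberately trivial. The before_claim hook exists
only so that a competing scheduler can be simulated.

```python
import sqlite3


def make_db(hosts):
    """Create an in-memory table; hosts maps name -> (free_ram_mb, free_cpu_ids)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hosts (name TEXT PRIMARY KEY, free_ram_mb INTEGER, "
        "free_cpus TEXT, version INTEGER)"
    )
    for name, (ram, cpus) in hosts.items():
        conn.execute(
            "INSERT INTO hosts VALUES (?, ?, ?, 0)",
            (name, ram, ",".join(map(str, cpus))),
        )
    conn.commit()
    return conn


def parse_cpus(text):
    """Turn '0,1,2' into [0, 1, 2]; an empty string means no free CPUs."""
    return [int(c) for c in text.split(",") if c]


def schedule(conn, ram_mb, num_cpus, max_attempts=3, before_claim=None):
    """Filter, pick, claim; retry from scratch if the claim loses a race."""
    for _ in range(max_attempts):
        rows = conn.execute(
            "SELECT name, free_ram_mb, free_cpus, version FROM hosts"
        ).fetchall()
        # "Filter": enough RAM and enough free CPUs.
        candidates = [
            r for r in rows
            if r[1] >= ram_mb and len(parse_cpus(r[2])) >= num_cpus
        ]
        if not candidates:
            return None
        # "Weigher": prefer the host with the most free RAM.
        name, free_ram, cpu_text, version = max(candidates, key=lambda r: r[1])
        free = parse_cpus(cpu_text)
        pinned, remaining = free[:num_cpus], free[num_cpus:]
        if before_claim is not None:
            before_claim(conn)  # test hook: lets a rival scheduler sneak in
        # Claim only if nobody has modified this host since we read it.
        cur = conn.execute(
            "UPDATE hosts SET free_ram_mb = ?, free_cpus = ?, version = ? "
            "WHERE name = ? AND version = ?",
            (free_ram - ram_mb, ",".join(map(str, remaining)),
             version + 1, name, version),
        )
        conn.commit()
        if cur.rowcount == 1:
            return name, pinned
        # Lost the race: loop and re-evaluate everything.
    return None
```

Note that the specific CPUs are chosen and written in the same conditional
update as the RAM amount, so the scheduler ends up doing the actual allocation
rather than just picking a host.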