[openstack-dev] [nova] [gantt] Scheduler cleanup - what did we agree to

Sylvain Bauza sbauza at redhat.com
Mon Sep 8 08:33:50 UTC 2014


Hi Don,

Adding [nova] in the subject too, because we could miss some people here.


On 08/09/2014 07:24, Dugger, Donald D wrote:
>
> As I mentioned in a prior email I think that, although we’re in 
> agreement on what needs to be done before splitting out the scheduler 
> into the Gantt project, I believe we have different views on what that 
> agreement actually is.  Given that we have multiple people that 
> actively want to work on this split I would like to try and put down 
> the specifics of what needs to be accomplished.
>
> As I see it the top level issue is cleaning up the internal interfaces 
> between the Nova core code and the scheduler, specifically:
>
> 1) The client interface
>
> a. Done – we’ve created and pushed a patch to address this interface
>
+1. Scheduler-lib is now merged, although it still uses JSON dicts to
pass updates to the scheduler.
The main point of this blueprint is to create a new interface for
updating stats to the Scheduler, as RT was previously sending DB
modifications directly to the conductor (without even using objects yet).


> 2) Data-base access
>
> a. Ongoing – we’ve created a patch that missed the Juno deadline, try
> again in Kilo
>

The isolate-scheduler-db blueprint was based on the Extensible Resource
Tracker (ERT). As ERT is not yet fully merged upstream (the scheduler
part is still under review) and as it does not provide a clear interface
for stats (it just adds nested dicts to a big JSON string), we decided to
look at other ways of sending the updates that the filters need in order
to look at HostState instead of directly calling other Nova objects.


> 3) Resource Tracker
>
> a. Identify what data is sent from compute to scheduler
>
> b. Track that data inside the scheduler
>
> c. Not started yet (being discussed)
>

To be precise, we need to clearly define the interface that the
Scheduler exposes. As I said above, there is now a new method called
update_resource_stats(name, stats) which provides a first endpoint for
sending updates to the Scheduler, but we need to strengthen this method
by adding validation and typing instead of passing a blob.
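To illustrate what validation and typing could look like here, below is
a minimal Python sketch. It is not Nova code: the ResourceStats class,
its fields (vcpus, memory_mb, local_gb) and the SchedulerClient stand-in
are all hypothetical names chosen for the example. Only the
update_resource_stats(name, stats) signature comes from the discussion
above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceStats:
    """Hypothetical typed payload replacing the free-form stats blob."""
    vcpus: int
    memory_mb: int
    local_gb: int

    def __post_init__(self):
        # Validate every field at construction time, so a malformed update
        # is rejected before it ever reaches the scheduler.
        for field_name in ("vcpus", "memory_mb", "local_gb"):
            value = getattr(self, field_name)
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                raise TypeError(
                    f"{field_name} must be an int, got {type(value).__name__}")
            if value < 0:
                raise ValueError(f"{field_name} must be >= 0, got {value}")


class SchedulerClient:
    """Toy stand-in for a scheduler client (assumption, not the real one)."""

    def __init__(self):
        # Latest known stats, keyed by compute node name.
        self.host_stats = {}

    def update_resource_stats(self, name, stats):
        # Refuse untyped blobs such as plain dicts or JSON strings.
        if not isinstance(stats, ResourceStats):
            raise TypeError("stats must be a ResourceStats instance")
        self.host_stats[name] = stats
```

With something like this, a caller passing a plain dict or a negative
value fails immediately at the interface boundary, rather than inside a
filter that later reads the data.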

On the other hand, we also need to make sure that the claiming mechanism
is robust enough to support various kinds of claims (the NUMA patches
that were sent proved that there is room for improvement here).
Ideally, claiming should be done in the Scheduler itself (by having a
distributed transactional model for concurrent scheduling requests to
multiple schedulers).


> These to me are the critical items for the split.  Yes there are lots 
> of other areas/interfaces inside Nova that should be cleaned up but 
> the goal here is to split out the scheduler, not to refactor every 
> interface inside Nova.
>

Indeed, we need to keep in sight the objective of splitting out the
Scheduler as soon as possible. That's why I'm proposing a strategy with
two parts: updating stats in the Scheduler by passing Nova objects
(ComputeNode here) through the update_resource_stats() method mentioned
above, and adding instance_claim and resize_claim methods to the
ComputeNode object itself, so that a select_destinations() call from the
conductor could issue a call to each ComputeNode it elects to make sure
that node has enough resources for the request.
The current RT claims would be kept for backwards compatibility and
double-checking until we consider the new workflow good enough to remove
them.
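The claim-on-select workflow described above can be sketched roughly as
follows. This is a simplified illustration under stated assumptions, not
the Nova implementation: the ComputeNode fields, the ClaimFailed
exception and the signatures of instance_claim() and
select_destinations() are all hypothetical, and the sketch ignores
filters, weighing, resize_claim and concurrency between schedulers.

```python
from dataclasses import dataclass


class ClaimFailed(Exception):
    """Raised when a node cannot satisfy a claim (hypothetical name)."""


@dataclass
class ComputeNode:
    """Simplified stand-in for a ComputeNode object carrying its own claims."""
    host: str
    free_vcpus: int
    free_memory_mb: int

    def instance_claim(self, vcpus, memory_mb):
        # Check first, then consume, so a failed claim leaves the node
        # untouched.
        if vcpus > self.free_vcpus or memory_mb > self.free_memory_mb:
            raise ClaimFailed(f"not enough resources on {self.host}")
        self.free_vcpus -= vcpus
        self.free_memory_mb -= memory_mb


def select_destinations(nodes, vcpus, memory_mb):
    """Return the first candidate host whose claim succeeds.

    `nodes` is assumed to be already filtered and ordered by preference.
    """
    for node in nodes:
        try:
            node.instance_claim(vcpus, memory_mb)
        except ClaimFailed:
            continue  # try the next elected node
        return node.host
    raise ClaimFailed("no candidate node could satisfy the request")
```

The point of the sketch is only the ordering: the claim happens at
selection time on the elected node, so a host that has run out of
resources is skipped before the request is sent to it.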

The above strategy comes from a braindump, but I estimate it to be the
lowest common denominator of all the necessary changes. I'm really
concerned about any attempt at a big-bang change here, which could lead
us to lose focus on splitting the scheduler.

> Feel free to correct this email but I really want to make sure we all 
> are in agreement on the same thing so that we can actually get 
> something done.
>

Yeah, I assume that's quite frustrating because the design phase has not
yet ended. IMHO we need to set up some sort of online meetup to discuss
all the bits of the split, as we can't wait for the Kilo Summit. The
Gantt meetings are obviously not the right place for discussing design
and implementation, so we need to promote online tools for doing such
work.

We need ideas and we need volunteers, so feel free to raise your hand
(and your voice) if you, reader, are willing to work on that effort.

-Sylvain




> --
>
> Don Dugger
>
> "Censeo Toto nos in Kansa esse decisse." - D. Gale
>
> Ph: 303/443-3786
>
>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
