[openstack-dev] [nova][scheduler][placement] Trying to understand the proposed direction

Edward Leafe ed at leafe.com
Mon Jun 19 13:04:17 UTC 2017


There is a lot going on lately in placement-land, and some of the proposed changes are complex enough that it is difficult to understand what the final result is supposed to look like. I have documented my understanding of how the placement/scheduler interaction currently works, and also my understanding of how it will work once all the proposed changes are implemented. I don't know how close that understanding is to the actual design, so I'm hoping this will serve as a starting point for clarifying things, so that everyone involved in these efforts has a clear view of the target we are aiming for. Please reply to this thread with any corrections or additions, so that all can see.

I do realize that some of this is to be done in Pike, and the rest in Queens, but that timetable is not relevant to the overall understanding of the design.

-- Ed Leafe

Current flow:
* Scheduler gets a req spec from conductor, containing resource requirements
* Scheduler sends those requirements to placement
* Placement runs a query to determine the root RPs that can satisfy those requirements
* Placement returns a list of the UUIDs for those root providers to scheduler
* Scheduler uses those UUIDs to create HostState objects for each
* Scheduler runs those HostState objects through filters to remove any hosts that don't meet requirements which placement does not check.
* Scheduler runs the remaining HostState objects through weighers to order them in terms of best fit.
* Scheduler takes the host at the top of that ranked list, and tries to claim the resources in placement. If that fails, another request has won a race for those resources, so that HostState is discarded and the next one is selected. This is repeated until a claim succeeds.
* Scheduler then creates a list of N UUIDs, with the first being the selected host, and the rest being alternates: the next hosts in the ranked list that are in the same cell as the selected host.
* Scheduler returns that list to conductor.
* Conductor determines the cell of the selected host, and sends that list to the target cell.
* Target cell tries to build the instance on the selected host. If it fails, it unclaims the resources for the selected host, and tries to claim the resources for the next host in the list. It then tries to build the instance on the next host in the list of alternates. Only when all alternates fail does the build request fail.
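To make the current flow concrete, here is a minimal Python sketch of the scheduler-side selection (filter, weigh, claim with retry, pick same-cell alternates) and the cell-side build retry. It is not Nova's code: `HostState` is reduced to a hypothetical dataclass, and `claim`, `unclaim`, and `build` are hypothetical callables standing in for the placement and compute calls. It also assumes that if claiming an alternate fails, the cell simply moves on to the next one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class HostState:
    # Hypothetical, heavily simplified stand-in for Nova's HostState.
    uuid: str
    cell: str
    free_ram_mb: int
    free_vcpus: int


def select_destinations(host_states: List[HostState], request: Dict,
                        filters: List[Callable[[HostState, Dict], bool]],
                        weigher: Callable[[HostState, Dict], float],
                        claim: Callable[[str, Dict], bool],
                        max_hosts: int) -> List[str]:
    """Return up to max_hosts UUIDs: the claimed host first, then alternates."""
    # Filters remove hosts failing requirements that placement did not check.
    candidates = [h for h in host_states
                  if all(f(h, request) for f in filters)]
    # Weighers rank the survivors, best fit first.
    ranked = sorted(candidates, key=lambda h: weigher(h, request), reverse=True)
    for i, host in enumerate(ranked):
        if claim(host.uuid, request):
            # Alternates are the next ranked hosts in the selected host's cell.
            alternates = [h.uuid for h in ranked[i + 1:] if h.cell == host.cell]
            return ([host.uuid] + alternates)[:max_hosts]
        # Claim failed: another request won the race, so try the next host.
    raise RuntimeError("No valid host found")


def build_in_cell(host_list: List[str], request: Dict,
                  build: Callable[[str, Dict], bool],
                  claim: Callable[[str, Dict], bool],
                  unclaim: Callable[[str, Dict], None]) -> str:
    """Try the selected host, then each alternate, until a build succeeds."""
    for i, uuid in enumerate(host_list):
        # The first host was already claimed by the scheduler.
        if i > 0 and not claim(uuid, request):
            continue
        if build(uuid, request):
            return uuid
        # Build failed: release this host's resources before moving on.
        unclaim(uuid, request)
    raise RuntimeError("Build failed on the selected host and all alternates")
```

Note that the alternates list carries only UUIDs, so the cell has to reconstruct what to claim on each alternate from the request itself.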

Proposed flow:
* Scheduler gets a req spec from conductor, containing resource requirements
* Scheduler sends those requirements to placement
* Placement runs a query to determine the root RPs that can satisfy those requirements
* Placement then constructs a data structure for each root provider as documented in the spec. [0]
* Placement returns a number of these data structures as JSON blobs. Due to the size of the data, a page size will have to be determined, and placement will either have to maintain that list of structured data for subsequent requests, or re-run the query and only calculate the data structures for the hosts that fit in the requested page.
* Scheduler continues to request the paged results until it has them all.
* Scheduler then runs this data through the filters and weighers. No HostState objects are required, as the data structures will contain all the information that scheduler will need.
* Scheduler then selects the data structure at the top of the ranked list. Inside that structure is a dict of the allocation data that scheduler will need to claim the resources on the selected host. If the claim fails, the next data structure in the list is chosen, and this is repeated until a claim succeeds.
* Scheduler then creates a list of N of these data structures, with the first being the data for the selected host, and the rest being data structures for alternates: the next hosts in the ranked list that are in the same cell as the selected host.
* Scheduler returns that list to conductor.
* Conductor determines the cell of the selected host, and sends that list to the target cell.
* Target cell tries to build the instance on the selected host. If it fails, it uses the allocation data in the data structure to unclaim the resources for the selected host, and tries to claim the resources for the next host in the list using its allocation data. It then tries to build the instance on the next host in the list of alternates. Only when all alternates fail does the build request fail.
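A rough Python sketch of the proposed flow follows: paging through placement's results, selecting and claiming with the allocation dict carried inside each structure, and having the cell use that same allocation data to unclaim and reclaim. The dict layout below (`root_uuid`, `cell`, `free_ram_mb`, `allocations`) is a hypothetical simplification, not the format in the spec [0]. `fetch_page`, `claim`, `unclaim`, and `build` are hypothetical callables, and treating a short page as the last page is an assumption about how paging would work.

```python
from typing import Callable, Dict, List

# Hypothetical, simplified per-root-provider structure (the real one is in [0]):
# {"root_uuid": str, "cell": str, "free_ram_mb": int,
#  "allocations": {rp_uuid: {resource_class: amount}}}
Candidate = Dict


def fetch_all(fetch_page: Callable[[int, int], List[Candidate]],
              page_size: int) -> List[Candidate]:
    """Keep requesting pages until placement returns a short or empty page."""
    results: List[Candidate] = []
    page = 0
    while True:
        batch = fetch_page(page, page_size)
        results.extend(batch)
        if len(batch) < page_size:  # Assumption: a short page means no more data.
            return results
        page += 1


def select(candidates: List[Candidate],
           filters: List[Callable[[Candidate], bool]],
           weigher: Callable[[Candidate], float],
           claim: Callable[[Dict], bool],
           max_hosts: int) -> List[Candidate]:
    """Return up to max_hosts structures: the claimed one first, then alternates."""
    passing = [c for c in candidates if all(f(c) for f in filters)]
    ranked = sorted(passing, key=weigher, reverse=True)
    for i, cand in enumerate(ranked):
        # The allocation dict inside the structure is exactly what gets claimed.
        if claim(cand["allocations"]):
            alternates = [c for c in ranked[i + 1:] if c["cell"] == cand["cell"]]
            return ([cand] + alternates)[:max_hosts]
    raise RuntimeError("No valid host found")


def build_in_cell(selected: List[Candidate],
                  build: Callable[[str], bool],
                  claim: Callable[[Dict], bool],
                  unclaim: Callable[[Dict], None]) -> str:
    """Build on the selected host, falling back to alternates via their allocations."""
    for i, cand in enumerate(selected):
        # The first structure's allocations were already claimed by the scheduler.
        if i > 0 and not claim(cand["allocations"]):
            continue
        if build(cand["root_uuid"]):
            return cand["root_uuid"]
        unclaim(cand["allocations"])
    raise RuntimeError("Build failed on the selected host and all alternates")
```

The main difference from the current flow is that each alternate carries its own allocation data, so the cell can claim it directly instead of rebuilding the allocation from the request.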


[0] https://review.openstack.org/#/c/471927/