[Openstack-operators] [nova] Queens PTG recap - cells

Matt Riedemann mriedemos at gmail.com
Sat Sep 16 16:37:43 UTC 2017


The full etherpad for cells discussions at the PTG is here [1].

We mostly talked about the limitations with multiple cells identified 
in Pike [2] and the priorities for addressing them in Queens.

Top priorities for cells in Queens
----------------------------------

* Alternate hosts: with multiple cells in a tiered (super) conductor 
mode, we don't have reschedules happening when a server build fails on a 
compute. Ed Leafe has already started working on the code to build an 
object to pass from the scheduler to the super conductor. We'll then 
send that from the super conductor down to the compute service in the 
cell, and reschedules can then happen within the cell using that 
provided list of alternate hosts (and pre-determined allocation 
requests for Placement provided by the scheduler); a rough sketch of 
the flow follows this list. We agreed that we should get this done 
early in Queens so that we have ample time to flush out and fix bugs.

* Instance listing across multiple cells: this is going to involve 
sorting the instance lists we get back from multiple cells. Today they 
are filtered/sorted within each cell and then returned out of the API 
in a "barber pole" pattern, i.e. striped cell by cell rather than in a 
single globally sorted order. We are not going to use Searchlight for 
this, but instead do it with more efficient cross-cell DB queries (see 
the merge sketch after this list). Dan Smith is going to work on this.
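
To make the alternate hosts item above concrete, here is a minimal 
sketch of the reschedule-within-a-cell idea. The names below 
(Selection, claim, build, unclaim, NoValidHost) are illustrative 
placeholders, not Nova's actual objects or RPC interfaces:

# Minimal sketch only: Selection, claim, build and unclaim are
# hypothetical names, not Nova's real objects or RPC methods.
from dataclasses import dataclass, field


class NoValidHost(Exception):
    """Raised when the chosen host and every alternate fail."""


@dataclass
class Selection:
    # The chosen host plus a pre-computed Placement allocation request,
    # so the cell never has to call back up to the scheduler.
    host: str
    allocation_request: dict = field(default_factory=dict)


def build_with_alternates(instance, selections, claim, build, unclaim):
    """Retry a failed build inside the cell using the scheduler-provided
    selections (the first entry is the originally chosen host)."""
    for sel in selections:
        if not claim(instance, sel.allocation_request):
            continue  # resources already taken by a concurrent build
        try:
            return build(instance, sel.host)
        except Exception:
            unclaim(instance, sel)  # release the claim, try the next host
    raise NoValidHost()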
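
For the cross-cell instance listing item, here is a sketch of the merge 
idea (not Nova's implementation; instance_get_all_sorted is a made-up 
stand-in for a per-cell DB query returning dicts):

# Each cell returns an already-sorted page; merge them into one globally
# sorted result instead of concatenating cell by cell ("barber pole").
import heapq
from operator import itemgetter


def list_instances_sorted(cell_dbs, sort_key="created_at", limit=100):
    per_cell = [
        db.instance_get_all_sorted(sort_key=sort_key, limit=limit)
        for db in cell_dbs
    ]  # each list is sorted within its own cell
    merged = heapq.merge(*per_cell, key=itemgetter(sort_key))
    return list(merged)[:limit]  # one sort order across all cells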

Dealing with up-calls
---------------------

In a multi-cell or tiered (super) conductor mode, the cell conductor and 
compute services cannot reach the top-level database or message queue. 
This breaks a few existing things today.

* Instance affinity reporting from the computes to the scheduler won't 
work without the MQ up-call. There is also a late check during the 
build process on the compute which verifies that server group 
affinity/anti-affinity policies are still satisfied, and that check is 
an up-call to the API database. Both of these will be solved long-term 
when we model distance in Placement, but we are deferring that from 
Queens. The late affinity check in the compute is not an issue if 
you're running a single cell (not using a tiered super conductor mode 
deployment). If you're running multiple cells, you can configure the 
cell conductors to have access to the API database as a workaround (a 
sample config follows this list). We wouldn't test with this 
workaround in CI, but it's an option for people that need it.

* There is a host aggregate up-call when performing live migration with 
the xen driver while letting the driver determine whether block 
migration should be used. We decided to just put a note in the code 
that this doesn't work and leave it as a limitation for that driver and 
scenario; xen driver maintainers or users can fix it if they want, but 
we aren't going to make it a priority.

* There is a host aggregate up-call when doing boot from volume and the 
compute service creates the volume: when [cinder]/cross_az_attach is 
False (not the default), the compute checks to see if the instance AZ 
and the volume AZ match. Checking the AZ for the instance involves 
getting the host aggregates that the instance is in, and those are in 
the API database. We agreed that for now, people running multiple cells 
with cross_az_attach=False can configure the cell conductor to reach 
the API database, like the late affinity check described above (a 
sketch of the check is at the end of this list). Sylvain Bauza is also 
looking at why we even do this check if the user did not request a 
specific AZ, so there could be other general changes in the design of 
this cross_az_attach check later. That is being discussed here [3].
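
For the API database workaround mentioned in the affinity and 
cross_az_attach items above, the gist is to give the cell conductor's 
nova.conf a connection to the API database, something like the 
following (the connection string is a placeholder for your own 
deployment; again, this isn't something we'd test in CI):

[api_database]
# Normally left unset for cell conductors; setting it re-enables the
# up-calls for the late affinity check and the cross_az_attach AZ check.
connection = mysql+pymysql://nova:PASSWORD@API_DB_HOST/nova_api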
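
As a rough illustration of why the cross_az_attach check is an up-call 
(this is not Nova's actual code), deriving the instance's AZ means 
reading host aggregate metadata, which lives in the API database:

DEFAULT_AZ = "nova"  # Nova's default AZ name unless configured otherwise


def instance_az(host, host_aggregates):
    """host_aggregates: host -> list of aggregate metadata dicts; in a
    real deployment that data comes from the API database."""
    for metadata in host_aggregates.get(host, []):
        if "availability_zone" in metadata:
            return metadata["availability_zone"]
    return DEFAULT_AZ


def check_az_match(instance_host, volume_az, host_aggregates,
                   cross_az_attach=False):
    if cross_az_attach:
        return  # attaching across AZs is allowed, nothing to check
    if instance_az(instance_host, host_aggregates) != volume_az:
        raise Exception("instance and volume are in different AZs")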

Other discussion
----------------

* We have a utility to concurrently run database queries against 
multiple cells. We are going to look at retrofitting some code paths 
that currently loop over the cells serially to use this utility and 
improve performance (a rough sketch of the pattern follows this list).

* Making the consoleauth service run per-cell is going to be low 
priority until some large cells v2 deployments start showing up and 
saying that a global consoleauth service is not scaling and it needs to 
be fixed.

* We talked about using the "GET /usages" Placement API for counting 
quotas rather than gathering that information by iterating over the 
cells (see the sketch after this list), but there are quite a few open 
questions about the design and edge cases like move operations and 
Ironic with custom resource classes. So while this should make counting 
quotas perform better, it's complicated and not a priority for Queens.

* Finally, we also talked about the future of cells v1 and when we can 
officially deprecate and remove it. We've been putting warnings in the 
code, docs and config options for a long time saying not to use cells 
v1 since it is being replaced by cells v2. *We agreed that if we can 
get efficient multi-cell instance listing fixed in Queens, we'll remove 
both cells v1 and nova-network in Rocky.* At least since the Boston 
Pike summit, we've been asking that large cells v1 deployments start 
checking out cells v2 and report what issues they run into with the 
transition, and so far we haven't gotten any feedback, so we're hoping 
this timeline will spur some movement on that front. Dan Smith also 
called dibs on the code removal.
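
For the concurrent-query utility mentioned in the first item above, the 
pattern is roughly the following (the helper name and signature are 
made up for illustration; Nova's real utility differs in detail and 
uses its own threading model):

from concurrent.futures import ThreadPoolExecutor


def scatter_gather_cells(cell_mappings, fn, *args, **kwargs):
    """Run fn(cell, *args, **kwargs) against every cell concurrently and
    return {cell.uuid: result} instead of looping over cells serially."""
    with ThreadPoolExecutor(max_workers=max(len(cell_mappings), 1)) as pool:
        futures = {
            cell.uuid: pool.submit(fn, cell, *args, **kwargs)
            for cell in cell_mappings
        }
        return {uuid: fut.result() for uuid, fut in futures.items()}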
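
For the quota-counting discussion, the idea being weighed is to ask 
Placement for a project's usage totals instead of adding them up cell 
by cell. A hedged sketch (endpoint, auth and microversion handling are 
simplified placeholders):

import requests

PLACEMENT = "https://placement.example.com"  # placeholder endpoint


def project_usages(project_id, token):
    resp = requests.get(
        PLACEMENT + "/usages",
        params={"project_id": project_id},
        headers={"X-Auth-Token": token,
                 "OpenStack-API-Version": "placement 1.9"},
    )
    resp.raise_for_status()
    # e.g. {"usages": {"VCPU": 4, "MEMORY_MB": 8192, "DISK_GB": 80}}
    return resp.json()["usages"]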

[1] https://etherpad.openstack.org/p/nova-ptg-queens-cells
[2] 
https://docs.openstack.org/nova/latest/user/cellsv2_layout.html#caveats-of-a-multi-cell-deployment
[3] 
http://lists.openstack.org/pipermail/openstack-operators/2017-September/014200.html

-- 

Thanks,

Matt


