[openstack-dev] [nova] Rocky PTG summary - cells
melwittt at gmail.com
Wed Mar 14 18:07:27 UTC 2018
I’ve created a summary etherpad for the nova cells session from the PTG and included a plain text export of it in this email.
*Cells: Rocky PTG Summary
* How to handle a "down" cell
* How to handle each cell having a separate ceph cluster
* How do we plan to progress on removing "upcalls"
*Agreements and decisions
* To list instances even when we can't connect to a cell database, we'll construct a minimal response from the API database. We'll add a column such as "queued_for_delete" to the instance_mappings table so we can determine which instances are not deleted and list those.
* tssurya will write a spec for the new column.
* We're not going to pursue the approach of having backup URLs for cell databases to fall back on when a cell is "down".
* An attempt to delete an instance in a "down" cell should result in a 500 or 503 error.
* An attempt to create an instance should be blocked if the project has instances in a "down" cell (the instance_mappings table has a "project_id" column) because we cannot count instances in "down" cells for the quota check.
* At this time, we won't pursue the idea of adding an allocation "type" concept to placement (which could be leveraged for counting cores/ram resource usage for quotas).
* The topic of each cell having a separate ceph cluster and having each cell cache images in the imagebackend led to the topic of the "cinder imagebackend" again.
* Implementing a cinder imagebackend in nova would be an enormous undertaking that realistically isn't going to happen.
* A pragmatic solution was suggested to make boot-from-volume a first class citizen and make automatic boot-from-volume work well, so that we let cinder handle the caching of images in this scenario (and of course handle all of the other use cases for cinder imagebackend). This would eventually lead to the deprecation of the ceph imagebackend. Further discussion is required on this.
* On removing upcalls, progress in placement will help address the remaining upcalls.
* dansmith will work on filtering compute hosts using the volume availability zone to address the cinder/cross_az_attach issue. mriedem and bauzas will review.
* For the xenapi host aggregate upcall, the xenapi subteam will remove it as a patch on top of their live-migration support patch series.
* For the server group late affinity check upcall for server create and evacuate, the plan is to handle it race-free with placement/scheduler. However, affinity modeling in placement isn't slated for work in Rocky, so the late affinity check upcall can be removed in the S release at the earliest.
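
To make the "down" cell agreements above concrete, here is a minimal Python sketch of the two API-database checks: listing non-deleted instances from instance_mappings alone, and blocking a create when the project has instances in a "down" cell. This is not nova code. The InstanceMapping class, both function names, and the shape of the returned dicts are hypothetical and exist only for illustration. The one thing taken from the discussion is the idea of a "queued_for_delete" column next to the existing "project_id" column.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class InstanceMapping:
    """Hypothetical stand-in for one row of the instance_mappings table."""
    instance_uuid: str
    cell_id: str
    project_id: str
    queued_for_delete: bool = False  # the proposed new column (assumed name)


def list_minimal_instances(
    mappings: Iterable[InstanceMapping], down_cells: Set[str]
) -> List[Dict[str, str]]:
    """Build minimal records for non-deleted instances in down cells.

    Only data held in the API database is used, because the cell
    database itself cannot be reached.
    """
    return [
        {"uuid": m.instance_uuid, "project_id": m.project_id}
        for m in mappings
        if m.cell_id in down_cells and not m.queued_for_delete
    ]


def create_allowed(
    project_id: str, mappings: Iterable[InstanceMapping], down_cells: Set[str]
) -> bool:
    """Return False if the project has any instance mapped to a down cell.

    Instances in a down cell cannot be counted for the quota check,
    so the create request is blocked rather than risking an over-quota boot.
    """
    return not any(
        m.project_id == project_id and m.cell_id in down_cells
        for m in mappings
    )
```

With mappings for project "p1" in a down cell and project "p2" in a healthy cell, create_allowed() would return False for "p1" and True for "p2", and list_minimal_instances() would skip any mapping already queued for delete.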