[Openstack-operators] [nova] Need feedback on spec for handling down cells in the API

Matt Riedemann mriedemos at gmail.com
Mon Jun 25 21:17:59 UTC 2018

On 6/7/2018 9:02 AM, Matt Riedemann wrote:
> We have a nova spec [1] which is at the point that it needs some API 
> user (and operator) feedback on what nova API should be doing when 
> listing servers and there are down cells (unable to reach the cell DB or 
> it times out).
>
> tl;dr: the spec proposes to return "shell" instances which have the 
> server uuid and created_at fields set, and maybe some other fields we 
> can set, but otherwise a bunch of fields in the server response would be 
> set to UNKNOWN sentinel values. This would be unversioned, and therefore 
> could wreak havoc on existing client side code that expects fields like 
> 'config_drive' and 'updated' to be of a certain format.
>
> There are alternatives listed in the spec so please read this over and 
> provide feedback since this is a pretty major UX change.
>
> Oh, and no pressure, but today is the spec freeze deadline for Rocky.
>
> [1] https://review.openstack.org/#/c/557369/

The options laid out right now are:

1. Without a new microversion, include 'shell' servers in the response 
when listing over down cells. These would have UNKNOWN values for the 
fields in the server object. gibi and I didn't like this because 
existing client code wouldn't know how to deal with these UNKNOWN shell 
instances - and not all of the server fields are simple strings, we have 
booleans, integers, dicts and lists, so what would those values be?

2. In a new microversion, return a new top-level parameter when listing 
servers which would include minimal details about servers that are in 
down cells (minimal like just the uuid). This was an alternative gibi 
and I had discussed because we didn't like the client-side impacts w/o a 
microversion or the full 'shell' servers in option 1. From an IRC 
conversation last week with mordred [1], dansmith and mordred don't care 
for the new top-level parameter since clients would have to merge that 
in to the full list of available servers. Plus, in the future, if we 
ever have some kind of caching mechanism in the API from which we can 
pull instance information if it's in a down cell, then the new top-level 
parameter becomes kind of pointless.

3. In a new microversion, include servers from down cells in the same 
top-level servers response parameter but for those in down cells, we'll 
just include minimal information (status=UNKNOWN and the uuid). Clients 
would opt-in to the new microversion when they know how to deal with 
what an instance in UNKNOWN status means. In the future, we could use a 
caching mechanism to fill in these details for instances in down cells.

#3 is kind of a compromise between options 1 and 2, and I'm OK with it 
(barring any hairy details).
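
To make option 3 concrete from the client side, here is a rough sketch of 
what opting in might look like. This is not from the spec: the helper name 
split_servers is made up, and I'm assuming the minimal records carry the 
uuid in the usual 'id' key alongside status=UNKNOWN.

```python
def split_servers(servers):
    """Separate fully populated servers from minimal down-cell records.

    Assumption: a down-cell record has only 'id' and status 'UNKNOWN',
    so no other key may be read from it.
    """
    reachable, unreachable_ids = [], []
    for server in servers:
        if server.get("status") == "UNKNOWN":
            unreachable_ids.append(server["id"])
        else:
            reachable.append(server)
    return reachable, unreachable_ids


# Hypothetical response body under the new microversion.
response = {
    "servers": [
        {"id": "uuid-1", "status": "ACTIVE", "name": "web-1"},
        {"id": "uuid-2", "status": "UNKNOWN"},
    ]
}
reachable, unreachable_ids = split_servers(response["servers"])
```

The point is that a client only has to branch on status inside the one 
list it already consumes, rather than merge a second top-level list as in 
option 2.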

In all cases, we won't include 'shell' servers in the response if the 
user is filtering (or paging?) because we can't be honest about the 
results: the filters can't be evaluated against instances in the down 
cell, so we have to treat those instances as not matching.
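
As a sketch of that rule (hypothetical names, not nova code): the minimal 
records are only appended when the request has no filters. Paging is left 
out here since that part is still an open question.

```python
def servers_for_response(reachable, down_cell_uuids, filters):
    """Build the 'servers' list for a list request.

    Hypothetical helper: 'reachable' holds full server dicts from cells
    that answered, 'down_cell_uuids' holds uuids known to live in down
    cells, 'filters' is the dict of query filters from the request.
    """
    servers = list(reachable)
    if not filters:
        # No filters: it is honest to report the instances we know exist.
        servers.extend(
            {"id": uuid, "status": "UNKNOWN"} for uuid in down_cell_uuids
        )
    # With filters we cannot tell whether a down-cell instance matches,
    # so it is left out.
    return servers


full = [{"id": "uuid-1", "status": "ACTIVE"}]
unfiltered = servers_for_response(full, ["uuid-2"], filters={})
filtered = servers_for_response(full, ["uuid-2"], filters={"status": "ACTIVE"})
```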

If you have a server in a down cell, you can't delete it or really do 
anything with it because we literally can't pull the instance out of the 
cell database while the cell is down. You'd get a 500 or 503 in that case.

Regardless of microversion, we plan on omitting instances from down 
cells when listing, which is a backportable reliability bug fix [2], so 
we don't 500 the API when listing across 70 cells and 1 is down.

[2] https://review.openstack.org/#/c/575734/
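
For anyone who wants the shape of that fix without reading the patch, a 
generic sketch follows. It is not the nova implementation; 
list_across_cells and the per-cell query callables are invented for 
illustration. Each cell is queried independently, and a cell that raises 
or times out is recorded as down instead of failing the whole request.

```python
from concurrent.futures import ThreadPoolExecutor


def list_across_cells(cell_queries, timeout=5.0):
    """Run one query per cell and return (instances, down_cells).

    'cell_queries' maps a cell name to a zero-argument callable that
    returns that cell's instances. Both are hypothetical stand-ins.
    """
    instances, down_cells = [], []
    workers = max(1, len(cell_queries))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(q) for name, q in cell_queries.items()}
        for name, future in futures.items():
            try:
                instances.extend(future.result(timeout=timeout))
            except Exception:
                # Unreachable or timed-out cell: skip it, don't 500.
                down_cells.append(name)
    # Note: leaving the 'with' block still waits for hung queries; a real
    # implementation would need a way to abandon them.
    return instances, down_cells


def down_cell():
    raise ConnectionError("cell DB unreachable")


queries = {
    "cell1": lambda: ["uuid-1"],
    "cell2": down_cell,
    "cell3": lambda: ["uuid-3"],
}
instances, down_cells = list_across_cells(queries)
```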



