[nova][dev][ops] server status when compute host is down

Matt Riedemann mriedemos at gmail.com
Thu May 23 16:50:25 UTC 2019


On 5/23/2019 9:05 AM, Dan Smith wrote:
>>   Question: do people think we should make the server status field
>>   reflect UNKNOWN as well, if the 'host_status' is UNKNOWN? And if so,
>>   should it be controlled by policy or no?
> 
> Do we have other things that change *value* depending on policy? I was
> thinking that was one of the situations the policy people (i.e. Matt)
> have avoided in the past.
> 
> Also, AFAIK, our documentation specifies (and existing behavior is) to
> only return UNKNOWN in the case where we return a partial instance
> because we couldn't look up the rest of the details from the cell. This
> would break that relationship, and I'm not sure how people would know
> that they shouldn't expect a full instance record, other than to poke it
> with a stick to see if it contains certain properties.
> 
>> +1 to doing this with a policy. I would prefer giving the
>> ability/choice to the operators to opt out of it if they want to.
> 
> In general, I think we should try to avoid leaking things about the
> infrastructure to regular users. In the case of a cell being down, we
> couldn't really fake it because we don't have much of the information
> available to us. I agree that a host being down is not that different
> from a cell being down from the perspective of a user, but I also think
> that allowing operators to opt in to such a disclosure would be better,
> although as above, I start to worry about the degrees of freedom in the
> response.
> 
> My biggest concern, which came out during the host status discussion, is
> that we should *not* say the instance is "down" just because the compute
> service is unreachable. Saying it's in "unknown" state is better.
> 
> I'd like to hear from some more operators about whether they would
> opt in to this unknown-state behavior for compute host
> down-age. Specifically, whether they want customer instances to show as
> "unknown" state while they're doing an upgrade that otherwise wouldn't
> impact the instance's health.
> 
> --Dan
> 

I agree with Dan that we need some operator input on this thread before 
we consider changing the behavior.

Reporting UNKNOWN when a compute service is down, and not only when a 
cell is down, is also confusing, as Dan mentions above, because a 
vm_state of UNKNOWN is new as of Stein and is used only for the 
down-cell case.
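
To make the question at the top of the thread concrete, here is a 
minimal Python sketch of the opt-in mapping being discussed. It assumes 
a server dict shaped like a compute API response at microversion 2.16 
or later; `effective_status` and `report_unknown_host` are hypothetical 
names for illustration, not existing nova code or options.

```python
# host_status values the compute API can report (microversion >= 2.16):
# "UP", "DOWN", "MAINTENANCE", "UNKNOWN", or "" when the server has no host.


def effective_status(server: dict, report_unknown_host: bool) -> str:
    """Return the status a user would see under the proposed behavior.

    report_unknown_host is a hypothetical operator opt-in (policy or
    config); it is not an existing nova option.
    """
    # Only an unreachable compute service (host_status UNKNOWN) is mapped,
    # so the instance is never reported as "down" just because its
    # compute service is.
    if report_unknown_host and server.get("host_status") == "UNKNOWN":
        return "UNKNOWN"
    # Otherwise keep the instance's own status unchanged.
    return server["status"]
```

With the switch on, a server whose status is ACTIVE but whose host is 
unreachable would be shown as UNKNOWN. That is the trade-off Dan raises: 
UNKNOWN would no longer imply a partial, down-cell record, so a client 
could not rely on the status alone to know whether the remaining server 
fields are present.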

Setting the 'nova list --fields' question aside, we already have a 
workaround for this today, right? If I'm an operator and want to expose 
this information to my users, I configure nova's policy to have:

"os_compute_api:servers:show:host_status": "rule:admin_or_owner"

And then the user, with microversion 2.16 or later, can see the host 
status if the cloud allows it.
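
From the user's side, the workaround amounts to requesting the server at 
a microversion that exposes host_status and checking whether the key is 
present. A minimal sketch using only the Python standard library; the 
endpoint URL and the helper names are hypothetical, and the request is 
built but not sent.

```python
import urllib.request

# Hypothetical endpoint; substitute your cloud's compute endpoint.
COMPUTE_ENDPOINT = "https://compute.example.com/v2.1"

# host_status is first exposed in compute API microversion 2.16.
HOST_STATUS_MICROVERSION = "2.16"


def build_show_server_request(server_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET /servers/{id} request at 2.16."""
    return urllib.request.Request(
        f"{COMPUTE_ENDPOINT}/servers/{server_id}",
        headers={
            "X-Auth-Token": token,
            # Legacy nova microversion header.
            "X-OpenStack-Nova-API-Version": HOST_STATUS_MICROVERSION,
        },
        method="GET",
    )


def host_status_of(response_body: dict):
    """Return host_status, or None if policy hid it from this caller."""
    # When os_compute_api:servers:show:host_status denies the caller,
    # the key is omitted from the server representation.
    return response_body["server"].get("host_status")
```

If the cloud keeps the default admin-only policy, a regular user gets 
None back even at 2.16; with the rule above set to admin_or_owner, the 
owner sees values such as "UP" or "UNKNOWN".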

As an aside, I now realize we have had a nasty performance regression 
since Stein [1] when listing servers with details, related to this 
host_status field. The code used to rely on this method [2] to cache the 
host status per host while iterating over a list of instances, but now 
the view builder [3] fetches it once per instance, even when many 
instances share the same host. Granted, with the default policy this 
only affects performance for admins, but if I'm an admin listing 1000 
servers across all tenants with "nova list --all-tenants" (which uses a 
microversion high enough to hit this), it could be noticeably slower 
than before Stein. I'll open a bug.
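
The difference between the two approaches can be sketched in Python. 
This is not nova's code: `lookup` is a hypothetical stand-in for 
whatever service-group or database query resolves one host's status, 
and instances are assumed to be dicts with a "host" key.

```python
from typing import Callable, Dict, Iterable, List


def host_statuses_per_instance(
    instances: Iterable[dict], lookup: Callable[[str], str]
) -> List[str]:
    # One lookup per instance, as when the status is fetched inside the
    # per-server view builder loop.
    return [lookup(inst["host"]) for inst in instances]


def host_statuses_per_host(
    instances: Iterable[dict], lookup: Callable[[str], str]
) -> List[str]:
    # One lookup per distinct host; later instances on the same host
    # reuse the cached result.
    cache: Dict[str, str] = {}
    statuses = []
    for inst in instances:
        host = inst["host"]
        if host not in cache:
            cache[host] = lookup(host)
        statuses.append(cache[host])
    return statuses
```

Both return the same list, but with 1000 instances spread across 10 
hosts the first makes 1000 lookups and the second makes 10, which is 
the kind of gap an admin would notice with "nova list --all-tenants".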

[1] https://review.opendev.org/#/c/584590/
[2] https://github.com/openstack/nova/blob/c7e9e667426a6d88d396a59cb40d30763a3265f9/nova/compute/api.py#L4926
[3] https://github.com/openstack/nova/blob/c7e9e667426a6d88d396a59cb40d30763a3265f9/nova/api/openstack/compute/views/servers.py#L325

-- 

Thanks,

Matt


