[nova][dev][ops] server status when compute host is down
Matt Riedemann
mriedemos at gmail.com
Thu May 23 16:50:25 UTC 2019
On 5/23/2019 9:05 AM, Dan Smith wrote:
>> Question: do people think we should make the server status field
>> reflect UNKNOWN as well, if the 'host_status' is UNKNOWN? And if so,
>> should it be controlled by policy or no?
>
> Do we have other things that change *value* depending on policy? I was
> thinking that was one of the situations the policy people (i.e. Matt)
> have avoided in the past.
>
> Also, AFAIK, our documentation specifies (and existing behavior is) to
> only return UNKNOWN in the case where we return a partial instance
> because we couldn't look up the rest of the details from the cell. This
> would break that relationship, and I'm not sure how people would know
> that they shouldn't expect a full instance record, other than to poke it
> with a stick to see if it contains certain properties.
>
>> +1 to doing this with a policy. I would prefer giving the
>> ability/choice to the operators to opt-out of it if they want to.
>
> In general, I think we should try to avoid leaking things about the
> infrastructure to regular users. In the case of a cell being down, we
> couldn't really fake it because we don't have much of the information
> available to us. I agree that a host being down is not that different
> from a cell being down from the perspective of a user, but I also think
> that allowing operators to opt-in to such a disclosure would be better,
> although as above, I start to worry about the degrees of freedom in the
> response.
>
> My biggest concern, which came out during the host status discussion, is
> that we should *not* say the instance is "down" just because the compute
> service is unreachable. Saying it's in "unknown" state is better.
>
> I'd like to hear from some more operators about whether they would
> opt-in to this unknown-state behavior for compute host
> down-age. Specifically, whether they want customer instances to show as
> "unknown" state while they're doing an upgrade that otherwise wouldn't
> impact the instance's health.
>
> --Dan
>
I agree with Dan that I'd like some operator input on this thread before
we consider making a change in behavior.

Making the UNKNOWN status cover both a down cell and a down compute
service is also confusing, as Dan mentions above, because vm_state being
UNKNOWN is new as of Stein and is only used for the down cell case.

Setting the 'nova list --fields' issue aside, we already have a
workaround for this today, right? If I'm an operator and want to expose
this information to my users, I configure nova's policy to have:

    "os_compute_api:servers:show:host_status": "rule:admin_or_owner"

Then a user making requests with the proper microversion can see the
host status, if the cloud allows it.
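
To make the user side of that workaround concrete, here is a rough sketch of a "show server" request that opts in to a microversion and reads host_status out of the response. Treat the details as assumptions: the endpoint, server ID and token are placeholders, the helper names are made up for illustration, and 2.16 is my recollection of the microversion that introduced host_status, so check the API reference before relying on it.

```python
import urllib.request


def build_server_show_request(compute_endpoint, server_id, token,
                              microversion="2.16"):
    """Build (but do not send) a GET /servers/{id} request.

    Hypothetical helper. The microversion default is an assumption about
    where host_status first appears in the compute API.
    """
    url = "%s/servers/%s" % (compute_endpoint.rstrip("/"), server_id)
    headers = {
        "X-Auth-Token": token,
        "X-OpenStack-Nova-API-Version": microversion,
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def extract_host_status(response_body):
    """Return host_status from a decoded 'show server' response, or None.

    The field is simply absent when policy or the microversion does not
    allow it, so a missing key is not treated as an error.
    """
    return response_body.get("server", {}).get("host_status")
```

A user whose cloud does not allow the policy would just get None back from extract_host_status(), which is the point of gating the field by policy rather than changing the value of an existing field.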

As an aside, I now realize we have had a nasty performance regression
since Stein [1] in how the host_status field is handled when listing
servers with details. The code used to rely on this method [2] to cache
the host status information per host when iterating over a list of
instances, but now the view builder fetches it per host per instance
[3]. Granted, with the default policy this only affects performance for
an admin. But if I'm an admin listing 1000 servers across all tenants
using "nova list --all-tenants" (which is going to use a microversion
high enough to hit this), it could be a noticeable slowdown compared to
before Stein. I'll open a bug.
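
To illustrate the difference, here is a minimal sketch of per-instance lookups versus a per-host cache. This is not nova's code: Instance, FakeHostAPI and both functions are hypothetical stand-ins, and the lookup counter just represents whatever work is needed to determine a compute service's status.

```python
from collections import namedtuple

# Hypothetical stand-in for an instance record; only the fields we need.
Instance = namedtuple("Instance", ["uuid", "host"])


class FakeHostAPI:
    """Hypothetical host-status lookup that counts how often it is called."""

    def __init__(self, statuses):
        self._statuses = statuses
        self.lookups = 0

    def get_host_status(self, host):
        self.lookups += 1
        return self._statuses.get(host, "UNKNOWN")


def host_statuses_uncached(api, instances):
    """One lookup per instance, even when many instances share a host."""
    return {inst.uuid: api.get_host_status(inst.host) for inst in instances}


def host_statuses_cached(api, instances):
    """One lookup per distinct host, reused for every instance on it."""
    cache = {}
    result = {}
    for inst in instances:
        if inst.host not in cache:
            cache[inst.host] = api.get_host_status(inst.host)
        result[inst.uuid] = cache[inst.host]
    return result


if __name__ == "__main__":
    instances = [Instance("uuid-%d" % i, "host-%d" % (i % 10))
                 for i in range(1000)]
    for func in (host_statuses_uncached, host_statuses_cached):
        api = FakeHostAPI({})
        func(api, instances)
        print(func.__name__, api.lookups)
```

With 1000 instances spread over 10 hosts, the uncached version does 1000 lookups and the cached version does 10, which is roughly the shape of the regression described above.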

[1] https://review.opendev.org/#/c/584590/
[2] https://github.com/openstack/nova/blob/c7e9e667426a6d88d396a59cb40d30763a3265f9/nova/compute/api.py#L4926
[3] https://github.com/openstack/nova/blob/c7e9e667426a6d88d396a59cb40d30763a3265f9/nova/api/openstack/compute/views/servers.py#L325

--
Thanks,
Matt