Question: do people think the server 'status' field should also report UNKNOWN when 'host_status' is UNKNOWN? And if so, should that be controlled by policy or not?
Do we have other things that change *value* depending on policy? I thought that was one of the situations the policy people (i.e. Matt) have avoided in the past. Also, AFAIK, our documentation specifies (and the existing behavior is) that we return UNKNOWN only when we return a partial instance because we couldn't look up the rest of its details from the cell. This would break that relationship, and I'm not sure how people would know not to expect a full instance record, other than by poking it with a stick to see whether it contains certain properties.
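To make the ambiguity concrete, here's a rough Python sketch of the two cases side by side. Everything in it is hypothetical: the function names, the `unknown_on_host_down` flag (standing in for a policy check), and the `addresses` probe key are placeholders, not nova code or the real API schema, and the records are heavily simplified.

```python
from typing import Optional

# Hypothetical sketch only: names, the flag, and the "addresses" probe key
# are illustrative placeholders, not nova code or the real API schema.


def build_server_view(instance: Optional[dict], host_status: str,
                      unknown_on_host_down: bool) -> dict:
    """Render a simplified server record.

    `instance` is None when the cell could not be reached.
    """
    if instance is None:
        # Today: the cell is down, so only a partial record can be returned,
        # and that is the one case where status is UNKNOWN.
        return {"status": "UNKNOWN"}
    view = dict(instance)  # copy so the caller's record is not mutated
    if unknown_on_host_down and host_status == "UNKNOWN":
        # Proposal: the record is complete, but status is UNKNOWN too.
        view["status"] = "UNKNOWN"
    return view


def looks_partial(view: dict) -> bool:
    # Once both cases report UNKNOWN, a client cannot rely on status alone
    # and has to probe for a field it expects in a full record.
    return "addresses" not in view


full = {"name": "vm1", "status": "ACTIVE", "addresses": {}}
down_cell = build_server_view(None, "UNKNOWN", unknown_on_host_down=True)
host_down = build_server_view(full, "UNKNOWN", unknown_on_host_down=True)

# Same status, different shape of record.
print(down_cell["status"], looks_partial(down_cell))  # UNKNOWN True
print(host_down["status"], looks_partial(host_down))  # UNKNOWN False
```

Both responses carry the same status, so the only way for a client to tell them apart is the key probe in `looks_partial`, which is exactly the stick-poking described above.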
+1 to doing this with a policy. I'd prefer to give operators the choice to opt out of it if they want to.
In general, I think we should try to avoid leaking details about the infrastructure to regular users. In the case of a cell being down, we couldn't really fake it because we don't have much of the information available to us. I agree that, from a user's perspective, a host being down is not that different from a cell being down, but I also think that letting operators opt in to such a disclosure would be better, although, as above, I start to worry about the degrees of freedom in the response. My biggest concern, which came out during the host status discussion, is that we should *not* say the instance is "down" just because the compute service is unreachable. Saying it's in an "unknown" state is better. I'd like to hear from more operators about whether they would opt in to this unknown-state behavior for compute host down-age. Specifically, whether they want customer instances to show an "unknown" state while they're doing an upgrade that otherwise wouldn't impact the instances' health. --Dan