On 5/23/19 1:08 PM, melanie witt wrote:
On Thu, 23 May 2019 11:56:34 -0700, Iain Macdonnell <iain.macdonnell@oracle.com> wrote:
On 5/23/19 11:32 AM, Matt Riedemann wrote:
As I said elsewhere in this thread, if you're proposing to add a new policy rule to change the 'status' field based on host_status, why not just tell people to open up the policy rule we already have for the host_status field so non-admins can see it in their server details? This sounds like an education problem more than a technical problem to me.
Because *that* implies revealing infrastructure status details to end-users, which is probably not desirable in a lot of cases.
This is a good point. If an operator were to enable 'host_status' via policy, end users would also get to see host_status UP and DOWN, which is typically not desired by cloud admins. There's currently no option for exposing only UNKNOWN, as a small but helpful bit of info for end users.
Isn't this as simple as not lying to the user about the *server* status when it cannot be ascertained for any reason? In that case, the user should be given (only) that information, but not any "dirty laundry" about what caused it...
Even if the admin doesn't care about revealing infrastructure status, the end-user shouldn't have to know that server_status can't be trusted, and that they have to check other fields to figure out if it's reliable or not at any given time.
And yes, I was thinking about it more simply, and the replies on this thread have led me to think that if we could show a cosmetic-only status of UNKNOWN for nova-compute communication interruptions, similar to what we do for down cells, we would not put a policy control on it (since UNKNOWN is not leaking infra details). We would also not make any changes to notifications etc.; it would be just a cosmetic-only UNKNOWN status implemented at the REST API layer if host_status is UNKNOWN. I was thinking maybe we'd leave server status alone if host_status is UP or DOWN, since the server's status should be shown as-is in those cases.
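To make the idea concrete, here is a rough sketch of the display-only override, not actual nova code. The function name and the plain-dict server representation are assumptions made up for illustration; only the 'status' and 'host_status' field names and the UP/DOWN/UNKNOWN values come from this thread.

```python
from typing import Any, Dict


def displayed_status(server: Dict[str, Any]) -> str:
    """Return the status to show in the API response (hypothetical helper).

    The stored server status is never modified; only the value shown to
    the user changes, and only when host_status is UNKNOWN.
    """
    if server.get("host_status") == "UNKNOWN":
        # Communication with nova-compute is interrupted, so the stored
        # status cannot be trusted. Nothing about the cause is revealed.
        return "UNKNOWN"
    # host_status is UP or DOWN (or absent): show the server status as-is.
    return server["status"]
```

With this, a server stored as ACTIVE on a host whose host_status is UNKNOWN would be shown as UNKNOWN, while the same server on a host that is UP or DOWN would still be shown as ACTIVE.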
Assuming we could move forward without a policy control on it, I think the only remaining concern would be the collision between this UNKNOWN status and the one already used for down cells, where some server attributes are not available. Personally, this doesn't seem like a major problem to me since UNKNOWN implies an uncertain state, in general. But maybe I'm wrong. How important is the difference?
Finally, it sounds like the consensus is that if we do decide to make this change, we would need a new microversion to account for server status being able to be UNKNOWN if host_status is UNKNOWN.
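As a rough illustration of what gating on a microversion could look like (again, not actual nova code): the helper names and the plain-dict server are made up, and the version number below is a hypothetical placeholder since no microversion has been assigned for this.

```python
from typing import Any, Dict, Tuple

# Hypothetical placeholder: the real microversion is not decided yet.
UNKNOWN_STATUS_MIN_VERSION: Tuple[int, int] = (2, 99)


def parse_microversion(value: str) -> Tuple[int, int]:
    """Parse "major.minor" into a tuple so 2.9 sorts before 2.10."""
    major, minor = value.split(".")
    return int(major), int(minor)


def status_for_request(server: Dict[str, Any], requested: str) -> str:
    """Show UNKNOWN only to clients that asked for the new microversion."""
    new_behavior = parse_microversion(requested) >= UNKNOWN_STATUS_MIN_VERSION
    if new_behavior and server.get("host_status") == "UNKNOWN":
        return "UNKNOWN"
    # Older microversions keep seeing the stored status unchanged.
    return server["status"]
```

Comparing versions as integer tuples rather than floats or strings avoids treating 2.9 as newer than 2.10, and clients on older microversions see no change in behavior.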
FYI, I've proposed a spec here: https://review.opendev.org/666181

-melanie