[nova][dev][ops] server status when compute host is down
melwittt at gmail.com
Thu May 23 20:08:38 UTC 2019
On Thu, 23 May 2019 11:56:34 -0700, Iain Macdonnell
<iain.macdonnell at oracle.com> wrote:
> On 5/23/19 11:32 AM, Matt Riedemann wrote:
>> As I said elsewhere in this thread, if you're proposing to add a new
>> policy rule to change the 'status' field based on host_status, why not
>> just tell people to open up the policy rule we already have for the
>> host_status field so non-admins can see it in their server details? This
>> sounds like an education problem more than a technical problem to me.
> Because *that* implies revealing infrastructure status details to
> end-users, which is probably not desirable in a lot of cases.
This is a good point. If an operator were to enable 'host_status' via
policy, end users would also get to see host_status UP and DOWN, which
is typically not desired by cloud admins. There's currently no option
to expose only UNKNOWN, which would be a small but helpful bit of info
for end users.
> Isn't this as simple as not lying to the user about the *server* status
> when it cannot be ascertained for any reason? In that case, the user
> should be given (only) that information, but not any "dirty laundry"
> about what caused it....
> Even if the admin doesn't care about revealing infrastructure status,
> the end-user shouldn't have to know that server_status can't be trusted,
> and that they have to check other fields to figure out if it's reliable
> or not at any given time.
And yes, I was thinking about it more simply, and the replies on this
thread have led me to think that if we could show the cosmetic-only
status of UNKNOWN for nova-compute communication interruptions, similar
to what we do for down cells, we would not put a policy control on it
(since UNKNOWN is not leaking infra details). We would also not make any
changes to notifications etc.; it would be just a cosmetic-only UNKNOWN
status applied at the REST API layer when host_status is UNKNOWN. I was
thinking maybe we'd leave server status alone if host_status is UP or
DOWN, since the server status should be shown as-is in those cases.
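
To make the idea concrete, here is a minimal sketch of the REST-layer
mapping described above. It is not nova code: the function name
displayed_status and its arguments are hypothetical, and it assumes the
API layer already has both the stored server status and the computed
host_status in hand.

```python
from typing import Optional


def displayed_status(server_status: str, host_status: Optional[str]) -> str:
    """Return the status string to show in the API response.

    Hypothetical helper for illustration only. The stored server status
    is never modified; only the value shown to the user changes.
    """
    # If nova-compute cannot be reached, the stored status cannot be
    # trusted, so show UNKNOWN instead.
    if host_status == "UNKNOWN":
        return "UNKNOWN"
    # For UP, DOWN, or anything else, leave the server status as-is.
    return server_status
```

Because the override happens only when the response is built, nothing in
the database or in notifications would change under this sketch.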
Assuming we could move forward without a policy control on it, I think
the only remaining concern would be the collision of UNKNOWN status with
the down-cell case, where some server attributes are not available.
Personally, this doesn't seem like a major problem to me
since UNKNOWN implies an uncertain state, in general. But maybe I'm
wrong. How important is the difference?
Finally, it sounds like the consensus is that if we do decide to make
this change, we would need a new microversion, since server status
could then be UNKNOWN whenever host_status is UNKNOWN.
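
As a rough illustration of what gating on a microversion could look
like, the sketch below only applies the UNKNOWN override when the
request asks for a new enough version. Everything here is an assumption
for illustration: the thread does not name a microversion number, so
UNKNOWN_STATUS_MIN_VERSION is a placeholder, and the helper names are
hypothetical rather than nova's own.

```python
from typing import Optional, Tuple

# Placeholder value; the thread does not say which microversion this
# change would use.
UNKNOWN_STATUS_MIN_VERSION: Tuple[int, int] = (2, 999)


def parse_microversion(value: str) -> Tuple[int, int]:
    """Turn a 'major.minor' string such as '2.72' into (2, 72)."""
    major, minor = value.split(".")
    return int(major), int(minor)


def status_for_request(
    server_status: str,
    host_status: Optional[str],
    requested_version: str,
) -> str:
    """Return the server status to show for a given request version."""
    # Older clients keep seeing the stored status unchanged.
    if parse_microversion(requested_version) < UNKNOWN_STATUS_MIN_VERSION:
        return server_status
    # Newer clients see UNKNOWN when the compute host cannot be reached.
    if host_status == "UNKNOWN":
        return "UNKNOWN"
    return server_status
```

Comparing versions as integer tuples avoids the string-comparison trap
where "2.9" would sort after "2.10".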