[nova][dev][ops] server status when compute host is down

iain.macdonnell at oracle.com
Thu May 23 17:26:18 UTC 2019



On 5/23/19 3:11 AM, Matthew Booth wrote:
> On Thu, 23 May 2019 at 03:02, melanie witt <melwittt at gmail.com> wrote:
>>
>> Hey all,
>>
>> I'm looking for feedback around whether we can improve how we show
>> server status in server list and server show when the compute host it
>> resides on is down.
>>
>> When a compute host goes down while a server on it was previously
>> running, the server status continues to show as ACTIVE in a server list.
>> This is because the power state and status are adjusted by a periodic
>> task run by nova-compute, so if nova-compute is down, it cannot update
>> those states.
>>
>> So, for an end user, when they do a server list, they see their server
>> as ACTIVE when it's actually powered off.
>>
>> We have another field called 'host_status' available since API
>> microversion 2.16 [1] which is controlled by policy and defaults to
>> admin, which is capable of showing the server status as UNKNOWN if the
>> field is specified, for example:
>>
>> nova list --fields id,name,status,task_state,power_state,networks,host_status
>>
>> This is cool, but it is only available to admin by default, and it
>> requires that the end user adds the field to their CLI command in the
>> --fields option.
>>
>> Question: do people think we should make the server status field reflect
>> UNKNOWN as well, if the 'host_status' is UNKNOWN? And if so, should it
>> be controlled by policy or no?
>>
>> Normally, we do not expose compute host details to non-admin in the API
>> by default, but I noticed recently that our "down cells" support will
>> show server status as UNKNOWN if a server is in a down cell [2]. So I
>> wondered if it would be considered OK to show UNKNOWN if a host is down
>> as well, without defaulting it to admin-only.
> 
> +1 from me. This seems to have confused users in the past and honest
> is better than potentially wrong, imho. I can't think of a reason why
> this information 'leak' would cause any problems. Can anybody else?

Agreed. I don't think that a server status of "UNKNOWN" really 
constitutes "exposing compute host details". It's not sharing anything 
about *why* the server status is unknown - it's just not pretending that 
the last known status is still valid, when that may or may not actually 
be true. Or is the proposal to expose host_status where it would not 
normally be visible?

It seems that the down-host scenario is basically the same as 
down-cell, as far as being able to ascertain server status, so it 
makes sense to use the same indicator.
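
To make the idea concrete, here is a rough sketch of the behaviour being
discussed: if host_status is UNKNOWN, report the server status as UNKNOWN
instead of the last stored value. The function name and the way the two
values are passed in are hypothetical, for illustration only; this is not
nova's actual code, and it ignores the policy question entirely.

```python
from typing import Optional


def displayed_status(stored_status: str, host_status: Optional[str]) -> str:
    """Return the server status to show in server list / server show.

    Hypothetical helper, not part of nova. `stored_status` is the last
    status recorded for the server (e.g. "ACTIVE"); `host_status` is the
    value of the 'host_status' field, or None if it is not available.
    """
    # If the compute host's state cannot be determined, the stored status
    # may be stale, so do not pretend it is still valid.
    if host_status == "UNKNOWN":
        return "UNKNOWN"
    return stored_status


if __name__ == "__main__":
    # Host is unreachable: the last known ACTIVE is not trusted.
    print(displayed_status("ACTIVE", "UNKNOWN"))
    # Host is reporting normally: the stored status is shown unchanged.
    print(displayed_status("ACTIVE", "UP"))
```

Note that this only changes what is displayed; it says nothing about *why*
the status is unknown, which is the point made above.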

     ~iain




