[openstack-dev] [nova] should we have a stale data indication in "nova list/show"?

Ahmed RAHAL arahal at iweb.com
Tue Jun 24 23:16:31 UTC 2014


On 2014-06-24 17:38, Joe Gordon wrote:
>
> On Jun 24, 2014 2:31 PM, "Russell Bryant" <rbryant at redhat.com> wrote:

>  > There be dragons here.  Just because Nova doesn't see the node reporting
>  > in, doesn't mean the VMs aren't actually still running.  I think this
>  > needs to be left to logic outside of Nova.
>  >
>  > For example, if your deployment monitoring really does think the host is
>  > down, you want to make sure it's *completely* dead before taking further
>  > action such as evacuating the host.  You certainly don't want to risk
>  > having the VM running on two different hosts.  This is just a business I
>  > don't think Nova should be getting in to.
>
> I agree nova shouldn't take any actions. But I don't think leaving an
> instance as 'active' is right either.  I was thinking move instance to
> error state (maybe an unknown state would be more accurate) and let the
> user deal with it, versus just letting the user deal with everything.
> Since nova knows something *may* be wrong shouldn't we convey that to
> the user (I'm not 100% sure we should myself).

I have seen compute nodes go down from a management perspective (say,
nova-compute disappeared) while the VMs were just fine. Reporting on the
state may therefore be misleading. The 'unknown' state would fit, but
nothing lets us presume the VMs are non-functional or impacted.

As far as an operator is concerned, a compute node not responding is
reason enough to check the situation.

To go further on the other comments related to customer feedback: there
are many reasons a customer may think his VM is down, so showing him
'useful information' will in some cases only trigger more anxiety.
Besides, people will start hammering the API to check 'state' instead of
using proper monitoring.
But state is already reported if the customer shuts down a VM, so ...

Currently, compute node state reporting is done by the nova-compute
process itself, reporting back with a timestamp to the database
(through conductor, if I recall correctly). It's more like a watchdog
than a reporting system.
For VMs (assuming we find it useful) the same kind of process could
occur: nova-compute reporting back the states of all the VMs it hosts,
with timestamps. This should then be optional, as I already sense
scaling/performance issues here (ceilometer, anyone?).
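
To make the watchdog idea concrete, here is a minimal sketch in Python.
It is not nova's actual code: the function names, the "(stale)" suffix
and the 60-second threshold are all assumptions made up for
illustration. It only shows how a last-report timestamp could be turned
into a "this data is old" hint without claiming the VM itself is down.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: how long without a report before the data is
# considered stale. Not a real nova setting.
STALE_AFTER = timedelta(seconds=60)


def is_stale(last_report, now, stale_after=STALE_AFTER):
    """Return True if the last report is older than the allowed window."""
    return (now - last_report) > stale_after


def displayed_state(recorded_state, last_report, now):
    """Keep the recorded state, but flag it when the report is too old.

    Staleness only says the data is old; the VM may still be running.
    """
    if is_stale(last_report, now):
        return f"{recorded_state} (stale)"
    return recorded_state


if __name__ == "__main__":
    now = datetime(2014, 6, 24, 23, 16, 31, tzinfo=timezone.utc)
    print(displayed_state("ACTIVE", now - timedelta(seconds=10), now))
    print(displayed_state("ACTIVE", now - timedelta(minutes=5), now))
```

Note that the recorded state is never overwritten here; the hint is
added on top of it, which avoids presuming the VM is impacted.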

Finally, assuming the customer had access to this 'unknown' state
information, what would he be able to do with it? Usually he has no
lever to 'evacuate' or 'recover' the VM. All he could do is spawn
another instance to replace the lost one, and only if the VM really is
currently unavailable, which is information he must get from other
sources.

So, I see how state reporting could be useful information, but I am
not sure that nova Status is the right place for it.

Ahmed.