[openstack-dev] [nova] should we have a stale data indication in "nova list/show"?
Ahmed RAHAL
arahal at iweb.com
Tue Jun 24 23:16:31 UTC 2014
On 2014-06-24 17:38, Joe Gordon wrote:
>
> On Jun 24, 2014 2:31 PM, "Russell Bryant" <rbryant at redhat.com> wrote:
> > There be dragons here. Just because Nova doesn't see the node reporting
> > in, doesn't mean the VMs aren't actually still running. I think this
> > needs to be left to logic outside of Nova.
> >
> > For example, if your deployment monitoring really does think the host is
> > down, you want to make sure it's *completely* dead before taking further
> > action such as evacuating the host. You certainly don't want to risk
> > having the VM running on two different hosts. This is just a business I
> > don't think Nova should be getting in to.
>
> I agree nova shouldn't take any actions. But I don't think leaving an
> instance as 'active' is right either. I was thinking move instance to
> error state (maybe an unknown state would be more accurate) and let the
> user deal with it, versus just letting the user deal with everything.
> Since nova knows something *may* be wrong shouldn't we convey that to
> the user (I'm not 100% sure we should myself).
I have seen compute nodes go down from a management perspective (say,
nova-compute disappeared) while the VMs were just fine. Reporting on the
state may therefore be misleading. The 'unknown' state would fit, but
nothing lets us presume the VMs are non-functional or impacted.
As far as an operator is concerned, a compute node not responding is
reason enough to check the situation.
To go further on the other comments about customer feedback: there
are many reasons a customer may think his VM is down, so showing him
'useful information' will in some cases only trigger more anxiety.
Besides, people will start hammering the API to check 'state' instead of
using proper monitoring.
But state is already reported if the customer shuts down a VM, so ...
Currently, compute node state reporting is done by the nova-compute
process itself, reporting back with a time stamp to the database
(through conductor, if I recall correctly). It's more like a watchdog
than a reporting system.
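To make the watchdog idea concrete, here is a minimal sketch of a staleness check on such a time stamp. It is not Nova's actual code: the function name, the 60-second threshold and the way the time stamp is passed in are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: how long a service may stay silent before it is
# considered down. The value is an assumption, not taken from Nova.
DOWN_AFTER = timedelta(seconds=60)


def is_service_up(last_heartbeat, now=None, down_after=DOWN_AFTER):
    """Return True if the last reported time stamp is recent enough.

    This only says whether the reporting process checked in recently.
    It says nothing about whether the VMs on that host are running.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - last_heartbeat) <= down_after
```

Note that a False result here only means nova-compute stopped reporting, which is exactly why it cannot be read as "the VMs are down".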
For VMs (assuming we find it useful) the same kind of process could
occur: nova-compute reporting back all states with time stamps for all
VMs it hosts. This should then be optional, as I already sense
scaling/performance issues here (ceilometer, anyone?).
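As a rough sketch of what such per-VM reporting could look like on the reading side: each VM carries its last reported state and a time stamp, and the state shown falls back to 'unknown' once the report is too old. Everything here (the VmReport record, displayed_state, the 60-second default) is hypothetical and only illustrates the idea, not an existing Nova interface.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class VmReport:
    """Hypothetical record written by nova-compute for each VM it hosts."""
    vm_id: str
    state: str             # last state reported, e.g. "active"
    reported_at: datetime  # when nova-compute last reported it


def displayed_state(report, now, stale_after=timedelta(seconds=60)):
    """Return the last reported state, or 'unknown' if the report is stale.

    'unknown' means only that no fresh report exists; the VM itself may
    still be running fine.
    """
    if now - report.reported_at > stale_after:
        return "unknown"
    return report.state
```

One write per VM per reporting interval is where the scaling concern comes from, which is one reason to keep this optional.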
Finally, assuming the customer had access to this 'unknown' state
information, what would he be able to do with it? Usually he has no
lever to 'evacuate' or 'recover' the VM. All he could do is spawn
another instance to replace the lost one. But only if the VM really is
currently unavailable, which is information he must get from other sources.
So, I see how the state reporting could be useful information, but I am
not sure that nova Status is the right place for it.
Ahmed.