[openstack-dev] [nova] should we have a stale data indication in "nova list/show"?

Chris Behrens cbehrens at codestud.com
Wed Jun 25 00:50:05 UTC 2014


I don't think we should be flipping states for instances on a potentially downed compute. We definitely should not set an instance to ERROR. I think a time associated with the last power state check might be nice and be good enough.
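
A minimal sketch of what "a time associated with the last power state check" could look like when rendering a "nova list/show" row. This is not Nova code: the `InstanceView` class, the `power_state_checked_at` field, the `is_stale` and `render` helpers, and the 10-minute threshold are all hypothetical names and values chosen for illustration. The point it tries to show is that the status is left untouched and staleness is only annotated.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class InstanceView:
    """Hypothetical view of an instance as shown by "nova list/show"."""
    name: str
    status: str  # e.g. "ACTIVE"; this sketch never rewrites it
    power_state_checked_at: Optional[datetime]  # hypothetical field


def is_stale(view: InstanceView, now: datetime,
             max_age: timedelta = timedelta(minutes=10)) -> bool:
    """Return True when the last power state check is missing or too old.

    The 10-minute default is an arbitrary assumption, not a Nova setting.
    """
    if view.power_state_checked_at is None:
        return True
    return now - view.power_state_checked_at > max_age


def render(view: InstanceView, now: datetime) -> str:
    """Format one row: the status is unchanged, staleness is only annotated."""
    checked = (view.power_state_checked_at.isoformat()
               if view.power_state_checked_at else "never")
    flag = " (stale)" if is_stale(view, now) else ""
    return f"{view.name}  {view.status}  last_check={checked}{flag}"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    rows = [
        InstanceView("vm1", "ACTIVE", now - timedelta(minutes=1)),
        InstanceView("vm2", "ACTIVE", now - timedelta(hours=1)),
    ]
    for row in rows:
        print(render(row, now))
```

With something along these lines, an instance on a silent compute node would still show ACTIVE, and the user or operator would decide what an old timestamp means.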

- Chris

> On Jun 24, 2014, at 5:17 PM, Joe Gordon <joe.gordon0 at gmail.com> wrote:
> 
> 
> 
> 
>> On Tue, Jun 24, 2014 at 5:12 PM, Joe Gordon <joe.gordon0 at gmail.com> wrote:
>> 
>> 
>> 
>>> On Tue, Jun 24, 2014 at 4:16 PM, Ahmed RAHAL <arahal at iweb.com> wrote:
>>> On 2014-06-24 17:38, Joe Gordon wrote:
>>>> 
>>>> On Jun 24, 2014 2:31 PM, "Russell Bryant" <rbryant at redhat.com> wrote:
>>> 
>>>>  > There be dragons here.  Just because Nova doesn't see the node reporting
>>>>  > in, doesn't mean the VMs aren't actually still running.  I think this
>>>>  > needs to be left to logic outside of Nova.
>>>>  >
>>>>  > For example, if your deployment monitoring really does think the host is
>>>>  > down, you want to make sure it's *completely* dead before taking further
>>>>  > action such as evacuating the host.  You certainly don't want to risk
>>>>  > having the VM running on two different hosts.  This is just a business I
>>>>  > don't think Nova should be getting into.
>>>> 
>>>> I agree nova shouldn't take any actions. But I don't think leaving an
>>>> instance as 'active' is right either.  I was thinking of moving the instance
>>>> to an error state (maybe an unknown state would be more accurate) and letting
>>>> the user deal with it, versus just letting the user deal with everything.
>>>> Since nova knows something *may* be wrong, shouldn't we convey that to
>>>> the user? (I'm not 100% sure we should myself.)
>>> 
>>> I have seen compute nodes go down from a management perspective (say, nova-compute disappeared) while the VMs were just fine. Reporting on the state may be misleading. The 'unknown' state would fit, but nothing lets us presume the VMs are non-functional or impacted.
>> 
>> Nothing lets us presume the opposite either. We don't know if the instance is still up.
>>  
>>> 
>>> As far as an operator is concerned, a compute node not responding is reason enough to check the situation.
>>> 
>>> To go further on other comments related to customer feedback: there are many reasons a customer may think his VM is down, so showing him 'useful information' will in some cases only trigger more anxiety.
>>> Besides, people will start hammering the API to check 'state' instead of using proper monitoring.
>>> But state is already reported if the customer shuts down a VM, so ...
>>> 
>>> Currently, compute node state reporting is done by the nova-compute process itself, reporting back with a time stamp to the database (through conductor, if I recall correctly). It's more like a watchdog than a reporting system.
>>> For VMs (assuming we find it useful) the same kind of process could occur: nova-compute reporting back all states with time stamps for all VMs it hosts. This should then be optional, as I already sense scaling/performance issues here (ceilometer, anyone?).
>>> 
>>> Finally, assuming the customer had access to this 'unknown' state information, what would he be able to do with it? Usually he has no lever to 'evacuate' or 'recover' the VM. All he could do is spawn another instance to replace the lost one, but only if the VM really is currently unavailable, which is information he must get from other sources.
>> 
>> If I were a user and my instance went to an 'UNKNOWN' state, I would check if it's still operating, and if not, delete it and start another instance.
> 
> The alternative is how things work today: if a nova-compute goes down we don't change any instance states, and the user is responsible for making sure their instance is still operating even if the instance is set to ACTIVE.
>  
>>  
>>> 
>>> So, I see how the state reporting could be useful information, but I am not sure that nova Status is the right place for it.
>>> 
>>> Ahmed.
>>> 
>>> 
>>> _______________________________________________
>>> OpenStack-dev mailing list
>>> OpenStack-dev at lists.openstack.org
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev