Nova shows incorrect VM status when compute is down.

Ramesh Ramanathan B ramerama at tataelxsi.co.in
Mon Jun 21 01:05:36 UTC 2021


Hi Sean,

Thank you for the response. I understand the rationale you have described, but for us this is a problem: we are building a monitoring system, and with this behavior we have no way of knowing whether a service is down during a compute failure.

Do you have any suggestions on how this situation can be handled?

Thanks

Regards,
Ramesh

________________________________
From: Sean Mooney <smooney at redhat.com>
Sent: Thursday, June 17, 2021 10:48 PM
To: Ramesh Ramanathan B <ramerama at tataelxsi.co.in>; openstack-discuss at lists.openstack.org <openstack-discuss at lists.openstack.org>; Melanie Witt <melwitt at redhat.com>
Subject: Re: Nova shows incorrect VM status when compute is down.


On Thu, 2021-06-17 at 14:24 +0000, Ramesh Ramanathan B wrote:
> Dear All,
>
> One observation we have while using OpenStack Rocky is that when a
> compute node goes down, the status of the VMs (the servers running on
> the compute node that went down) still shows active. Is this the
> expected behavior? Is any configuration required to get the right status?
Yes, this is expected behavior.
When the compute agent's heartbeat is missed and we do not know the
status of the VMs, we continue to report them in the last state we knew
of. We discussed adding an UNKNOWN state to the API at one point. I'm not
sure if that has been added yet; Melanie, I think you reviewed or worked
on that?
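The behavior described above can be modeled roughly as follows. This is a simplified sketch, not Nova's actual code: it assumes a database-style liveness check in which a compute service counts as up while its last heartbeat is no older than service_down_time (60 seconds by default in nova.conf), and it shows that the status returned for the VM is the stored one whether or not that check passes. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Assumption: mirrors the default of service_down_time in nova.conf.
SERVICE_DOWN_TIME = timedelta(seconds=60)


def service_is_up(last_heartbeat: datetime, now: datetime) -> bool:
    # A service counts as up while its latest heartbeat is recent enough.
    return now - last_heartbeat <= SERVICE_DOWN_TIME


def reported_status(last_known_status: str, last_heartbeat: datetime,
                    now: datetime) -> Tuple[str, bool]:
    # The stored VM status is returned unchanged either way: a missed
    # heartbeat means the real state is unknown, not that the VM is down.
    return last_known_status, service_is_up(last_heartbeat, now)


t0 = datetime(2021, 6, 17, 14, 0, tzinfo=timezone.utc)
print(reported_status("ACTIVE", t0, t0 + timedelta(seconds=30)))  # ('ACTIVE', True)
print(reported_status("ACTIVE", t0, t0 + timedelta(minutes=5)))   # ('ACTIVE', False)
```

Five minutes after the last heartbeat the service is considered down, yet the VM is still reported as ACTIVE, which is exactly what the attached screenshot shows.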

There was concern about exposing this, as it exposes information about
the backend hosts. For example, if a cell DB connection goes down but
the VM is still active, it would be incorrect to report the VM state as
down, because the state is actually unknown, and in this case the VM is
still active.

In the case where the compute agent was stopped for maintenance, we also
do not want to set the VMs' state to down, since again, stopping the
agent does not prevent the VMs from working.

In either case, whether the cell connection is temporarily disrupted or
the compute agent is stopped, reporting the VMs as down in the API could
lead to data corruption if you evacuated a VM, or if a user deleted it
and tried to reuse its data volumes for a new VM.

So in general, it is incorrect to assume that the VM status in the DB
reflects the state of the VM on the host if the compute agent is down,
and it is not correct to update the status in the DB to down.
Marking it as UNKNOWN could be valid, but some operators objected to
that, as it leaks information about their data center (for example, that
they are currently doing an upgrade or maintenance and have stopped the
agent) to customers, which they see as a security issue.
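Since Nova deliberately leaves the stored status alone, a monitoring system can still flag servers whose status cannot currently be confirmed by combining two fields. The helper below is hypothetical and not part of Nova: it assumes `server` is a dict shaped like the server object in a GET /servers/{server_id} response requested with compute API microversion 2.16 or later, which adds a `host_status` field that is, by default, restricted by policy to administrators (in line with the information-leak concern above). Any value other than UP is treated as "cannot confirm"; the helper never reports a server as down.

```python
from typing import Any, Dict

# Hypothetical monitoring helper, not part of Nova. `server` is assumed to be
# the "server" object from a GET /servers/{server_id} response fetched with
# compute API microversion >= 2.16 by a caller allowed to see host_status.


def effective_status(server: Dict[str, Any]) -> str:
    """Annotate the stored VM status with whether it can be trusted."""
    status = server.get("status", "UNKNOWN")
    host_status = server.get("host_status")

    # Field absent (older microversion or hidden by policy), empty (no host),
    # or UP: there is no extra information, so report the status as stored.
    if host_status in (None, "", "UP"):
        return status

    # The compute service is not reporting normally, so the stored status is
    # only the last known state. Flag it instead of assuming the VM is down.
    return f"{status} (unverified, host_status={host_status})"


print(effective_status({"status": "ACTIVE", "host_status": "UP"}))       # ACTIVE
print(effective_status({"status": "ACTIVE", "host_status": "UNKNOWN"}))  # ACTIVE (unverified, host_status=UNKNOWN)
```

Keeping the stored status and adding a flag follows the reasoning above: a host that stops reporting may still be running its VMs, so the monitoring system marks the status as unverified rather than declaring the service down.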


>
> In the attached image the compute node is down, but the VM status
> still shows active. We are running a data center, so it is not
> practical to run nova reset-state for all the servers.
reset-state is not intended to be used for this.
In fact, reset-state should almost never be used.
You should treat every invocation of reset-state as running an arbitrary
SQL update query, and avoid it unless absolutely necessary.

> Is there an API to force Nova to update and show the correct status?
> Or is some missing configuration causing this?
>
> Thanks
>
> Regards,
> Ramesh
>



