Thanks for the explanation, Ignazio.

I tried the same thing by forcing a failure on the compute node (echo 'c' > /proc/sysrq-trigger). The compute node hung and I was not able to connect to it. All the VMs are now in Error state.

Running host-evacuate on the controller node was successful, but I still cannot use the VMs because they are all in Error state.

root@h004:~$ nova host-evacuate h017
+--------------------------------------+-------------------+---------------+
| Server UUID                          | Evacuate Accepted | Error Message |
+--------------------------------------+-------------------+---------------+
| f3545f7d-b85e-49ee-b407-333a4c5b5ab9 | True              |               |
| 9094494b-cfa3-459b-8d51-d9aae0ea9636 | True              |               |
| abe7075b-ac22-4168-bf3d-d302ba37d80e | True              |               |
| c9919371-5f2e-4155-a01a-5f41d9c8b0e7 | True              |               |
| ffd983bb-851e-4314-9d1d-375303c278f3 | True              |               |
+--------------------------------------+-------------------+---------------+

I have now restarted the compute node manually and can connect to it again, but the VMs are still in Error state.

1. Any ideas on how to recover the VMs?
2. Are there any other methods to evacuate? This method does not seem to work in the Mitaka version.

~Jay.

On Thu, Jul 11, 2019 at 1:33 PM Ignazio Cassano <ignaziocassano@gmail.com> wrote:

Ok Jay,
let me describe my environment.

I have an OpenStack deployment made up of 3 controller nodes and several compute nodes. The controller node services are controlled by Pacemaker and the compute node services are controlled by Pacemaker Remote. My hardware is Dell, so I am using an IPMI fencing device.

I wrote a service controlled by Pacemaker. It checks whether a compute node has failed. To avoid split brain, if a compute node does not respond on both the management network and the storage network, STONITH powers off the node and then nova host-evacuate is executed.

In any case, to run a simulation before writing the service I described above, you can do as follows:

1. Connect to a compute node where some virtual machines are running.
2. Run the command: echo 'c' > /proc/sysrq-trigger (it stops the node immediately, as in a real failure).
3. On a controller node, run: nova host-evacuate "name of failed compute node"

Instances running on the failed compute node should be restarted on another compute node.

Ignazio

On Thu, Jul 11, 2019 at 11:57 AM Jay See <jayachander.it@gmail.com> wrote:

Hi,

I have tried on a failed compute node, which is now powered off. I have also tried on a running compute node: no errors, but nothing happens. On the running compute node I disabled the compute service and tried migration as well.

Maybe I have not followed the proper steps. I just wanted to know the steps you followed.
Otherwise, I was also planning to migrate manually, if possible.

~Jay.

On Thu, Jul 11, 2019 at 11:52 AM Ignazio Cassano <ignaziocassano@gmail.com> wrote:

Hi Jay,
would you like to evacuate a failed compute node or a running compute node?

Ignazio

On Thu, Jul 11, 2019 at 11:48 AM Jay See <jayachander.it@gmail.com> wrote:

Hi Ignazio,

I am trying to evacuate a compute host on an older version (Mitaka). Could you please share the process you followed? I have not been able to succeed: openstack live-migration fails with an error message (this is a known issue in older versions), and with nova live-migration nothing happens even after initiating the VM migration. It has been almost 4 days.

~Jay.

On Thu, Jul 11, 2019 at 11:31 AM Ignazio Cassano <ignaziocassano@gmail.com> wrote:

I am sorry. I used the wrong procedure to simulate a host crash. Using "echo 'c' > /proc/sysrq-trigger", everything works fine.

On Thu, Jul 11, 2019 at 11:01 AM Ignazio Cassano <ignaziocassano@gmail.com> wrote:

Hello All,
on Ocata, when I power off a node with active instances, nova host-evacuate works fine and the instances are restarted on an active node. On Queens it does not evacuate the instances, and nova-api reports the following for each instance:

2019-07-11 10:19:54.745 13811 INFO nova.api.openstack.wsgi [req-daad0a7d-87ce-41bf-b096-a70fc306db5c 0c7a2d6006614fe2b3e81e47377dd2a9 c26f8d35f85547c4add392a221af1aab - default default] HTTP exception thrown: Cannot 'evacuate' instance e8485a5e-3623-4184-bcce-cafd56fa60b3 while it is in task_state powering-off

So it powers off all instances on the failed node but does not start them on active nodes.

What has changed?

Ignazio

--
P SAVE PAPER – Please do not print this e-mail unless absolutely necessary.
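
For anyone following the thread, here is a minimal Python sketch of the decision logic Ignazio describes for his Pacemaker-controlled service: fence only when a compute node is silent on both the management and the storage network, then evacuate. It is not his actual service. The names should_fence, handle_compute_node, power_off and evacuate are hypothetical, and the reachability checks and the IPMI power-off are left to the caller. The only real command used is nova host-evacuate, as quoted in the thread.

```python
import subprocess
from typing import Callable, List


def should_fence(mgmt_reachable: bool, storage_reachable: bool) -> bool:
    """Fence only when the node is unreachable on BOTH networks.

    If it still answers on either network it may be alive, and
    evacuating it could start the same instance twice (split brain).
    """
    return not mgmt_reachable and not storage_reachable


def handle_compute_node(
    host: str,
    mgmt_reachable: bool,
    storage_reachable: bool,
    power_off: Callable[[str], None],  # hypothetical hook, e.g. IPMI fencing
    evacuate: Callable[[str], None],   # hypothetical hook, see below
) -> List[str]:
    """Power off, then evacuate, and return the actions taken in order."""
    actions: List[str] = []
    if should_fence(mgmt_reachable, storage_reachable):
        # The node must be powered off before its instances are rebuilt
        # elsewhere, so the order of these two calls matters.
        power_off(host)
        actions.append("power_off")
        evacuate(host)
        actions.append("evacuate")
    return actions


def nova_host_evacuate(host: str) -> None:
    """Run the command from the thread; assumes the nova CLI and
    credentials are available in the environment."""
    subprocess.run(["nova", "host-evacuate", host], check=True)
```

In a real setup, evacuate would be nova_host_evacuate and power_off would call the fencing device. Passing both in as callables keeps the decision itself easy to test without touching a live cloud.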