[queens][nova] nova host-evacuate error
Hello all,
on Ocata, when I power off a node with active instances, running nova host-evacuate works fine and the instances are restarted on an active node.
On Queens it does not evacuate the instances; instead, nova-api reports the following for each instance:

2019-07-11 10:19:54.745 13811 INFO nova.api.openstack.wsgi [req-daad0a7d-87ce-41bf-b096-a70fc306db5c 0c7a2d6006614fe2b3e81e47377dd2a9 c26f8d35f85547c4add392a221af1aab - default default] HTTP exception thrown: Cannot 'evacuate' instance e8485a5e-3623-4184-bcce-cafd56fa60b3 while it is in task_state powering-off

So it powers off all the instances on the failed node but does not restart them on the active nodes.
What has changed?
Ignazio
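The rejection above comes from an API-side state guard: an evacuate request is refused whenever the instance has a task in flight, and a graceful shutdown of the host first puts every instance into task_state powering-off. A minimal illustrative model of that check (a sketch of the semantics, not Nova's actual code; the names are simplified):

```python
# Simplified model of nova-api's task_state guard on "evacuate".
# Illustrative only -- not the real Nova implementation.
class InstanceInvalidState(Exception):
    pass

def check_evacuate_allowed(instance_uuid, task_state):
    """Reject evacuation while another task (e.g. powering-off) is in flight."""
    if task_state is not None:
        raise InstanceInvalidState(
            "Cannot 'evacuate' instance %s while it is in task_state %s"
            % (instance_uuid, task_state))
    return True
```

This is why the simulation method matters: a clean power-off leaves each instance in task_state powering-off and evacuation is refused, while a hard crash (see the sysrq-trigger correction below in the thread) leaves task_state unset, so the same call goes through.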
I am sorry. To simulate a host crash I used the wrong procedure. Using "echo 'c' > /proc/sysrq-trigger", everything works fine.
Hi Ignazio,
I am trying to evacuate a compute host on an older version (Mitaka). Could you please share the process you followed? I am not able to succeed: openstack live-migration fails with an error message (this is a known issue in older versions), and with nova live-migration nothing happens even after initiating the VM migration. It has been almost 4 days.
~Jay.
-- *SAVE PAPER – Please do not print this e-mail unless absolutely necessary.*
Hi Jay, would you like to evacuate a failed compute node or a running compute node?
Ignazio
Hi,
I have tried on a failed compute node, which is in powered-off state now. I have also tried on a running compute node: no errors, but nothing happens. On the running compute node I disabled the compute service and tried migration as well.
Maybe I have not followed the proper steps; I just wanted to know the steps you followed. Otherwise, I was planning a manual migration, if possible.
~Jay.
OK Jay, let me describe my environment. I have an OpenStack deployment made up of 3 controller nodes and several compute nodes. The controller node services are managed by Pacemaker, and the compute node services by pacemaker-remote. My hardware is Dell, so I am using an IPMI fencing device. I wrote a service controlled by Pacemaker: it checks whether a compute node has failed and, to avoid split-brain, if a compute node does not respond on both the management network and the storage network, STONITH powers off the node and then a nova host-evacuate is executed.

In any case, to run a simulation before writing the service described above, you can do the following:
- Connect to a compute node where some virtual machines are running and run: echo 'c' > /proc/sysrq-trigger (this stops the node immediately, like a real failure).
- On a controller node run: nova host-evacuate "name of failed compute node"
The instances running on the failed compute node should be restarted on another compute node.
Ignazio
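The fence-then-evacuate logic described above can be sketched roughly as follows. This is a hypothetical skeleton under stated assumptions: reachability is reduced to a single ping per network, and the fencing call is shown with Pacemaker's stonith_admin for illustration; it is not Ignazio's actual Pacemaker resource agent.

```python
import subprocess

def reachable(ip):
    """One ping with a 2 s timeout; True if the host answers (simplified check)."""
    return subprocess.call(
        ["ping", "-c", "1", "-W", "2", ip],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL) == 0

def should_fence(mgmt_ok, storage_ok):
    # Fence only when the node is unreachable on BOTH networks, so a single
    # flaky NIC cannot trigger a false evacuation (the split-brain guard).
    return not mgmt_ok and not storage_ok

def handle_node(name, mgmt_ip, storage_ip):
    if should_fence(reachable(mgmt_ip), reachable(storage_ip)):
        # 1) STONITH: power the node off (via IPMI underneath) so it cannot
        #    come back mid-evacuation...
        subprocess.call(["stonith_admin", "--fence", name])
        # 2) ...then restart its instances on the surviving compute nodes.
        subprocess.call(["nova", "host-evacuate", name])
```

The important design point is the AND in should_fence: evacuating a node that is still alive and writing to shared storage would corrupt disks, so the node must be confirmed dead (or made dead by fencing) on every path first.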
Thanks for the explanation, Ignazio.
I tried the same thing by putting the compute node into a failure state (echo 'c' > /proc/sysrq-trigger). The compute node was stuck and I was not able to connect to it. All the VMs are now in Error state.
Running host-evacuate succeeded on the controller node, but now I am not able to use the VMs, because they are all in Error state:

root@h004:~$ nova host-evacuate h017
+--------------------------------------+-------------------+---------------+
| Server UUID                          | Evacuate Accepted | Error Message |
+--------------------------------------+-------------------+---------------+
| f3545f7d-b85e-49ee-b407-333a4c5b5ab9 | True              |               |
| 9094494b-cfa3-459b-8d51-d9aae0ea9636 | True              |               |
| abe7075b-ac22-4168-bf3d-d302ba37d80e | True              |               |
| c9919371-5f2e-4155-a01a-5f41d9c8b0e7 | True              |               |
| ffd983bb-851e-4314-9d1d-375303c278f3 | True              |               |
+--------------------------------------+-------------------+---------------+

Now I have restarted the compute node manually, and I am able to connect to it, but the VMs are still in Error state.
1. Any ideas on how to recover the VMs?
2. Are there any other methods to evacuate, as this method does not seem to work on the Mitaka version?
~Jay.
Jay, to recover the VM state use the command nova reset-state....
Run nova help reset-state to check the required parameters.
As far as evacuation is concerned, how many compute nodes do gli have? Does instance live migration work? Are gli using shared Cinder storage?
Ignazio
Ignazio,
One instance is stuck in Error state and I am not able to recover it. All the other instances are running now:

root@h004:~$ nova reset-state --all-tenants my-instance-1-2
Reset state for server my-instance-1-2 succeeded; new state is error

I have several compute nodes (14). I am not sure what "gli" is?
Live migration is not working; I have tried it and it was not throwing any errors, but nothing seems to happen. I am not completely sure; I haven't heard of "gli" before. (This setup was deployed by someone else.)
~Jay.
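The "new state is error" in the output above matches the command's default: as I understand nova reset-state, without a flag it sets the server's state to error, and only with --active does it reset the state to active. A tiny illustrative model of that behaviour (a sketch of the CLI semantics, not the novaclient source):

```python
# Illustrative model of `nova reset-state` semantics (not the real novaclient code):
# the default target state is "error"; the --active flag resets to "active".
def reset_state(active=False):
    """Return the vm_state the command would set."""
    return "active" if active else "error"
```

So recovering a stuck instance would typically be `nova reset-state --active <server>` followed by a reboot, rather than the bare command, which only confirms the error state.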
Sorry... the question was "how many compute nodes do you have?" instead of "how many compute nodes do gli have"...
In any case: did you configure Cinder?
Yes, Cinder is running:

root@h017:~$ service --status-all | grep cinder
 [ + ]  cinder-volume
Sorry ...the question was : how many compute nodes do you have ? instead of how many compute nodes do gli have...
Anycase; Did you configured cinder ?
Il giorno ven 12 lug 2019 alle ore 11:26 Jay See <jayachander.it@gmail.com> ha scritto:
Ignazio,
One instance is stuck in the error state and I am not able to recover it. All other instances are running now.

root@h004:~$ nova reset-state --all-tenants my-instance-1-2
Reset state for server my-instance-1-2 succeeded; new state is error

I have several compute nodes (14). I am not sure what "gli" is; I haven't heard of it before. Live migration is not working: I have tried it and it did not throw any errors, but nothing seems to happen. (This setup was deployed by someone else.)
~Jay.
On Fri, Jul 12, 2019 at 6:12 AM Ignazio Cassano <ignaziocassano@gmail.com> wrote:
Jay, to recover the VM state use the command nova reset-state.
Run nova help reset-state to check the command's required parameters. Note that by default reset-state sets the server to the error state; pass --active to reset it to active instead.
As far as evacuation is concerned: how many compute nodes do you have? Does instance live migration work? Are you using shared cinder storage? Ignazio
On Thu, Jul 11, 2019 at 8:51 PM Jay See <jayachander.it@gmail.com> wrote:
Thanks for the explanation, Ignazio.
I have tried the same by forcing a failure on the compute node (echo 'c' > /proc/sysrq-trigger). The compute node was stuck and I was not able to connect to it. All the VMs are now in Error state.
Running host-evacuate from the controller node was successful, but now I am not able to use the VMs, because they are all in Error state.
root@h004:~$ nova host-evacuate h017
+--------------------------------------+-------------------+---------------+
| Server UUID                          | Evacuate Accepted | Error Message |
+--------------------------------------+-------------------+---------------+
| f3545f7d-b85e-49ee-b407-333a4c5b5ab9 | True              |               |
| 9094494b-cfa3-459b-8d51-d9aae0ea9636 | True              |               |
| abe7075b-ac22-4168-bf3d-d302ba37d80e | True              |               |
| c9919371-5f2e-4155-a01a-5f41d9c8b0e7 | True              |               |
| ffd983bb-851e-4314-9d1d-375303c278f3 | True              |               |
+--------------------------------------+-------------------+---------------+
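[Editor's note] When many instances are involved, the "Evacuate Accepted" column of output like the table above can be checked programmatically. A minimal sketch, with sample rows hard-coded for illustration (in practice you would capture the real command output):

```shell
# Sketch (not from the thread): extract the UUIDs that "nova host-evacuate"
# accepted from its table output. The two sample rows below are hard-coded;
# in practice you would capture the command's real output instead.
table='| f3545f7d-b85e-49ee-b407-333a4c5b5ab9 | True  |            |
| 9094494b-cfa3-459b-8d51-d9aae0ea9636 | False | some error |'

# Split on "|": field 2 is the UUID, field 3 the "Evacuate Accepted" flag.
accepted=$(printf '%s\n' "$table" | awk -F'|' '$3 ~ /True/ { gsub(/ /, "", $2); print $2 }')
echo "$accepted"
```

Border lines and the header row never match /True/ in field 3, so only accepted instances are printed.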
I have now restarted the compute node manually and I am able to connect to it, but the VMs are still in Error state.
1. Any ideas how to recover the VMs?
2. Are there any other methods to evacuate? This method does not seem to work on the mitaka version.
~Jay.
On Thu, Jul 11, 2019 at 1:33 PM Ignazio Cassano <ignaziocassano@gmail.com> wrote:
Ok Jay, let me describe my environment. I have an OpenStack deployment made up of 3 controller nodes and several compute nodes. The controller node services are managed by pacemaker, and the compute node services by pacemaker-remote. My hardware is Dell, so I am using an ipmi fencing device. I wrote a service, controlled by pacemaker, that detects when a compute node fails: to avoid split brain, only when a compute node stops responding on both the management network and the storage network does the stonith power off the node, after which a nova host-evacuate is executed.
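[Editor's note] The decision rule in that watchdog (fence only when the node is unreachable on both networks) can be sketched as a small shell function. This is an illustrative reconstruction, not Ignazio's actual pacemaker resource; in the real service the two flags would come from probes on each network:

```shell
# Illustrative sketch of the double-check described above. Arguments are
# 0/1 flags: "$1" = node reachable on the management network,
# "$2" = node reachable on the storage network.
should_fence() {
    # Fence only on a double failure, so a single flaky link
    # cannot trigger a split-brain power-off.
    [ "$1" -eq 0 ] && [ "$2" -eq 0 ]
}

# In the real service a positive answer would be followed by something like:
#   stonith_admin --fence "$NODE"    # power the node off first
#   nova host-evacuate "$NODE"       # then restart its instances elsewhere
```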
In any case, to try a simulation before writing the service described above, you can do as follows:
1. Connect to a compute node where some virtual machines are running and run: echo 'c' > /proc/sysrq-trigger (this stops the node immediately, as in a real failure).
2. On a controller node run: nova host-evacuate "name of failed compute node".
3. The instances that were running on the failed compute node should be restarted on another compute node.
Ignazio
On Thu, Jul 11, 2019 at 11:57 AM Jay See <jayachander.it@gmail.com> wrote:
Hi ,
I have tried on a failed compute node, which is in the powered-off state now. I have also tried on a running compute node: no errors, but nothing happens. On the running compute node I also disabled the compute service and tried the migration.
Maybe I have not followed the proper steps, so I just wanted to know the steps you followed. Otherwise I was planning a manual migration, if possible.
~Jay.
On Thu, Jul 11, 2019 at 11:52 AM Ignazio Cassano <ignaziocassano@gmail.com> wrote:
Ok. But are your virtual machines using a root volume on cinder, or are they ephemeral? In any case, when you try a live migration, look at the nova-compute log on the KVM node the instance is being migrated from.
On Fri, Jul 12, 2019 at 12:48 PM Jay See <jayachander.it@gmail.com> wrote:
participants (2)
- Ignazio Cassano
- Jay See