Live migration fails

Sean Mooney smooney at redhat.com
Tue Apr 27 10:53:41 UTC 2021


On Tue, 2021-04-27 at 09:17 +0000, Szabo, Istvan (Agoda) wrote:
> Hi,
> 
> We are trying to live migrate instances off compute nodes and to automate this, but it seems we can't really do so when the migration gets stuck.
> Let me explain the issue a bit:
> 
> 1. We initiate the live migration.
> 2. The live migration finished; the machine disappeared from the /var/lib/nova/instances/<server id> directory on the source server.
> 3. But when I query it or look in Horizon, it is stuck in the migrating phase. We collected information such as the migration ID and tried to force it, but it has already finished and can't be forced to complete.
> 4. I restarted the nova service on the source node; that just put the machine into the error state, and the force is still not working.
> 5. I changed the state from error to active, but that one also can't be force completed.
> 
> What can I do to change the name of the compute node in the DB?
> 

You should not change the name of the compute node in the DB.
We do not support changing the compute node name if it has instances on it.
If you meant in the migration record, you should not change it there either, as the resources would not be claimed correctly.

>  How can I force it without touching the db?
> 
I don't think you can fix it without touching the DB.

So if the VM has been removed from the source node, there are two things you should check:
1. Is instance.host set to the dest host where the VM is now running?
2. If you look in the logs, was there an error in post live migrate?
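
For the second check, a small filter like the one below can narrow a nova log down to ERROR lines for one instance. It is only a sketch: it assumes the log line layout seen in the nova-conductor excerpt further down this thread (a level field followed by an "[instance: <uuid>]" tag), `find_instance_errors` is a hypothetical helper name, and the optional keyword is whatever substring you choose to search for, not a string nova is known to log.

```python
def find_instance_errors(log_lines, instance_uuid, keyword=None):
    """Return ERROR-level lines tagged with the given instance UUID.

    keyword is an optional, caller-chosen substring used to narrow the
    result further; it is an assumption, not a known nova log message.
    """
    tag = "[instance: %s]" % instance_uuid
    hits = []
    for line in log_lines:
        # Keep only ERROR lines that carry the instance tag.
        if " ERROR " not in line or tag not in line:
            continue
        if keyword and keyword not in line:
            continue
        hits.append(line.rstrip("\n"))
    return hits
```

Typical use would be to open the compute log on the source and destination hosts and pass the file object as `log_lines`.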

Basically, what I think is the most likely issue is that an operation in post live migrate failed before the migration record was set to complete.

The preconditions for force complete are:
The server OS-EXT-STS:vm_state value must be active and the server OS-EXT-STS:task_state value must be migrating.
https://docs.openstack.org/api-ref/compute/?expanded=force-migration-complete-action-force-complete-action-detail#force-migration-complete-action-force-complete-action
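
As a rough illustration of that precondition, a check could look like the following. It assumes `server` is a dict shaped like the server object returned by the compute API, using the two attribute names quoted above; `can_force_complete` is a hypothetical helper name and not part of nova.

```python
def can_force_complete(server):
    """Return True if the server meets the documented force-complete preconditions.

    server is assumed to be a dict carrying the OS-EXT-STS attributes.
    """
    return (
        server.get("OS-EXT-STS:vm_state") == "active"
        and server.get("OS-EXT-STS:task_state") == "migrating"
    )
```

An instance that was pushed into the error state, as in step 4 of the report, would fail this check until its state is corrected.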

If instance.host matches the host on which the VM is now running, then you should be able to set the vm_state and task_state back to active and migrating
respectively, at which point you can force complete the migration.

If the VM is running correctly on the destination host and instance.host is set correctly, it might just be simpler to update the
migration record to complete and ensure the task state is set to none on the instance.

If instance.host still has the source host but the VM is running on the dest host, then you should update it to reflect the correct host and then mark the
migration as complete.

All of the above will require at least some DB modifications.
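
The cases above can be summarised as a small decision helper. This is only a sketch: `plan_recovery` and its arguments are hypothetical, it only returns a list of manual steps and touches no database, and clearing the task state in the mismatched-host case is an assumption carried over from the matching-host case.

```python
def plan_recovery(instance_host, running_host):
    """List the manual DB steps for an instance stuck after a finished live migration.

    instance_host: value of instance.host in the nova DB.
    running_host:  host where the VM is actually running now.
    """
    steps = []
    if instance_host != running_host:
        # The DB still points at the source host; correct it first.
        steps.append("update instance.host to %s" % running_host)
    steps.append("mark the migration record as complete")
    # Assumption: the task state is cleared in both cases.
    steps.append("set the instance task_state to None")
    return steps
```

The alternative for the matching-host case, resetting the states to active/migrating and then calling force complete, is not modelled here.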

>  

> 
> The goal is to automate compute node draining with as little user intervention as possible.
> 
> Istvan Szabo
> Senior Infrastructure Engineer
> ---------------------------------------------------
> Agoda Services Co., Ltd.
> e: istvan.szabo at agoda.com
> ---------------------------------------------------
> 
> -----Original Message-----
> From: Szabo, Istvan (Agoda) <Istvan.Szabo at agoda.com> 
> Sent: Friday, April 23, 2021 9:13 AM
> To: Sean Mooney <smooney at redhat.com>; openstack-discuss at lists.openstack.org
> Subject: RE: Live migration fails
> 
> My /etc/hostname has only short name.
> The nova.conf host value is also short name.
> The host has been selected by the scheduler: nova live-migration --block-migrate 1517a2ac-3b51-4d8d-80b3-89a5614d1ae0
> 
> What has been changed is the node field of the VM in the instances table in the nova DB. So I didn't change the compute host value; I only edited the VM's value.
> 
> -----Original Message-----
> From: Sean Mooney <smooney at redhat.com>
> Sent: Thursday, April 22, 2021 4:13 PM
> To: openstack-discuss at lists.openstack.org
> Subject: Re: Live migration fails
> 
> On Thu, 2021-04-22 at 06:01 +0000, Eugen Block wrote:
> > Yeah, the column "node" has the FQDN in my DB, too, only "host" is the 
> > short name. The question is how did the short name get into the "node"
> > column, but it will probably be difficult to get to the bottom of that.
> Well, by default we do not expect to have FQDNs in either field.
> Nova's default for both is the hostname of the host, which will be the short name and not the FQDN, unless you set an FQDN in /etc/hostname, which is not generally the recommended practice.
> 
> Nova in general does not support changing the hostname (/etc/hostname) of a host, and you should avoid changing the "host" value in nova.conf too.
> 
> Changing these values can result in the creation of additional placement RPs, compute service records and compute nodes, and that can result in a hard-to-fix situation where old VMs are using one set of resources and new VMs are using the updated ones.
> 
> So you should not modify either value in the DB.
> 
> Did you perhaps specify a host when live migrating and just pass the wrong value, or was the host selected by the scheduler?
> > 
> > 
> > Zitat von "Szabo, Istvan (Agoda)" <Istvan.Szabo at agoda.com>:
> > 
> > > I think I found the issue: in the instances table of the nova DB, the 
> > > compute node name in the node column somehow changed to the short 
> > > hostname. It works with the FQDN but it doesn't work with the short name ☹ 
> > > I hope I didn't mess anything up by changing it to the FQDN to make it work.
> > > 
> > > -----Original Message-----
> > > From: Szabo, Istvan (Agoda) <Istvan.Szabo at agoda.com>
> > > Sent: Thursday, April 22, 2021 11:19 AM
> > > To: Eugen Block <eblock at nde.ag>
> > > Cc: openstack-discuss at lists.openstack.org
> > > Subject: RE: Live migration fails
> > > 
> > > Sorry, in the log I haven't commented out the servername ☹ it is
> > > xy-osfecn-40250
> > > 
> > > -----Original Message-----
> > > From: Eugen Block <eblock at nde.ag>
> > > Sent: Wednesday, April 21, 2021 5:37 PM
> > > To: Szabo, Istvan (Agoda) <Istvan.Szabo at agoda.com>
> > > Cc: openstack-discuss at lists.openstack.org
> > > Subject: Re: Live migration fails
> > > 
> > > The error message seems correct, I can't find am-osfecn-4025 either 
> > > in the list of compute nodes. Can you check in the database if 
> > > there's an active instance (or several) allocated to that compute 
> > > node? In that case you would need to correct the allocation in order 
> > > for the migration to work.
> > > 
> > > 
> > > Zitat von "Szabo, Istvan (Agoda)" <Istvan.Szabo at agoda.com>:
> > > 
> > > > Sure:
> > > > 
> > > > https://jpst.it/2u3uh
> > > > 
> > > > These are the ones that can't be live migrated:
> > > > xy-osfecn-40250
> > > > xy-osfecn-40281
> > > > xy-osfecn-40290
> > > > xy-osbecn-40073
> > > > xy-osfecn-40238
> > > > 
> > > > The compute services are disabled on these because we don't want 
> > > > anybody to spawn a VM on them anymore, so we want to evacuate all VMs.
> > > > 
> > > > -----Original Message-----
> > > > From: Eugen Block <eblock at nde.ag>
> > > > Sent: Wednesday, April 21, 2021 3:26 PM
> > > > To: openstack-discuss at lists.openstack.org
> > > > Subject: Re: Live migration fails
> > > > 
> > > > Hi,
> > > > 
> > > > can you share the output of these commands?
> > > > 
> > > > nova-manage cell_v2 list_hosts
> > > > openstack compute service list
> > > > 
> > > > 
> > > > Zitat von "Szabo, Istvan (Agoda)" <Istvan.Szabo at agoda.com>:
> > > > 
> > > > > Hi,
> > > > > 
> > > > > I have a couple of compute nodes where live migration fails 
> > > > > for existing VMs.
> > > > > When I quickly spawn a VM and try live migration it works, so I 
> > > > > assume there shouldn't be a big problem with the compute node.
> > > > > However, I have many existing VMs where it fails with a 
> > > > > servername not found error.
> > > > > 
> > > > > /var/log/nova/nova-conductor.log:2021-04-21 14:47:12.605 227612 
> > > > > ERROR nova.conductor.tasks.migrate
> > > > > [req-f4067a26-a233-4673-8c07-9a8a290980b0
> > > > > dce35e6eceea4312bb0baa0510cef363 
> > > > > ca7e35079f4440c78bd9870724b9638b - default default] [instance:
> > > > > 1517a2ac-3b51-4d8d-80b3-89a5614d1ae0]
> > > > > Unable to find record for source node servername on servername:
> > > > > ComputeHostNotFound: Compute host servername could not be found.
> > > > > /var/log/nova/nova-conductor.log:2021-04-21 14:47:12.605 227612 
> > > > > WARNING nova.scheduler.utils
> > > > > [req-f4067a26-a233-4673-8c07-9a8a290980b0
> > > > > dce35e6eceea4312bb0baa0510cef363 
> > > > > ca7e35079f4440c78bd9870724b9638b - default default] Failed to
> > > > > compute_task_migrate_server: Compute host servername could not 
> > > > > be found.: ComputeHostNotFound: Compute host servername could not be found.
> > > > > /var/log/nova/nova-conductor.log:2021-04-21 14:47:12.605 227612 
> > > > > WARNING nova.scheduler.utils
> > > > > [req-f4067a26-a233-4673-8c07-9a8a290980b0
> > > > > dce35e6eceea4312bb0baa0510cef363 
> > > > > ca7e35079f4440c78bd9870724b9638b - default default] [instance:
> > > > > 1517a2ac-3b51-4d8d-80b3-89a5614d1ae0]
> > > > > Setting instance to ACTIVE state.: ComputeHostNotFound: Compute 
> > > > > host servername could not be found.
> > > > > /var/log/nova/nova-conductor.log:2021-04-21 14:47:12.672 227612 
> > > > > ERROR oslo_messaging.rpc.server
> > > > > [req-f4067a26-a233-4673-8c07-9a8a290980b0
> > > > > dce35e6eceea4312bb0baa0510cef363 
> > > > > ca7e35079f4440c78bd9870724b9638b - default default] Exception during message handling:
> > > > > ComputeHostNotFound: Compute host am-osfecn-4025
> > > > > 
> > > > > Tried with this command:
> > > > > 
> > > > > nova live-migration --block-migrate id.
> > > > > 
> > > > > Any idea?
> > > > > 
> > > > > Thank you.
> > > > > 
> > > > > ________________________________ This message is confidential 
> > > > > and is for the sole use of the intended recipient(s). It may 
> > > > > also be privileged or otherwise protected by copyright or other 
> > > > > legal rules. If you have received it by mistake please let us 
> > > > > know by reply email and delete it from your system. It is 
> > > > > prohibited to copy this message or disclose its content to anyone.
> > > > > Any confidentiality or privilege is not waived or lost by any 
> > > > > mistaken delivery or unauthorized disclosure of the message. All 
> > > > > messages sent to and from Agoda may be monitored to ensure 
> > > > > compliance with company policies, to protect the company's 
> > > > > interests and to remove potential malware. Electronic messages 
> > > > > may be intercepted, amended, lost or deleted, or contain viruses.
> > > > 
> > > 
> > 
> > 
> > 
> > 
> 