Live migration fails

Szabo, Istvan (Agoda) Istvan.Szabo at agoda.com
Tue Apr 27 09:17:34 UTC 2021


Hi,

We are trying to live migrate instances off compute nodes and to automate the process, but the automation breaks down when a migration gets stuck.
Let me explain the issue a bit:

1. We initiate live migration
2. The live migration finishes, and the instance's directory, /var/lib/nova/instances/<server id>, disappears from the source server.
3. But when I query the instance or look in Horizon, it is stuck in the migrating state. We collected information such as the migration id and tried to force completion, but the migration has already finished, so it cannot be force-completed.
4. I restarted the nova service on the source node; that only put the instance into the error state, and force-complete still does not work.
5. I changed the state from error to active, but force-complete fails for that one too.

What can I do to change the name of the compute node in the DB? How can I force it without touching the db? 

The goal is to automate compute node draining with as little user intervention as possible.
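
As a starting point for the automation described above, here is a minimal sketch (not a tested procedure) of how a drain script might flag migrations that have stopped making progress. The record layout (keys "status" and "updated_at"), the set of terminal statuses, the 30-minute threshold and the helper name find_stuck_migrations are all assumptions made for illustration; the records would have to come from whatever migration listing the deployment provides.

```python
from datetime import datetime, timedelta

# Assumed terminal statuses; adjust to what your nova release actually reports.
TERMINAL_STATUSES = {"completed", "error", "failed", "cancelled"}


def find_stuck_migrations(migrations, now, max_age=timedelta(minutes=30)):
    """Return migration records that are not in a terminal status and
    whose last update is older than max_age.

    migrations: iterable of dicts with hypothetical keys "status" (str)
    and "updated_at" (datetime).
    """
    stuck = []
    for mig in migrations:
        if mig["status"] in TERMINAL_STATUSES:
            continue  # finished one way or another, nothing to flag
        if now - mig["updated_at"] > max_age:
            stuck.append(mig)
    return stuck
```

A script could call this periodically and hand the flagged records to an operator instead of retrying blindly.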

Istvan Szabo
Senior Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo at agoda.com
---------------------------------------------------

-----Original Message-----
From: Szabo, Istvan (Agoda) <Istvan.Szabo at agoda.com> 
Sent: Friday, April 23, 2021 9:13 AM
To: Sean Mooney <smooney at redhat.com>; openstack-discuss at lists.openstack.org
Subject: RE: Live migration fails

My /etc/hostname contains only the short name.
The nova.conf host value is also the short name.
The host has been selected by the scheduler: nova live-migration --block-migrate 1517a2ac-3b51-4d8d-80b3-89a5614d1ae0

What was changed is the node field of the VM in the instances table of the nova DB. So I did not change the compute host value; I only edited the value stored for the VM.


-----Original Message-----
From: Sean Mooney <smooney at redhat.com>
Sent: Thursday, April 22, 2021 4:13 PM
To: openstack-discuss at lists.openstack.org
Subject: Re: Live migration fails

On Thu, 2021-04-22 at 06:01 +0000, Eugen Block wrote:
> Yeah, the column "node" has the FQDN in my DB, too, only "host" is the 
> short name. The question is how did the short name get into the "node"
> column, but it will probably be difficult to get to the bottom of that.
well by default we do not expect to have FQDNs in either field.
nova's default for both is the hostname of the host, which will be the short name, not the fqdn, unless you set an fqdn in /etc/hostname, which is not generally the recommended practice.

nova in general does not support changing the hostname (/etc/hostname) of a host and you should avoid changing the "host" value in the nova.conf too.

changing these values can result in the creation of additional placement RPs, compute service records and compute nodes, and that can result in a hard-to-fix situation where old vms are using one set of resources and new vms are using the updated ones.

so you should not modify either value in the db.
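
A read-only way to spot the mismatch discussed in this thread, without editing the database, is to compare the (host, node) pair recorded for each instance against the pairs the compute services actually registered. The sketch below assumes both lists have already been exported, for example from the instances and compute_nodes tables; the tuple layout and the function name find_node_mismatches are hypothetical, and the example hostnames are made up.

```python
def find_node_mismatches(instances, compute_nodes):
    """Report instances whose (host, node) pair is not a registered compute node.

    instances: iterable of (uuid, host, node) tuples.
    compute_nodes: iterable of (host, hypervisor_hostname) tuples.
    Returns a list of (uuid, host, node, known_nodes_for_host) tuples.
    """
    registered = set()
    nodes_by_host = {}
    for host, nodename in compute_nodes:
        registered.add((host, nodename))
        nodes_by_host.setdefault(host, []).append(nodename)

    mismatches = []
    for uuid, host, node in instances:
        if (host, node) not in registered:
            # Include the node names known for this host to ease comparison.
            mismatches.append((uuid, host, node, nodes_by_host.get(host, [])))
    return mismatches


if __name__ == "__main__":
    # Made-up data: one instance records a short node name while the
    # compute node is registered under an FQDN.
    instances = [
        ("uuid-1", "cn1", "cn1"),
        ("uuid-2", "cn2", "cn2.example.com"),
    ]
    compute_nodes = [
        ("cn1", "cn1.example.com"),
        ("cn2", "cn2.example.com"),
    ]
    for row in find_node_mismatches(instances, compute_nodes):
        print(row)
```

The function only reports differences; deciding how to repair them is left to the operator.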

did you perhaps specify a host when live migrating and just pass the wrong value, or was the host selected by the scheduler?
>
>
> Quoting "Szabo, Istvan (Agoda)" <Istvan.Szabo at agoda.com>:
>
> > I think I found the issue: in the instances table of the nova db, the
> > compute node name in the node column somehow changed to the short
> > hostname. It works with the FQDN but it doesn't work with the short
> > name ☹ I hope I didn't mess anything up by changing it to the FQDN to
> > make it work.
> >
> > -----Original Message-----
> > From: Szabo, Istvan (Agoda) <Istvan.Szabo at agoda.com>
> > Sent: Thursday, April 22, 2021 11:19 AM
> > To: Eugen Block <eblock at nde.ag>
> > Cc: openstack-discuss at lists.openstack.org
> > Subject: RE: Live migration fails
> >
> > Sorry, in the log I haven't commented out the servername ☹ it is
> > xy-osfecn-40250
> >
> > -----Original Message-----
> > From: Eugen Block <eblock at nde.ag>
> > Sent: Wednesday, April 21, 2021 5:37 PM
> > To: Szabo, Istvan (Agoda) <Istvan.Szabo at agoda.com>
> > Cc: openstack-discuss at lists.openstack.org
> > Subject: Re: Live migration fails
> >
> > The error message seems correct, I can't find am-osfecn-4025 either 
> > in the list of compute nodes. Can you check in the database if 
> > there's an active instance (or several) allocated to that compute 
> > node? In that case you would need to correct the allocation in order 
> > for the migration to work.
> >
> >
> > Quoting "Szabo, Istvan (Agoda)" <Istvan.Szabo at agoda.com>:
> >
> > > Sure:
> > >
> > > https://jpst.it/2u3uh
> > >
> > > These are the ones that can't be live migrated:
> > > xy-osfecn-40250
> > > xy-osfecn-40281
> > > xy-osfecn-40290
> > > xy-osbecn-40073
> > > xy-osfecn-40238
> > >
> > > The compute services are disabled on these because we don't want
> > > anybody to spawn a vm on them anymore, so we want to evacuate all vms.
> > >
> > > -----Original Message-----
> > > From: Eugen Block <eblock at nde.ag>
> > > Sent: Wednesday, April 21, 2021 3:26 PM
> > > To: openstack-discuss at lists.openstack.org
> > > Subject: Re: Live migration fails
> > >
> > > Hi,
> > >
> > > can you share the output of these commands?
> > >
> > > nova-manage cell_v2 list_hosts
> > > openstack compute service list
> > >
> > >
> > > Quoting "Szabo, Istvan (Agoda)" <Istvan.Szabo at agoda.com>:
> > >
> > > > Hi,
> > > >
> > > > I have a couple of compute nodes where live migration fails
> > > > for existing vms.
> > > > When I quickly spawn a vm and try live migration it works, so I
> > > > assume there shouldn't be a big problem with the compute node.
> > > > However, I have many existing vms where it fails with a
> > > > "servername not found" error.
> > > >
> > > > /var/log/nova/nova-conductor.log:2021-04-21 14:47:12.605 227612 
> > > > ERROR nova.conductor.tasks.migrate
> > > > [req-f4067a26-a233-4673-8c07-9a8a290980b0
> > > > dce35e6eceea4312bb0baa0510cef363 
> > > > ca7e35079f4440c78bd9870724b9638b - default default] [instance:
> > > > 1517a2ac-3b51-4d8d-80b3-89a5614d1ae0]
> > > > Unable to find record for source node servername on servername:
> > > > ComputeHostNotFound: Compute host servername could not be found.
> > > > /var/log/nova/nova-conductor.log:2021-04-21 14:47:12.605 227612 
> > > > WARNING nova.scheduler.utils
> > > > [req-f4067a26-a233-4673-8c07-9a8a290980b0
> > > > dce35e6eceea4312bb0baa0510cef363 
> > > > ca7e35079f4440c78bd9870724b9638b - default default] Failed to
> > > > compute_task_migrate_server: Compute host servername could not 
> > > > be found.: ComputeHostNotFound: Compute host servername could not be found.
> > > > /var/log/nova/nova-conductor.log:2021-04-21 14:47:12.605 227612 
> > > > WARNING nova.scheduler.utils
> > > > [req-f4067a26-a233-4673-8c07-9a8a290980b0
> > > > dce35e6eceea4312bb0baa0510cef363 
> > > > ca7e35079f4440c78bd9870724b9638b - default default] [instance:
> > > > 1517a2ac-3b51-4d8d-80b3-89a5614d1ae0]
> > > > Setting instance to ACTIVE state.: ComputeHostNotFound: Compute 
> > > > host servername could not be found.
> > > > /var/log/nova/nova-conductor.log:2021-04-21 14:47:12.672 227612 
> > > > ERROR oslo_messaging.rpc.server
> > > > [req-f4067a26-a233-4673-8c07-9a8a290980b0
> > > > dce35e6eceea4312bb0baa0510cef363 
> > > > ca7e35079f4440c78bd9870724b9638b - default default] Exception during message handling:
> > > > ComputeHostNotFound: Compute host am-osfecn-4025
> > > >
> > > > Tried with this command:
> > > >
> > > > nova live-migration --block-migrate id.
> > > >
> > > > Any idea?
> > > >
> > > > Thank you.
> > > >
> > > > ________________________________ This message is confidential 
> > > > and is for the sole use of the intended recipient(s). It may 
> > > > also be privileged or otherwise protected by copyright or other 
> > > > legal rules. If you have received it by mistake please let us 
> > > > know by reply email and delete it from your system. It is 
> > > > prohibited to copy this message or disclose its content to anyone.
> > > > Any confidentiality or privilege is not waived or lost by any 
> > > > mistaken delivery or unauthorized disclosure of the message. All 
> > > > messages sent to and from Agoda may be monitored to ensure 
> > > > compliance with company policies, to protect the company's 
> > > > interests and to remove potential malware. Electronic messages 
> > > > may be intercepted, amended, lost or deleted, or contain viruses.