How to debug silent live migration errors
Lee Yarwood
lyarwood at redhat.com
Wed Mar 31 18:10:29 UTC 2021
Live migration is an asynchronous operation, so without --wait on the
command line the command returns as soon as the API responds with 202
to indicate that the request was accepted [1].
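A 202 only means the request was accepted, so from the CLI a live migration can look successful while the instance never moves. The following is a minimal sketch of checking for that. It assumes bash, admin credentials already loaded so that the OS-EXT-SRV-ATTR:host field is visible, and the older --live <host> syntax used in the question below (recent openstackclient releases deprecate it in favour of --live-migration --host). The function names and the POLL_INTERVAL variable are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: trigger a live migration, wait for the MIGRATING
# status to clear, then report whether the instance's host actually changed.
# Assumes admin credentials are loaded (e.g. from an openrc file).

current_host() {
  # Admin-only attribute: the compute host the instance is running on.
  openstack server show "$1" -f value -c OS-EXT-SRV-ATTR:host
}

current_status() {
  openstack server show "$1" -f value -c status
}

# Usage: live_migrate_and_check INSTANCE_UUID DEST_HOST [MAX_POLLS]
# Returns 0 if the instance moved, 1 if it did not (or the request failed).
live_migrate_and_check() {
  local uuid=$1 dest=$2 max_polls=${3:-120}
  local interval=${POLL_INTERVAL:-5}   # seconds between polls (hypothetical knob)
  local before after status="" i=0

  before=$(current_host "$uuid")

  # Same syntax as in the question; returns once the API answers 202.
  openstack server migrate "$uuid" --live "$dest" || return 1

  # Poll until the status is no longer MIGRATING or we run out of polls.
  while [ "$i" -lt "$max_polls" ]; do
    status=$(current_status "$uuid")
    [ "$status" != "MIGRATING" ] && break
    sleep "$interval"
    i=$((i + 1))
  done

  after=$(current_host "$uuid")
  if [ "$after" = "$before" ]; then
    echo "instance $uuid is still on $before (status: $status)" >&2
    return 1
  fi
  echo "instance $uuid moved $before -> $after"
}
```

For example, live_migrate_and_check "$instance_uuid" "$dest_host" prints the old and new host on success; on a silent failure it returns 1 and reports the unchanged host, which is the cue to look at the migration and event records.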
As an admin, you can use the server migrations API [2] to track the
status of the migration via openstackclient:
$ openstack server migration list --server $instance_uuid
$ openstack server migration show $instance_uuid $migration_id
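To spot failed attempts without reading the whole table, the migration list can be filtered for entries that ended badly. A minimal sketch, assuming the "Id" and "Status" column headers printed by recent openstackclient releases (the Id column may need --os-compute-api-version 2.23 or later) and that failed live migrations typically end with a status of error or failed. The function name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "ID STATUS" for each migration of an instance
# that ended in an error state. CSV output includes a header row, so the awk
# script looks columns up by name instead of relying on their order.

failed_migrations() {
  openstack server migration list --server "$1" \
      -f csv --quote none -c Id -c Status |
    awk -F, '
      { sub(/\r$/, "") }   # tolerate CRLF line endings
      NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
      $(col["Status"]) == "error" || $(col["Status"]) == "failed" {
        print $(col["Id"]), $(col["Status"])
      }'
}
```

Each ID it prints can then be passed to openstack server migration show, as above.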
You can also use the server event list to find the specific request-id
associated with the live migration and trace it through your logs:
$ openstack server event list $instance_uuid
$ openstack server event show $instance_uuid $request_id
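The request-id ties the API call to every nova service that handled it. A minimal sketch of pulling out the live-migration request IDs and searching a log directory for them, assuming the "Request ID" and "Action" column headers printed by recent openstackclient releases, nova's live-migration action name, and logs under /var/log/nova (check your deployment). The function names and NOVA_LOG_DIR are hypothetical. In a Charmed deployment the nova-api, nova-conductor and nova-scheduler logs typically live on the nova-cloud-controller units and the nova-compute logs on the compute units, so the search needs to run on each of them; live migration pre-check failures are usually logged by nova-conductor rather than nova-compute.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: pull the request IDs of an instance's live-migration
# actions from the event list, then search a log directory for one of them.

live_migration_request_ids() {
  openstack server event list "$1" -f csv --quote none -c "Request ID" -c Action |
    awk -F, '
      { sub(/\r$/, "") }   # tolerate CRLF line endings
      NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
      $(col["Action"]) == "live-migration" { print $(col["Request ID"]) }'
}

# NOVA_LOG_DIR is an assumption; point it at wherever nova logs on the unit.
grep_nova_logs() {
  grep -rn -- "$1" "${NOVA_LOG_DIR:-/var/log/nova}"
}
```

For example: for id in $(live_migration_request_ids "$instance_uuid"); do grep_nova_logs "$id"; done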
Hope that helps,
Lee
[1] https://docs.openstack.org/api-ref/compute/?expanded=live-migrate-server-os-migratelive-action-detail#live-migrate-server-os-migratelive-action
[2] https://docs.openstack.org/api-ref/compute/?expanded=show-migration-details-detail#show-migration-details
[3] https://docs.openstack.org/api-guide/compute/faults.html
On Tue, 30 Mar 2021 at 14:33, Linebarger, John <jmlineb at sandia.gov> wrote:
> How would I debug silent (or mostly silent) live migration errors? We’re using the Stein release of Canonical’s Charmed OpenStack. I have configured it for live migration per the instructions at this link:
>
> https://docs.openstack.org/nova/pike/admin/configuring-migrations.html#section-configuring-compute-migrations
>
> Specifically:
>
> 1. I did not specify vncserver_listen=0.0.0.0 in nova.conf because we are not running VNC on our instances
>
> 2. instances_path is /var/lib/nova/instances on all compute nodes
>
> 3. I believe that MAAS is “the sole provider of DHCP and DNS for the network hosting the MAAS cluster”, per https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/latest/install-maas.html
>
> 4. Identical authorized_keys files are present on all compute nodes with keys from all compute nodes by default
>
> 5. I manually configured the firewalls on all compute nodes to allow libvirt to communicate between compute hosts with:
>
> sudo ufw allow 49152:49261/tcp
>
> 6. The following settings are specified in nova.conf on each compute node:
>
> live_migration_downtime = 500
> live_migration_downtime_steps = 10
> live_migration_downtime_delay = 75
> live_migration_permit_post_copy=true
>
> Here’s what happens when I try to Live Migrate from the Horizon Dashboard:
>
> 1. As admin, in the Admin → Instances menu, I select the dropdown arrow to the right of the instance. Live Migrate Instance appears (but in black, unlike Migrate Instance, which appears in red). I select Live Migrate Instance, and whether or not I Automatically schedule new host or manually select a new host the Task column says “Migrating” and then it stops and reverts to None. The server never changes. The Action Log shows the live migration request but the Message column is blank.
>
> 2. I do the very same thing but this time select Disk Over Commit. Same results. Migrating reverts back to None and the server never changes.
>
> 3. I do the very same thing but this time select Block Migration. This time I do get an error: “Failed to live migrate instance to host ‘AUTO_SCHEDULE’”. And this time the Action Log has “Error” in the Message column.
>
> Same behavior with the CLI. For example, this CLI command below completes silently, yet the server for the instance never changes.
>
> openstack server migrate <instanceID> --live <newServerName>
>
> [Silent failure]
>
> openstack server show <instanceID>
>
> [Still running on original server]
>
> Note that I *can* successfully Migrate, both using the Horizon Dashboard and the CLI. What fails is Live Migration. I just have no idea why, and no error is displayed in the Action Log for the instance.
>
> For reference, the instance is an m1.small with 2GB of RAM, 1 VCPU, and a 20GB Cinder disk volume attached on /dev/vda.
>
> Any and all debugging ideas would be most welcome. Without logs I am simply guessing in the dark at this point.
>
> Thanks! Enjoy!
>
> John M. Linebarger, PhD, MBA
> Principal Member of Technical Staff
> Sandia National Laboratories
> (Office) 505-845-8282