How to debug silent live migration errors
jmlineb at sandia.gov
Tue Mar 30 13:26:35 UTC 2021
How would I debug silent (or mostly silent) live migration errors? We're using the Stein release of Canonical's Charmed OpenStack. I have configured it for live migration per the instructions at this link:
1. I did not specify vncserver_listen=0.0.0.0 in nova.conf because we are not running VNC on our instances
2. instances_path is /var/lib/nova/instances on all compute nodes
3. I believe that MAAS is "the sole provider of DHCP and DNS for the network hosting the MAAS cluster", per https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/latest/install-maas.html
4. By default, every compute node has an identical authorized_keys file that contains the keys of all compute nodes
5. I manually configured the firewalls on all compute nodes to allow libvirt to communicate between compute hosts with:
sudo ufw allow 49152:49261/tcp
6. The following settings are specified in nova.conf on each compute node:
live_migration_downtime = 500
live_migration_downtime_steps = 10
live_migration_downtime_delay = 75
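A minimal sketch of how the settings above could be confirmed on each compute node. The path /etc/nova/nova.conf is an assumption (pass a different path as the first argument if your deployment keeps the file elsewhere), and show_downtime_settings is a hypothetical helper name, not part of nova.

```shell
# Print the live-migration downtime options found in a nova.conf file.
# Assumption: the file is at /etc/nova/nova.conf unless a path is given as $1.
show_downtime_settings() {
    conf="${1:-/etc/nova/nova.conf}"
    # Match only uncommented lines that set one of the three options.
    grep -E '^[[:space:]]*live_migration_downtime(_steps|_delay)?[[:space:]]*=' "$conf"
}
```

Running show_downtime_settings on every compute node and comparing the output is one way to rule out a node whose file differs from the others.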
Here's what happens when I try to Live Migrate from the Horizon Dashboard:
1. As admin, in the Admin --> Instances menu, I select the dropdown arrow to the right of the instance. Live Migrate Instance appears (in black, unlike Migrate Instance, which appears in red). I select Live Migrate Instance. Whether I choose "Automatically schedule new host" or manually select a new host, the Task column shows "Migrating" and then reverts to None. The instance's host never changes. The Action Log shows the live migration request, but the Message column is blank.
2. I do the very same thing but this time select Disk Over Commit. Same result: "Migrating" reverts to None and the instance's host never changes.
3. I do the very same thing but this time select Block Migration. This time I do get an error: "Failed to live migrate instance to host 'AUTO_SCHEDULE'". And this time the Action Log has "Error" in the Message column.
The CLI behaves the same way. For example, the command below completes silently, yet the instance's host never changes.
openstack server migrate <instanceID> --live <newServerName>
openstack server show <instanceID>
[Still running on original server]
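The check above can be sketched as a small script that records the host before and after the request. This is only an illustration: current_host and check_live_migration are hypothetical helper names, the 60-second default wait is an arbitrary assumption, and the OS-EXT-SRV-ATTR:host field is only visible with admin credentials.

```shell
# Print the compute host an instance is currently running on.
# Requires admin credentials, since OS-EXT-SRV-ATTR:host is an admin-only field.
current_host() {
    openstack server show "$1" -f value -c OS-EXT-SRV-ATTR:host
}

# Request a live migration and report whether the host changed.
# $1 = instance ID, $2 = target host, $3 = seconds to wait (assumed default: 60)
check_live_migration() {
    before=$(current_host "$1")
    openstack server migrate "$1" --live "$2"
    sleep "${3:-60}"
    after=$(current_host "$1")
    if [ "$before" = "$after" ]; then
        echo "unchanged: still on $before"
        return 1
    fi
    echo "moved: $before -> $after"
}
```

For example, check_live_migration <instanceID> <newServerName> prints "unchanged: still on ..." and returns non-zero in the failure case described here.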
Note that I *can* successfully Migrate, both using the Horizon Dashboard and the CLI. What fails is Live Migration. I just have no idea why, and no error is displayed in the Action Log for the instance.
For reference, the instance is an m1.small with 2GB of RAM, 1 VCPU, and a 20GB Cinder disk volume attached on /dev/vda.
Any and all debugging ideas would be most welcome. Without logs I am simply guessing in the dark at this point.
John M. Linebarger, PhD, MBA
Principal Member of Technical Staff
Sandia National Laboratories