Failed live migration and duplicate attachments
After doing live migrations for some instances, and those migrations failing, the attached volumes show duplicates of the same attachment. This is the error message I get when the migration fails: https://pastebin.com/raw/3mxSVnRR

openstack volume list shows the volume is attached to the instance twice.

| a424fd41-a72f-4099-9c1a-47114d43c1dc | zktech-wpdb1 | in-use | 50 | Attached to ead8ecc3-f473-4672-a67b-c44534c6042d on /dev/vda Attached to ead8ecc3-f473-4672-a67b-c44534c6042d on /dev/vda

How do I remove the duplicate attachment, and why could the live migration be failing in the first place? Not all migrations fail, but sometimes they do and I have multiple volumes with duplicate attachments.

Torin Woltjer
Grand Dial Communications - A ZK Tech Inc. Company
616.776.1066 ext. 2006
www.granddial.com
On 12/19/2018 9:09 AM, Torin Woltjer wrote:
After doing live migrations for some instances, and those migrations failing, the attached volumes show duplicates of the same attachment. This is the error message I get when the migration fails: https://pastebin.com/raw/3mxSVnRR
openstack volume list shows the volume is attached to the instance twice.

| a424fd41-a72f-4099-9c1a-47114d43c1dc | zktech-wpdb1 | in-use | 50 | Attached to ead8ecc3-f473-4672-a67b-c44534c6042d on /dev/vda Attached to ead8ecc3-f473-4672-a67b-c44534c6042d on /dev/vda
How do I remove the duplicate attachment, and why could the live migration be failing in the first place? Not all migrations fail, but sometimes they do and I have multiple volumes with duplicate attachments.
From the log, it looks like the live migration is timing out and aborting itself:

2018-12-17 13:47:12.449 16987 INFO nova.virt.libvirt.driver [req-7bc758de-b2e4-461b-a971-f79be6cd4703 313d1247d7b845da9c731eec53e50a26 2f693c782fa748c2baece8db95b4ba5b - default default] [instance: ead8ecc3-f473-4672-a67b-c44534c6042d] Migration running for 2280 secs, memory 3% remaining; (bytes processed=2541888418, remaining=126791680, total=3226542080)

2018-12-17 13:47:29.591 16987 WARNING nova.virt.libvirt.migration [req-7bc758de-b2e4-461b-a971-f79be6cd4703 313d1247d7b845da9c731eec53e50a26 2f693c782fa748c2baece8db95b4ba5b - default default] [instance: ead8ecc3-f473-4672-a67b-c44534c6042d] Live migration not completed after 2400 sec

2018-12-17 13:47:30.097 16987 WARNING nova.virt.libvirt.driver [req-7bc758de-b2e4-461b-a971-f79be6cd4703 313d1247d7b845da9c731eec53e50a26 2f693c782fa748c2baece8db95b4ba5b - default default] [instance: ead8ecc3-f473-4672-a67b-c44534c6042d] Migration operation was cancelled

2018-12-17 13:47:30.299 16987 ERROR nova.virt.libvirt.driver [req-7bc758de-b2e4-461b-a971-f79be6cd4703 313d1247d7b845da9c731eec53e50a26 2f693c782fa748c2baece8db95b4ba5b - default default] [instance: ead8ecc3-f473-4672-a67b-c44534c6042d] Live Migration failure: operation aborted: migration job: canceled by client: libvirtError: operation aborted: migration job: canceled by client

After that, the _rollback_live_migration method is called, which tries to clean up the volume attachments that were created against the destination host (during pre_live_migration).
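To make the sequence described above concrete, here is a toy model of it in Python. It is not nova's code: ToyVolume, migrate_and_roll_back, the attachment IDs and the host names are all hypothetical, and the only behaviour modelled is what the message states, namely that an attachment is created for the destination host before the migration and that rollback tries to delete it afterwards. If that delete call fails for any reason, the volume is left holding two attachment records for the same instance.

```python
class CleanupFailed(Exception):
    """Stands in for any error returned when deleting an attachment."""


class ToyVolume:
    """Hypothetical in-memory stand-in for a volume's attachment records."""

    def __init__(self):
        self.attachments = {}  # attachment_id -> (server_id, host)

    def create_attachment(self, attachment_id, server_id, host):
        self.attachments[attachment_id] = (server_id, host)

    def delete_attachment(self, attachment_id, cleanup_succeeds):
        if not cleanup_succeeds:
            raise CleanupFailed(attachment_id)
        del self.attachments[attachment_id]


def migrate_and_roll_back(volume, server_id, cleanup_succeeds):
    """Model a live migration that is aborted and then rolled back."""
    # Before the migration: a second attachment is created for the destination.
    volume.create_attachment("dest-attachment", server_id, "dest-host")
    # The migration is aborted, so rollback tries to remove that attachment.
    try:
        volume.delete_attachment("dest-attachment", cleanup_succeeds)
    except CleanupFailed:
        pass  # the destination attachment record is left behind
    return sorted(volume.attachments)


if __name__ == "__main__":
    volume = ToyVolume()
    volume.create_attachment("source-attachment", "instance-1", "source-host")
    print(migrate_and_roll_back(volume, "instance-1", cleanup_succeeds=False))
```

In this model a failed cleanup leaves both records in place, which matches the "attached twice" symptom in the volume listing; a successful cleanup leaves only the original attachment.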
The attachment cleanup is failing because it looks like the user token has expired:

2018-12-17 13:47:30.685 16987 INFO nova.compute.manager [req-7bc758de-b2e4-461b-a971-f79be6cd4703 313d1247d7b845da9c731eec53e50a26 2f693c782fa748c2baece8db95b4ba5b - default default] [instance: ead8ecc3-f473-4672-a67b-c44534c6042d] Swapping old allocation on 3e32d595-bd1f-4136-a7f4-c6703d2fbe18 held by migration 17bec61d-544d-47e0-a1c1-37f9d7385286 for instance

2018-12-17 13:47:32.450 16987 ERROR nova.volume.cinder [req-7bc758de-b2e4-461b-a971-f79be6cd4703 313d1247d7b845da9c731eec53e50a26 2f693c782fa748c2baece8db95b4ba5b - default default] Delete attachment failed for attachment 58997d5b-24f0-4073-819e-97916fb1ee19. Error: The request you have made requires authentication. (HTTP 401) Code: 401: Unauthorized: The request you have made requires authentication. (HTTP 401)

2018-12-17 13:47:32.497 16987 WARNING nova.virt.libvirt.driver [req-7bc758de-b2e4-461b-a971-f79be6cd4703 313d1247d7b845da9c731eec53e50a26 2f693c782fa748c2baece8db95b4ba5b - default default] [instance: ead8ecc3-f473-4672-a67b-c44534c6042d] Error monitoring migration: The request you have made requires authentication. (HTTP 401): Unauthorized: The request you have made requires authentication. (HTTP 401)

Given that, you'd probably be interested in configuring nova to use service user tokens:

https://docs.openstack.org/nova/latest/configuration/config.html#service-use...

With that feature, you configure nova with service user credentials so in the case that the user token times out, keystone automatically re-authenticates using the service user credentials. More details can be found in the spec:

https://specs.openstack.org/openstack/nova-specs/specs/ocata/implemented/use...

--
Thanks,
Matt
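As a rough illustration of the service user token suggestion, the Python sketch below renders a [service_user] section that could be merged into nova.conf. The option names (send_service_user_token plus the usual keystone password-auth options) are the ones I understand the [service_user] group to accept; check them against the configuration reference linked above. Every value is a placeholder assumption: the auth URL, the "nova" account, the "service" project and the "Default" domains are not taken from this thread, and render_service_user_section is a hypothetical helper name.

```python
import configparser
import io


def render_service_user_section(auth_url, username, password,
                                project_name="service"):
    """Return a [service_user] section as text (all values are placeholders)."""
    # interpolation=None so a '%' in the password is written out literally.
    config = configparser.ConfigParser(interpolation=None)
    config["service_user"] = {
        "send_service_user_token": "true",
        "auth_type": "password",
        "auth_url": auth_url,
        "username": username,
        "password": password,
        "project_name": project_name,
        "user_domain_name": "Default",      # assumed domain name
        "project_domain_name": "Default",   # assumed domain name
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical endpoint and credentials, for illustration only.
    print(render_service_user_section(
        "http://controller:5000/v3", "nova", "NOVA_SERVICE_PASSWORD"))
```

The rendered section would presumably need to be present on the compute nodes that run the rollback, since that is where the failed delete request in the log originated, and the services restarted afterwards.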
On 12/19/2018 1:46 PM, Matt Riedemann wrote:
Given that, you'd probably be interested in configuring nova to use service user tokens:
https://docs.openstack.org/nova/latest/configuration/config.html#service-use...
I've posted https://review.openstack.org/#/c/626388/ to document this in the nova docs - that should have been done way back in Ocata when the feature was added. -- Thanks, Matt