[Openstack-operators] resized (migrated) disks corrupt after live (kvm) block-migration

Paul Dekkers paul.dekkers at surfnet.nl
Thu Mar 13 20:57:44 UTC 2014


Hi,

I'm running Havana from Ubuntu Cloud Archive (2013.2.1) on Ubuntu
12.04.4 LTS and have an issue with instances that are block-migrated
(KVM) after resize (or migration). My /var/lib/nova/instances is on ZFS.

Normal instances (with a backing file) live block-migrate just fine.

The affected instances get I/O errors on vda and filesystem errors (I've
mostly tried an Ubuntu image, so ext4):

[  615.659896] end_request: I/O error, dev vda, sector 4458520
[ 2779.352414] end_request: I/O error, dev vda, sector 4458520

This is what the disk file looks like before a migrate/resize:

# file disk
disk: QEMU QCOW Image (v2), has backing file (path
/var/lib/nova/instances/_base/b17b61076359d4a745dcaddee798ac0b7),
10737418240 bytes

# qemu-img info disk
image: disk
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 3.7M
cluster_size: 65536
backing file:
/var/lib/nova/instances/_base/b17b61076359d4a745dcaddee798ac0b7cd0b8ef

This is after the flavor resize:

# file disk
disk: QEMU QCOW Image (v2), 21474836480 bytes

# qemu-img info disk
image: disk
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 385M
cluster_size: 65536

So there is no backing file anymore (the same happens after a normal
migration). If I then live-migrate, with:
nova live-migration --block-migrate <uuid>
the virtual size is wrong:

# file disk
disk: QEMU QCOW Image (v2), 1190723584 bytes

# qemu-img info disk
image: disk
file format: qcow2
virtual size: 1.1G (1190723584 bytes)
disk size: 385M
cluster_size: 65536

... and then the I/O errors inside the instance begin, causing filesystem
and data corruption.
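
As a rough cross-check of the numbers above, here is a minimal Python sketch
that pulls the virtual size out of the qemu-img info text and compares it with
the byte offset of the failing sector from the kernel log. The function names
are made up for illustration, and the regex assumes the human-readable output
format shown in this post. The only outside fact used is that the Linux block
layer reports sector numbers in 512-byte units.

```python
import re

# The Linux block layer reports sector numbers in 512-byte units.
SECTOR_BYTES = 512


def virtual_size_bytes(info_text):
    """Return the virtual size in bytes from `qemu-img info` text output.

    Hypothetical helper: it assumes a line of the form
    'virtual size: 20G (21474836480 bytes)' as shown in this post.
    """
    match = re.search(r"^virtual size:.*\((\d+) bytes\)", info_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'virtual size' line found")
    return int(match.group(1))


def sector_offset(sector):
    """Byte offset at which the given 512-byte sector starts."""
    return sector * SECTOR_BYTES


def sector_within_image(sector, info_text):
    """True if the whole sector lies inside the image's virtual size."""
    return sector_offset(sector) + SECTOR_BYTES <= virtual_size_bytes(info_text)


# qemu-img info output copied from this post.
AFTER_RESIZE = """image: disk
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 385M
cluster_size: 65536
"""

AFTER_BLOCK_MIGRATION = """image: disk
file format: qcow2
virtual size: 1.1G (1190723584 bytes)
disk size: 385M
cluster_size: 65536
"""

FAILING_SECTOR = 4458520  # from "end_request: I/O error, dev vda, sector 4458520"

for label, text in (("after resize", AFTER_RESIZE),
                    ("after block migration", AFTER_BLOCK_MIGRATION)):
    print(label,
          "virtual size:", virtual_size_bytes(text),
          "failing sector inside image:", sector_within_image(FAILING_SECTOR, text))
```

Sector 4458520 starts at byte 2282762240 (about 2.1 GiB), which is inside the
20G image but beyond the 1.1G virtual size reported after the block migration.
That is consistent with the guest reading past the end of the shrunken image,
though it does not by itself show why the size changes.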

Interestingly, I can "fix" the instance by doing a resize or migrate
again; the virtual size is then corrected. But of course I still can't
live-migrate it after that.

It doesn't matter whether the instance is powered on or shut down before
the resize/migrate. The instance size doesn't matter either: it happens
even with the smallest instances. There's nothing different in the libvirt
xml files for the affected instances.

Any ideas? I'm curious whether other people have stumbled upon this issue.

Paul


