[nova][train] live migration issue

Ignazio Cassano ignaziocassano at gmail.com
Wed May 19 15:17:52 UTC 2021


Hello, some news... I wonder if it can help:
I am testing with some virtual machines again.
If I follow these steps, migration works (but I lose network connectivity):

1) Detach the network interface from the instance
2) Attach the network interface to the instance
3) Live-migrate the instance
4) Log into the instance via the console and restart networking
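For reference, the steps above can be sketched with the standard OpenStack CLI. This is only a sketch: INSTANCE, PORT, and DEST_HOST are placeholders, it assumes the interface is attached as a Neutron port, and the exact migration flags vary between client releases.

```shell
# Placeholders (not from the original thread): INSTANCE = server name/UUID,
# PORT = Neutron port UUID, DEST_HOST = target hypervisor hostname.

# 1) Detach the network interface (port) from the instance
openstack server remove port INSTANCE PORT

# 2) Re-attach the same port
openstack server add port INSTANCE PORT

# 3) Live-migrate the instance (flag spelling depends on the
#    python-openstackclient version; older clients use "--live DEST_HOST")
openstack server migrate --live-migration --host DEST_HOST INSTANCE

# 4) Log in via the console and restart networking, e.g. on a CentOS 7 guest:
#    systemctl restart network
```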

Whereas if I restart networking before the live migration, it does not work.
So, when someone mentioned:

########################
we get this "guest index inconsistent" error when the migrated RAM is
inconsistent with the migrated 'virtio' device state. And a common case is
where a 'virtio' device does an operation after the vCPU is stopped and
after RAM has been transmitted.
#############################
could the network traffic be the problem?
Ignazio

On Wed, 19 May 2021 at 16:35, Kashyap Chamarthy <
kchamart at redhat.com> wrote:

> (Hi, we've talked on #openstack-nova; updating on list too.)
>
> On Wed, May 19, 2021 at 10:48:11AM +0200, Ignazio Cassano wrote:
> > Hello Guys,
> > on train centos7 I am facing live migration issue only for some instances
> > (not all).
> > The error reported is:
> > 2021-05-19 08:45:57.096 142537 ERROR nova.compute.manager [-] [instance:
> > b18450e8-b3db-4886-a737-c161d99c6a46] Live migration failed.:
> libvirtError:
> > Unable to read from monitor: Connection reset by peer
> >
> > The instance remains in pause on both source and destination host.
> >
> > Any help,please ?
>
> Summarizing the issue for those who are following along this conversation:
>
> The debugging chat trail from #openstack-nova starts here:
>
> http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2021-05-19.log.html#t2021-05-19T08:50:11
>
> Version
> -------
>
> - libvirt: 4.5.0, package: 36.el7_9.5
> - QEMU: 2.12.0, package: qemu-kvm-ev-2.12.0-44.1.el7_8.1
> - kernel: 3.10.0-1160.25.1.el7.x86_64
>
> Problem
> -------
>
> It seems that some guests (on NFS) crash during live migration
> with the below errors in the QEMU guest log:
>
>     [...]
>     2021-05-19T08:12:30.396878Z qemu-kvm: Failed to load virtqueue_state:vring.used
>     2021-05-19T08:12:30.397555Z qemu-kvm: Failed to load virtio/virtqueues:vq
>     2021-05-19T08:12:30.397581Z qemu-kvm: Failed to load virtio-blk:virtio
>     2021-05-19T08:12:30.397606Z qemu-kvm: error while loading state for instance 0x0 of device '0000:00:08.0/virtio-blk'
>     2021-05-19T08:12:30.399542Z qemu-kvm: load of migration failed: Input/output error
>     2021-05-19 08:12:31.022+0000: shutting down, reason=crashed
>     [...]
>
> And this error from libvirt (as obtained via `journalctl -u libvirtd -l
> --since=yesterday -p err`):
>
>     error : qemuDomainObjBeginJobInternal:6825 : Timed out during
>     operation: cannot acquire state change lock (held by monitor=remo
>
> Diagnosis
> ---------
>
> Further, this "cannot acquire state change lock" error from libvirt is
> notoriously hard to debug without a reliable reproducer, as it could be
> due to QEMU hanging, which in turn could be caused by stuck I/O.
>
> See also the discussion (with no conclusion) on the related QEMU bug[1],
> particularly comment #11.
>
> In short, without a solid reproducer, these virtio issues are hard to
> track down, I'm afraid.
>
>
> [1] https://bugs.launchpad.net/nova/+bug/1761798 -- live migration
>     intermittently fails in CI with "VQ 0 size 0x80 Guest index 0x12c
>     inconsistent with Host index 0x134: delta 0xfff8"
>
> --
> /kashyap
>
>

