<div dir="ltr"><div>Hello, some news; I wonder if it can help:</div><div>I am testing with some virtual machines again.</div><div>If I follow these steps, it works (but I lose the network connection):</div><div><br></div><div>1) Detach the network interface from the instance</div><div>2) Attach the network interface to the instance</div><div>3) Migrate the instance</div><div>4) Log in to the instance using the console and restart networking<br></div><div><br></div><div>whereas if I restart networking before the live migration, it does not work.</div><div>So, when someone mentioned</div><div><br></div><div>########################</div><div>we get this "guest index inconsistent" error when the migrated RAM is inconsistent with the migrated 'virtio' device state. And a common case is where a 'virtio' device does an operation after the vCPU is stopped and after RAM has been transmitted.</div><div>########################</div><div>could the network traffic be the problem?</div><div>Ignazio<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, 19 May 2021 at 16:35, Kashyap Chamarthy <<a href="mailto:kchamart@redhat.com">kchamart@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">(Hi, we've talked on #openstack-nova; updating on list too.)<br>
<br>
On Wed, May 19, 2021 at 10:48:11AM +0200, Ignazio Cassano wrote:<br>
> Hello Guys,<br>
> on train centos7 I am facing live migration issue only for some instances<br>
> (not all).<br>
> The error reported is:<br>
> 2021-05-19 08:45:57.096 142537 ERROR nova.compute.manager [-] [instance:<br>
> b18450e8-b3db-4886-a737-c161d99c6a46] Live migration failed.: libvirtError:<br>
> Unable to read from monitor: Connection reset by peer<br>
> <br>
> The instance remains in pause on both source and destination host.<br>
> <br>
> Any help,please ?<br>
<br>
Summarizing the issue for those who are following along this conversation:<br>
<br>
The debugging chat trail from #openstack-nova starts here:<br>
<a href="http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2021-05-19.log.html#t2021-05-19T08:50:11" rel="noreferrer" target="_blank">http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2021-05-19.log.html#t2021-05-19T08:50:11</a><br>
<br>
Version<br>
-------<br>
<br>
- libvirt: 4.5.0, package: 36.el7_9.5<br>
- QEMU: 2.12.0, package: qemu-kvm-ev-2.12.0-44.1.el7_8.1<br>
- kernel: 3.10.0-1160.25.1.el7.x86_64<br>
<br>
Problem<br>
-------<br>
<br>
It seems that some guests (on NFS) crash during live migration<br>
with the below errors in the QEMU guest log:<br>
<br>
[...]<br>
2021-05-19T08:12:30.396878Z qemu-kvm: Failed to load virtqueue_state:vring.used<br>
2021-05-19T08:12:30.397555Z qemu-kvm: Failed to load virtio/virtqueues:vq<br>
2021-05-19T08:12:30.397581Z qemu-kvm: Failed to load virtio-blk:virtio<br>
2021-05-19T08:12:30.397606Z qemu-kvm: error while loading state for instance 0x0 of device '0000:00:08.0/virtio-blk'<br>
2021-05-19T08:12:30.399542Z qemu-kvm: load of migration failed: Input/output error<br>
2021-05-19 08:12:31.022+0000: shutting down, reason=crashed<br>
[...]<br>
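<br>
For anyone triaging similar logs, here is a minimal Python sketch that pulls the<br>
failing device out of such a QEMU log. The regular expression and the<br>
`failed_devices` helper are illustrative assumptions inferred from the excerpt<br>
above; they are not part of QEMU or libvirt tooling:<br>
<br>
```python
import re

# Matches the "error while loading state" line that names the device.
# The pattern is inferred from the log excerpt, not from QEMU's sources.
DEVICE_RE = re.compile(
    r"error while loading state for instance (0x[0-9a-fA-F]+) of device '([^']+)'"
)


def failed_devices(log_text):
    """Return (timestamp, instance, device) tuples for failed device loads."""
    results = []
    for line in log_text.splitlines():
        match = DEVICE_RE.search(line)
        if match:
            # The timestamp is the first whitespace-delimited field of the line.
            timestamp = line.split(" ", 1)[0]
            results.append((timestamp, match.group(1), match.group(2)))
    return results


sample = """\
2021-05-19T08:12:30.397581Z qemu-kvm: Failed to load virtio-blk:virtio
2021-05-19T08:12:30.397606Z qemu-kvm: error while loading state for instance 0x0 of device '0000:00:08.0/virtio-blk'
2021-05-19T08:12:30.399542Z qemu-kvm: load of migration failed: Input/output error
"""

for timestamp, instance, device in failed_devices(sample):
    print(timestamp, instance, device)
```
<br>
On the sample above this reports the virtio-blk device at 0000:00:08.0.<br>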
<br>
And this error from libvirt (as obtained via `journalctl -u libvirtd -l<br>
--since=yesterday -p err`):<br>
<br>
error : qemuDomainObjBeginJobInternal:6825 : Timed out during<br>
operation: cannot acquire state change lock (held by monitor=remo<br>
<br>
Diagnosis<br>
---------<br>
<br>
Further, this "cannot acquire state change lock" error from libvirt is<br>
notoriously hard to debug without a reliable reproducer, as it could be<br>
due to QEMU getting hung, which in turn could be caused by stuck I/O.<br>
<br>
See also the discussion (but no conclusion) on this related QEMU bug[1],<br>
particularly comment #11.<br>
<br>
In short, without a solid reproducer, these virtio issues are hard to<br>
track down, I'm afraid.<br>
<br>
<br>
[1] <a href="https://bugs.launchpad.net/nova/+bug/1761798" rel="noreferrer" target="_blank">https://bugs.launchpad.net/nova/+bug/1761798</a> -- live migration<br>
intermittently fails in CI with "VQ 0 size 0x80 Guest index 0x12c<br>
inconsistent with Host index 0x134: delta 0xfff8"<br>
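<br>
The delta in that message can be reproduced by hand, assuming the two indices<br>
are 16-bit ring counters that wrap around (as virtio ring indices do) and that<br>
the check compares their wrapped difference against the queue size. The<br>
function names below are hypothetical; this is only a sketch of the arithmetic,<br>
not QEMU's code:<br>
<br>
```python
def vring_delta(guest_index, host_index):
    """Difference of two 16-bit ring indices, wrapping modulo 2**16."""
    return (guest_index - host_index) % 0x10000


def is_consistent(guest_index, host_index, queue_size):
    """Assumed check: the wrapped delta must not exceed the queue size."""
    return queue_size >= vring_delta(guest_index, host_index)


# Values taken from the bug title: size 0x80, guest 0x12c, host 0x134.
delta = vring_delta(0x12C, 0x134)
print(hex(delta))
print(is_consistent(0x12C, 0x134, 0x80))
```
<br>
0x12c minus 0x134 is -8, which wraps to 0xfff8; that is far larger than the<br>
queue size 0x80, so the state is flagged as inconsistent.<br>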
<br>
-- <br>
/kashyap<br>
<br>
</blockquote></div>