[openstack-dev] [nova] top gate bug is libvirt snapshot

Alex Xu xuhj at linux.vnet.ibm.com
Wed Jul 16 03:22:20 UTC 2014


A question about swap volume: its implementation is very similar to
live snapshot.
Both are implemented with blockRebase, but swap volume doesn't check the
libvirt or qemu version at all.
Should we add a version check for swap_volume now? That would mean
swap_volume gets disabled as well.
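
Roughly what I have in mind, as a minimal sketch rather than actual nova
code: the constant names, the helper names and the minimum versions below
are placeholders I made up for illustration. The only thing taken from
libvirt is the packed integer version format
(major * 1000000 + minor * 1000 + release).

```python
# Hypothetical minimum versions for swap_volume. These are placeholders
# for illustration, not values taken from nova or verified against libvirt.
MIN_LIBVIRT_SWAP_VOLUME_VERSION = (1, 0, 0)
MIN_QEMU_SWAP_VOLUME_VERSION = (1, 3, 0)


def version_to_tuple(packed):
    """Unpack a libvirt-style integer version into (major, minor, release).

    libvirt reports versions as major * 1000000 + minor * 1000 + release.
    """
    major, rest = divmod(packed, 1000000)
    minor, release = divmod(rest, 1000)
    return (major, minor, release)


def can_swap_volume(libvirt_version, qemu_version):
    """Return True only if both packed versions meet the assumed minimums."""
    return (
        version_to_tuple(libvirt_version) >= MIN_LIBVIRT_SWAP_VOLUME_VERSION
        and version_to_tuple(qemu_version) >= MIN_QEMU_SWAP_VOLUME_VERSION
    )
```

With a gate like this, swap_volume would be refused on hosts below the
chosen versions, the same way live snapshot falls back today.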

On 2014-06-26 19:00, Sean Dague wrote:
> While the Trusty transition was mostly uneventful, it has exposed a
> particular issue in libvirt, which is generating ~ 25% failure rate now
> on most tempest jobs.
>
> As can be seen here -
> https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L294-L297
>
>
> ... the libvirt live_snapshot code is something that our test pipeline
> has never tested before, because it wasn't a new enough libvirt for us
> to take that path.
>
> Right now it's exploding, a lot -
> https://bugs.launchpad.net/nova/+bug/1334398
>
> Snapshotting gets used in Tempest to create images for testing, so image
> setup tests are doing a decent number of snapshots. If I had to take a
> completely *wild guess*, it's that libvirt can't do 2 live_snapshots at
> the same time. It's probably something that most people haven't hit. The
> wild guess is based on other libvirt issues we've hit that other people
> haven't, and they are basically always a parallel ops triggered problem.
>
> My 'stop the bleeding' suggested fix is this -
> https://review.openstack.org/#/c/102643/ which just effectively disables
> this code path for now. Then we can get some libvirt experts engaged to
> help figure out the right long term fix.
>
> I think there are a couple of options for that long term fix:
>
> 1) see if newer libvirt fixes this (1.2.5 just came out), and if so
> mandate at some known working version. This would actually take a bunch
> of work to be able to test a non packaged libvirt in our pipeline. We'd
> need volunteers for that.
>
> 2) lock snapshot operations in nova-compute, so that we can only do 1 at
> a time. Hopefully it's just 2 snapshot operations that is the issue, not
> any other libvirt op during a snapshot, so serializing snapshot ops in
> n-compute could put the kid gloves on libvirt and make it not break
> here. This also needs some volunteers as we're going to be playing a
> game of progressive serialization until we get to a point where it looks
> like the failures go away.
>
> 3) Roll back to Precise. I put this idea here for completeness, but I
> think it's a terrible choice. This is one isolated, previously untested
> (by us), code path. We can't stay on libvirt 0.9.6 forever, so actually
> need to fix this for real (be it in nova's use of libvirt, or libvirt
> itself).
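
For option 2, a minimal sketch of what serializing snapshots could look
like. This is an assumption for illustration only: it uses a plain
threading.Lock from the standard library, and serialized_snapshot and
do_snapshot are hypothetical names, not nova-compute's actual locking
mechanism or driver methods.

```python
import threading

# One process-wide lock, so at most one snapshot runs at a time.
_snapshot_lock = threading.Lock()


def serialized_snapshot(do_snapshot, *args, **kwargs):
    """Run do_snapshot while holding the global snapshot lock.

    do_snapshot is a hypothetical callable standing in for whatever
    actually performs the snapshot.
    """
    with _snapshot_lock:
        return do_snapshot(*args, **kwargs)
```

If failures persist with only snapshots serialized, the same lock would
have to be widened to cover other libvirt operations, which is the
progressive serialization described above.
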
>
> There might be other options as well, ideas welcomed.
>
> But for right now, we should stop the bleeding, so that nova/libvirt
> isn't blocking everyone else from merging code.
>
> 	-Sean
>
>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
