[openstack-dev] [nova] bug 1334398 and libvirt live snapshot support

Jeremy Stanley fungi at yuggoth.org
Mon Dec 8 21:12:24 UTC 2014

On 2014-12-08 11:45:36 +0100 (+0100), Kashyap Chamarthy wrote:
> As Dan Berrangé noted, it's nearly impossible to reproduce this issue
> independently outside of the OpenStack Gating environment. I brought
> this up at the KVM Forum this past October. To debug this any further,
> one of the QEMU block layer developers asked if we can run the QEMU
> instance from a Gate run under `gdb` (IIRC, danpb suggested this too,
> previously) to get further tracing details.

We thoroughly document how to reproduce the environments we use for
testing OpenStack. There's nothing rarefied about "a Gate run" that
anyone with access to a public cloud provider would be unable to
reproduce, except the ability to run it over and over enough times
to expose less frequent failures.
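
To put a rough number on "enough times", here is a back-of-the-envelope
sketch. It assumes each run fails independently with the same fixed
probability, which is a simplification, and the failure rates passed in
are made up for illustration rather than measured from the gate. The
function name runs_needed is likewise just a placeholder.

```python
import math


def runs_needed(failure_rate: float, confidence: float = 0.95) -> int:
    """Return the smallest number of runs n such that the chance of
    seeing at least one failure, 1 - (1 - failure_rate)**n, is at
    least `confidence`.

    Assumes runs are independent and share one fixed failure rate
    (an idealization; real gate failures may be correlated).
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Solve (1 - failure_rate)**n <= 1 - confidence for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    # Hypothetical rates, purely illustrative.
    for rate in (0.1, 0.01, 0.001):
        print(f"failure rate {rate}: {runs_needed(rate)} runs for 95% confidence")
```

Under those assumptions, a failure that hits one run in a hundred needs
on the order of a few hundred runs before you can reasonably expect to
have seen it once, which is easy for the gate and tedious for one person.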

> FWIW, I myself couldn't reproduce it independently via libvirt
> alone or via QMP (QEMU Machine Protocol) commands.
> Dan's workaround ("enable it permanently, except for under the
> gate") sounds sensible to me.

I'm dubious of this as it basically says "we know this breaks
sometimes, so we're going to stop testing that it works at all and
possibly let it get even more broken, but you should be safe to rely
on it anyway."

The QA team tries very hard to make our integration testing
environment mimic real-world deployment configurations as closely as
possible. If these sorts of bugs emerge more often because of, for
example, resource constraints in the test environment, then they
would very likely also be seen in production with the same frequency
if run on similarly constrained equipment. And as
we've observed in the past, any code path we stop testing quickly
accumulates new bugs that go unnoticed until they impact someone's
production environment at 3am.
