[openstack-dev] [nova][libvirt] Block migrations and Cinder volumes

Daniel P. Berrange berrange at redhat.com
Wed Mar 18 10:08:32 UTC 2015


On Tue, Mar 17, 2015 at 01:33:26PM -0700, Joe Gordon wrote:
> On Thu, Jun 19, 2014 at 1:38 AM, Daniel P. Berrange <berrange at redhat.com>
> wrote:
> 
> > On Wed, Jun 18, 2014 at 11:09:33PM -0700, Rafi Khardalian wrote:
> > > I am concerned about how block migration functions when Cinder volumes
> > are
> > > attached to an instance being migrated.  We noticed some unexpected
> > > behavior recently, whereby attached generic NFS-based volumes would
> > become
> > > entirely unsparse over the course of a migration.  After spending some
> > time
> > > reviewing the code paths in Nova, I'm more concerned that this was
> > actually
> > > a minor symptom of a much more significant issue.
> > >
> > > For those unfamiliar, NFS-based volumes are simply RAW files residing on
> > an
> > > NFS mount.  From Libvirt's perspective, these volumes look no different
> > > than root or ephemeral disks.  We are currently not filtering out volumes
> > > whatsoever when making the request into Libvirt to perform the migration.
> > >  Libvirt simply receives an additional flag (VIR_MIGRATE_NON_SHARED_INC)
> > > when a block migration is requested, which applied to the entire
> > migration
> > > process, not differentiated on a per-disk basis.  Numerous guards within
> > > Nova to prevent a block based migration from being allowed if the
> > instance
> > > disks exist on the destination; yet volumes remain attached and within
> > the
> > > defined XML during a block migration.
> > >
> > > Unless Libvirt has a lot more logic around this than I am lead to
> > believe,
> > > this seems like a recipe for corruption.  It seems as though this would
> > > also impact any type of volume attached to an instance (iSCSI, RBD,
> > etc.),
> > > NFS just happens to be what we were testing.  If I am wrong and someone
> > can
> > > correct my understanding, I would really appreciate it.  Otherwise, I'm
> > > surprised we haven't had more reports of issues when block migrations are
> > > used in conjunction with any attached volumes.
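
For reference, this is roughly how the VIR_MIGRATE_NON_SHARED_INC request
described above looks at the libvirt-python layer. A simplified sketch
only, not the actual Nova code path; the domain name and destination URI
are illustrative:

    import libvirt

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('instance-00000001')   # illustrative name

    # The block-migration flag is set once for the whole migration; there
    # is no per-disk argument here, so every writable, non-shareable disk
    # in the domain XML gets copied, cinder-backed or not.
    flags = (libvirt.VIR_MIGRATE_LIVE |
             libvirt.VIR_MIGRATE_PEER2PEER |
             libvirt.VIR_MIGRATE_UNDEFINE_SOURCE |
             libvirt.VIR_MIGRATE_NON_SHARED_INC)

    dom.migrateToURI('qemu+tcp://dest-host/system', flags, None, 0)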
> >
> > Libvirt/QEMU has no special logic. When told to block-migrate, it will
> > do so for *all* disks attached to the VM in read-write-exclusive mode.
> > It will only skip those marked read-only or read-write-shared. Even
> > that distinction is somewhat dubious, so it is not reliably what you
> > would want.
> >
> > It seems like we should just disallow block migration when any cinder
> > volumes are attached to the VM, since there is never any valid use case
> > for block-migrating a cinder volume onto itself.
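
A guard along these lines would be enough to enforce that; a minimal
sketch only, with an illustrative function name and exception rather than
Nova's actual check:

    def check_block_migration_allowed(block_migration, block_device_mappings):
        # Illustrative guard, not the actual Nova code: refuse a block
        # migration when the instance has cinder volumes attached, since
        # libvirt would otherwise copy those volumes onto themselves.
        if block_migration and block_device_mappings:
            raise ValueError(
                "Block migration is not allowed while cinder volumes are "
                "attached; detach them first or use shared storage.")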
> 
> Digging up this old thread because I am working on getting multi-node live
> migration testing working (https://review.openstack.org/#/c/165182/), and
> just ran into this issue (bug 1398999).
> 
> And I am not sure I agree with this statement. I think there is a valid
> case for doing block migrate with a cinder volume attached to an instance:

To be clear, I'm not saying the use cases for block-migrating instances
with cinder volumes attached are invalid, just that, with the way libvirt
exposes block migration today, it isn't safe for us to allow it: we don't
have the fine-grained control from OpenStack to make it reliably safe. We
need to improve the libvirt API in this area, and then we can support this
feature properly.
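
Concretely, the missing piece is a way to tell libvirt, per migration,
which disks it should copy. A sketch of what that could look like through
libvirt-python, assuming a libvirt new enough to expose a per-disk
selection parameter (VIR_MIGRATE_PARAM_MIGRATE_DISKS), which is not
something we can rely on today:

    import libvirt

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('instance-00000001')   # illustrative name

    flags = (libvirt.VIR_MIGRATE_LIVE |
             libvirt.VIR_MIGRATE_PEER2PEER |
             libvirt.VIR_MIGRATE_NON_SHARED_INC)
    params = {
        # Copy only the local root/ephemeral disk; skip the cinder-backed
        # disk (vdb here) because it already lives on shared storage.
        libvirt.VIR_MIGRATE_PARAM_MIGRATE_DISKS: ['vda'],
    }
    dom.migrateToURI3('qemu+tcp://dest-host/system', params, flags)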

> * Cloud isn't using a shared filesystem for ephemeral storage
> * Instance is booted from an image, and a volume is attached afterwards. An
> admin wants to take the box the instance is running on offline for
> maintenance with minimal impact to the instances running on it.
> 
> What is the recommended solution for that use case? If the admin
> disconnects and reconnects the volume themselves, is there a risk of
> impacting what's running on the instance? etc.

Yes, and that sucks, but it is the only safe option today; otherwise
libvirt is going to try copying the data in the cinder volumes itself,
which means copying from the volume on one host back into the very same
volume on the other host. IOW it is rewriting all the data even though
the volume is shared between the hosts. This has dangerous data
corruption failure scenarios, as well as being massively wasteful of CPU
and network bandwidth.
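
For anyone doing that by hand, it amounts to something like the following
with python-novaclient. A rough sketch only: credentials, IDs, host name
and device path are placeholders, the polling/wait steps are omitted, and
the detach is of course disruptive to whatever is using the volume inside
the guest:

    from novaclient import client

    # Placeholder credentials and endpoint.
    nova = client.Client('2', 'admin', 'secret', 'admin',
                         auth_url='http://keystone:5000/v2.0')

    server_id = 'SERVER_UUID'
    volume_id = 'VOLUME_UUID'

    # 1. Detach the cinder volume so libvirt has nothing shared to copy.
    nova.volumes.delete_server_volume(server_id, volume_id)

    # 2. Block-migrate the instance; only the local disks are copied now.
    nova.servers.live_migrate(server_id, 'dest-host',
                              block_migration=True, disk_over_commit=False)

    # 3. Re-attach the volume once the instance is active on the destination.
    nova.volumes.create_server_volume(server_id, volume_id, '/dev/vdb')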

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|


