[openstack-dev] [nova][cinder] Disabling nova volume-update (aka swap volume; aka cinder live migration)
Matthew Booth
mbooth at redhat.com
Wed Aug 22 10:27:43 UTC 2018
On Wed, 22 Aug 2018 at 10:47, Gorka Eguileor <geguileo at redhat.com> wrote:
>
> On 20/08, Matthew Booth wrote:
> > For those who aren't familiar with it, nova's volume-update (also
> > called swap volume by nova devs) is the nova part of the
> > implementation of cinder's live migration (also called retype).
> > Volume-update is essentially an internal cinder<->nova api, but as
> > that's not a thing it's also unfortunately exposed to users. Some
> > users have found it and are using it, but because it's essentially an
> > internal cinder<->nova api it breaks pretty easily if you don't treat
> > it like a special snowflake. It looks like we've finally found a way
> > it's broken for non-cinder callers that we can't fix, even with a
> > dirty hack.
> >
> > volume-update <server> <old> <new> essentially does a live copy of the
> > data on <old> volume to <new> volume, then seamlessly swaps the
> > attachment to <server> from <old> to <new>. The guest OS on <server>
> > will not notice anything at all as the hypervisor swaps the storage
> > backing an attached volume underneath it.
> >
> > When called by cinder, as intended, cinder does some post-operation
> > cleanup such that <old> is deleted and <new> inherits the same
> > volume_id; that is <old> effectively becomes <new>. When called any
> > other way, however, this cleanup doesn't happen, which breaks a bunch
> > of assumptions. One of these is that a disk's serial number is the
> > same as the attached volume_id. Disk serial number, in KVM at least,
> > is immutable, so can't be updated during volume-update. This is fine
> > if we were called via cinder, because the cinder cleanup means the
> > volume_id stays the same. If called any other way, however, they no
> > longer match, at least until a hard reboot when it will be reset to
> > the new volume_id. It turns out this breaks live migration, but
> > probably other things too. We can't think of a workaround.
> >
> > I wondered why users would want to do this anyway. It turns out that
> > sometimes cinder won't let you migrate a volume, but nova
> > volume-update doesn't do those checks (as they're specific to cinder
> > internals, none of nova's business, and duplicating them would be
> > fragile, so we're not adding them!). Specifically we know that cinder
> > won't let you migrate a volume with snapshots. There may be other
> > reasons. If cinder won't let you migrate your volume, you can still
> > move your data by using nova's volume-update, even though you'll end
> > up with a new volume on the destination, and a slightly broken
> > instance. Apparently the former is a trade-off worth making, but the
> > latter has been reported as a bug.
> >
>
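The serial-number mismatch described in the quoted message above can be shown with a small toy model. This is only a sketch: `Attachment`, `attach`, `swap_volume`, `hard_reboot`, and `serial_matches` are hypothetical names invented for illustration and are not nova or cinder code. The model assumes only what the message states: the disk serial starts out equal to the volume_id, cannot change on a running guest, and is reset on hard reboot.

```python
from dataclasses import dataclass


@dataclass
class Attachment:
    """Hypothetical stand-in for one disk attached to a guest."""
    volume_id: str    # volume the server is currently attached to
    disk_serial: str  # serial seen by the guest; fixed while it runs


def attach(volume_id: str) -> Attachment:
    # At attach time the disk serial is set to the volume_id.
    return Attachment(volume_id=volume_id, disk_serial=volume_id)


def swap_volume(att: Attachment, new_id: str, called_by_cinder: bool) -> None:
    """Toy volume-update: the data copy is assumed done, only ids are tracked."""
    old_id = att.volume_id
    att.volume_id = new_id
    if called_by_cinder:
        # Cinder's cleanup deletes <old> and lets <new> take over its id.
        att.volume_id = old_id
    # disk_serial is deliberately untouched: it is immutable on a running guest.


def hard_reboot(att: Attachment) -> None:
    # The disk is redefined, so the serial is reset to the current volume_id.
    att.disk_serial = att.volume_id


def serial_matches(att: Attachment) -> bool:
    return att.disk_serial == att.volume_id


via_cinder = attach("vol-old")
swap_volume(via_cinder, "vol-new", called_by_cinder=True)

direct = attach("vol-old")
swap_volume(direct, "vol-new", called_by_cinder=False)

print(serial_matches(via_cinder), serial_matches(direct))  # True False
hard_reboot(direct)
print(serial_matches(direct))  # True
```

In this model the direct call leaves the serial pointing at the old volume until the hard reboot, which is the window in which the message says live migration breaks.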
> Hi Matt,
>
> As you know, I'm in favor of authorizing this REST API call only for
> Cinder, to avoid messing up the cloud.
>
> I know you wanted Cinder to have a solution to do live migrations of
> volumes with snapshots, and while this is not possible to do in a
> reasonable fashion, I kept thinking about it given your strong feelings
> to provide a solution for users that really need this, and I think we
> may have a "reasonable" compromise.
>
> The solution is conceptually simple. We add a new API microversion in
> Cinder that adds an optional parameter called "generic_keep_source"
> (defaults to False) to both migrate and retype operations.
>
> This means that if the driver-optimized migration cannot do the
> migration and the generic migration code is the one doing it, then,
> instead of our final step being to swap the volume IDs and delete the
> source volume, what we would do is swap the volume IDs and move all
> the snapshots to reference the new volume. Then we would create a user
> message with the new ID of the volume.
>
> This way we can preserve the old volume with all its snapshots and do
> the live migration.
>
> The implementation is a little bit tricky, as we'll have to add a new
> "update_migrated_volume" mechanism to support the renaming of both
> volumes, since the old one wouldn't work with this among other things,
> but it's doable.
>
> Unfortunately I don't have the time right now to work on this...
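
One possible reading of the proposed final step, sketched as a toy model. Only the parameter name "generic_keep_source" and its default of False come from the quoted proposal. Everything else here (`Volume`, `Snapshot`, `Cloud`, `finish_generic_migration`, the backend names, the message text) is hypothetical and is not Cinder code. The sketch assumes the data copy has already finished and that the preserved source volume ends up under the destination's id.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    id: str
    backend: str  # hypothetical label for where the data lives


@dataclass
class Snapshot:
    id: str
    volume_id: str


@dataclass
class Cloud:
    volumes: dict = field(default_factory=dict)        # id -> Volume
    snapshots: list = field(default_factory=list)      # list of Snapshot
    user_messages: list = field(default_factory=list)  # list of str


def finish_generic_migration(cloud, src_id, dst_id, generic_keep_source=False):
    """Toy final step of a generic migration, run after the data copy."""
    has_snapshots = any(s.volume_id == src_id for s in cloud.snapshots)
    if has_snapshots and not generic_keep_source:
        # Mirrors today's restriction: no migration of volumes with snapshots.
        raise ValueError("volume has snapshots; cannot delete the source")

    src = cloud.volumes.pop(src_id)
    dst = cloud.volumes.pop(dst_id)
    # Swap ids so the migrated copy keeps the id users already know.
    src.id, dst.id = dst_id, src_id
    cloud.volumes[dst.id] = dst

    if not generic_keep_source:
        return  # existing behaviour: the source volume is deleted

    # Proposed behaviour: keep the source and re-point its snapshots at it.
    cloud.volumes[src.id] = src
    for snap in cloud.snapshots:
        if snap.volume_id == src_id:
            snap.volume_id = src.id
    cloud.user_messages.append(f"source volume preserved as {src.id}")


cloud = Cloud()
cloud.volumes["vol-a"] = Volume("vol-a", "backend-1")
cloud.volumes["vol-b"] = Volume("vol-b", "backend-2")
cloud.snapshots.append(Snapshot("snap-1", "vol-a"))
finish_generic_migration(cloud, "vol-a", "vol-b", generic_keep_source=True)
print(cloud.volumes["vol-a"].backend, cloud.snapshots[0].volume_id)
```

In this model the attached volume keeps its id ("vol-a") while now living on the destination backend, and the snapshot follows the preserved source under its new id.
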
Sounds promising, and honestly more than I'd have hoped for.
Matt
--
Matthew Booth
Red Hat OpenStack Engineer, Compute DFG
Phone: +442070094448 (UK)