[openstack-dev] [nova][cinder] Update (swap) of multiattach volume should not be allowed
Artom Lifshitz
alifshit at redhat.com
Wed Jun 6 13:10:57 UTC 2018
I think regardless of how we ended up with this situation, we're still
in a position where we have a public-facing API that could lead to
data-corruption when used in a specific way. That should never be the
case. I would think re-using the already possible 400 response code to
update-volume when used with a multi-attach volume to indicate that it
can't be done, without a new microversion, would be the cleaned way of
getting out of this pickle.
On Wed, Jun 6, 2018 at 2:55 PM, Jay Pipes <jaypipes at gmail.com> wrote:
> On 06/06/2018 07:46 AM, Matthew Booth wrote:
>>
>> TL;DR I think we need to entirely disable swap volume for multiattach
>> volumes, and this will be an api breaking change with no immediate
>> workaround.
>>
>> I was looking through tempest and came across
>>
>> api.compute.admin.test_volume_swap.TestMultiAttachVolumeSwap.test_volume_swap_with_multiattach.
>> This test does:
>>
>> Create 2 multiattach volumes
>> Create 2 servers
>> Attach volume 1 to both servers
>> ** Swap volume 1 for volume 2 on server 1 **
>> Check all is attached as expected
>>
>> The problem with this is that swap volume is a copy operation.
>
>
> Is it, though? The original blueprint and implementation seem to suggest
> that the swap_volume operation was nothing more than changing the mountpoint
> for a volume to point to a different location (in a safe
> manner that didn't lose any reads or writes).
>
> https://blueprints.launchpad.net/nova/+spec/volume-swap
>
> Nothing about the description of swap_volume() in the virt driver interface
> mentions swap_volume() being a "copy operation":
>
> https://github.com/openstack/nova/blob/76ec078d3781fb55c96d7aaca4fb73a74ce94d96/nova/virt/driver.py#L476
>
>> We don't just replace one volume with another, we copy the contents
>> from one to the other and then do the swap. We do this with a qemu
>> drive mirror operation, which is able to do this copy safely without
>> needing to make the source read-only because it can also track writes
>> to the source and ensure the target is updated again. Here's a link
>> to the libvirt logs showing a drive mirror operation during the swap
>> volume of an execution of the above test:
>
> After checking the source code, the libvirt virt driver is the only virt
> driver that implements swap_volume(), so it looks to me like a public HTTP
> API method was added that was specific to libvirt's implementation of drive
> mirroring. Yay, more implementation leaking out through the API.
>
>>
>> http://logs.openstack.org/58/567258/5/check/nova-multiattach/d23fad8/logs/libvirt/libvirtd.txt.gz#_2018-06-04_10_57_05_201
>>
>> The problem is that when the volume is attached to more than one VM,
>> the hypervisor doing the drive mirror *doesn't* know about writes on
>> the other attached VMs, so it can't do that copy safely, and the
>> result is data corruption.
>
>
> Would it be possible to swap the volume by doing what Vish originally
> described in the blueprint: pause the VM, swap the volume mountpoints
> (potentially after migrating the underlying volume), start the VM?
>
>>
> Note that swap volume isn't visible to the
>>
>> guest os, so this can't be addressed by the user. This is a data
>> corrupter, and we shouldn't allow it. However, it is in released code
>> and users might be doing it already, so disabling it would be a
>> user-visible api change with no immediate workaround.
>
>
> I'd love to know who is actually using the swap_volume() functionality,
> actually. I'd especially like to know who is using swap_volume() with
> multiattach.
>
>> However, I think we're attempting to do the wrong thing here anyway,
>> and the above tempest test is explicit testing behaviour that we don't
>> want. The use case for swap volume is that a user needs to move volume
>> data for attached volumes, e.g. to new faster/supported/maintained
>> hardware.
>
>
> Is that the use case?
>
> As was typical, there's no mention of a use case on the original blueprint.
> It just says "This feature allows a user or administrator to transparently
> swap out a cinder volume that connected to an instance." Which is hardly a
> use case since it uses the feature name in a description of the feature
> itself. :(
>
> The commit message (there was only a single commit for this functionality
> [1]) mentions overwriting data on the new volume:
>
> Adds support for transparently swapping an attached volume with
> another volume. Note that this overwrites all data on the new volume
> with data from the old volume.
>
> Yes, that is the commit message in its entirety. Of course, the commit had
> no documentation at all in it, so there's no ability to understand what the
> original use case really was here.
>
> https://review.openstack.org/#/c/28995/
>
> If the use case was really "that a user needs to move volume data for
> attached volumes", why not just pause the VM, detach the volume, do a
> openstack volume migrate to the new destination, reattach the volume and
> start the VM? That would mean no libvirt/QEMU-specific implementation
> behaviour leaking out of the public HTTP API and allow the volume service
> (Cinder) to do its job properly.
>
>
>> With single attach that's exactly what they get: the end
>> user should never notice. With multi-attach they don't get that. We're
>> basically forking the shared volume at a point in time, with the
>> instance which did the swap writing to the new location while all
>> others continue writing to the old location. Except that even the fork
>> is broken, because they'll get a corrupt, inconsistent copy rather
>> than point in time. I can't think of a use case for this behaviour,
>> and it certainly doesn't meet the original design intent.
>>
>> What they really want is for the multi-attached volume to be copied
>> from location a to location b and for all attachments to be updated.
>> Unfortunately I don't think we're going to be in a position to do that
>> any time soon, but I also think users will be unhappy if they're no
>> longer able to move data at all because it's multi-attach. We can
>> compromise, though, if we allow a multiattach volume to be moved as
>> long as it only has a single attachment. This means the operator can't
>> move this data without disruption to users, but at least it's not
>> fundamentally immovable.
>>
>> This would require some cooperation with cinder to achieve, as we need
>> to be able to temporarily prevent cinder from allowing new
>> attachments. A natural way to achieve this would be to allow a
>> multi-attach volume with only a single attachment to be redesignated
>> not multiattach, but there might be others. The flow would then be:
>>
>> Detach volume from server 2
>> Set multiattach=False on volume
>> Migrate volume on server 1
>> Set multiattach=True on volume
>> Attach volume to server 2
>>
>> Combined with a patch to nova to disallow swap_volume on any
>> multiattach volume, this would then be possible if inconvenient.
>>
>> Regardless of any other changes, though, I think it's urgent that we
>> disable the ability to swap_volume a multiattach volume because we
>> don't want users to start using this relatively new, but broken,
>> feature.
>
>
> Or we could deprecate the swap_volume Compute API operation and use Cinder
> for all of this.
>
> But sure, we could also add more cruft to the Compute API and add more
> conditional "it works but only when X" docs to the API reference.
>
> Just my two cents,
> -jay
>
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
--
--
Artom Lifshitz
Software Engineer, OpenStack Compute DFG
More information about the OpenStack-dev
mailing list