[openstack-dev] [nova] vhost-scsi support in Nova
Vishvananda Ishaya
vishvananda at gmail.com
Thu Jul 24 18:45:46 UTC 2014
On Jul 24, 2014, at 3:06 AM, Daniel P. Berrange <berrange at redhat.com> wrote:
> On Wed, Jul 23, 2014 at 10:32:44PM -0700, Nicholas A. Bellinger wrote:
>> *) vhost-scsi doesn't support migration
>>
>> Since its initial merge in QEMU v1.5, vhost-scsi has had a migration blocker
>> set. This is primarily due to requiring some external orchestration in
>> order to setup the necessary vhost-scsi endpoints on the migration
>> destination to match what's running on the migration source.
>>
>> Here are a couple of points that Stefan detailed some time ago about what's
>> involved for properly supporting live migration with vhost-scsi:
>>
>> (1) vhost-scsi needs to tell QEMU when it dirties memory pages, either by
>> DMAing to guest memory buffers or by modifying the virtio vring (which also
>> lives in guest memory). This should be straightforward since the
>> infrastructure is already present in vhost (it's called the "log") and used
>> by drivers/vhost/net.c.
>>
>> (2) The harder part is seamless target handover to the destination host.
>> vhost-scsi needs to serialize any SCSI target state from the source machine
>> and load it on the destination machine. We could be in the middle of
>> emulating a SCSI command.
>>
>> An obvious solution is to only support active-passive or active-active HA
>> setups where tcm already knows how to fail over. This typically requires
>> shared storage and maybe some communication for the clustering mechanism.
>> There are more sophisticated approaches, so this straightforward one is just
>> an example.
>>
>> That said, we do intend to support live migration for vhost-scsi using
>> iSCSI/iSER/FC shared storage.
>>
>> *) vhost-scsi doesn't support qcow2
>>
>> Given that all other Cinder drivers, with the exception of the NetApp and
>> Gluster drivers, do not use QEMU qcow2 to access storage blocks, this
>> argument is not particularly relevant here.
>>
>> However, this doesn't mean that vhost-scsi (and target-core itself) cannot
>> support qcow2 images. There is currently an effort to add a userspace
>> backend driver for the upstream target (target_core_user [3]), which will
>> allow supporting various disk formats in userspace.
>>
>> The important part for vhost-scsi is that regardless of what type of target
>> backend driver is put behind the fabric LUNs (raw block devices using
>> IBLOCK, qcow2 images using target_core_user, etc) the changes required in
>> Nova and libvirt to support vhost-scsi remain the same. They do not change
>> based on the backend driver.
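As an illustration of why the delta stays the same, the per-instance guest definition only needs to reference the vhost-scsi endpoint's WWPN, regardless of what backstore sits behind the LUN. The XML shape below is an assumption modeled on the patches under discussion (no vhost-scsi support had been merged into libvirt at the time), and the helper name is invented.

```python
def vhost_scsi_hostdev_xml(wwpn):
    """Return a hypothetical libvirt <hostdev> element attaching a
    vhost-scsi endpoint to a guest; only the WWPN varies per instance."""
    return (
        "<hostdev mode='subsystem' type='scsi_host'>\n"
        "  <source protocol='vhost' wwpn='%s'/>\n"
        "</hostdev>" % wwpn
    )
```

Swapping IBLOCK for target_core_user behind the LUN changes nothing in this element, which is the point being made above.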
>>
>> *) vhost-scsi is not intended for production
>>
>> vhost-scsi has been included in the upstream kernel since the v3.6 release,
>> and in QEMU since v1.5. vhost-scsi runs unmodified out of the box on a
>> number of popular distributions including Fedora, Ubuntu, and openSUSE. It
>> also works as a QEMU boot device with SeaBIOS, and even with the Windows
>> virtio-scsi mini-port driver.
>>
>> There is at least one vendor who has already posted libvirt patches to
>> support vhost-scsi, so vhost-scsi has already moved beyond being a
>> debugging and development tool.
>>
>> For instance, here are a few specific use cases where vhost-scsi is
>> currently the only option for virtio-scsi guests:
>>
>> - Low (sub-100 usec) latencies for AIO reads/writes with small-iodepth
>> workloads
>> - 1M+ small-block IOPS at low CPU utilization with large-iodepth
>> workloads
>> - End-to-end data integrity using T10 protection information (DIF)
>
> IIUC, there is also missing support for block jobs like drive-mirror
> which is needed by Nova.
>
> From a functionality POV migration & drive-mirror support are the two
> core roadblocks to including vhost-scsi in Nova (as well as libvirt
> support for it of course). Realistically it doesn't sound like these
> are likely to be solved soon enough to give us confidence in taking
> this for the Juno release cycle.
As I understand this work, vhost-scsi provides massive perf improvements
over virtio, which makes it seem like a very valuable addition. I’m OK
with telling customers that migration and snapshotting are not supported
as long as the feature is protected by a flavor type or image metadata
(i.e. not on by default). I know plenty of customers who would gladly
trade some of the friendly management features for better I/O performance.
Therefore I think it is acceptable to take it with some documentation that
it is experimental. Maybe I’m unique but I deal with people pushing for
better performance all the time.
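A minimal sketch of that opt-in gating, assuming hypothetical property names (a `hw:disk_bus` flavor extra spec and a `hw_disk_bus` image property taking a `vhost-scsi` value; neither existed in Nova at the time):

```python
def use_vhost_scsi(flavor_extra_specs, image_properties):
    """Off by default: vhost-scsi is enabled only when a flavor extra spec
    or an image property explicitly requests it (names are illustrative)."""
    return (flavor_extra_specs.get("hw:disk_bus") == "vhost-scsi"
            or image_properties.get("hw_disk_bus") == "vhost-scsi")
```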
Vish
>
> Regards,
> Daniel
> --
> |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org -o- http://virt-manager.org :|
> |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|