[openstack-dev] [nova][glance] Deprecation of nova.image.download.modules extension point
Moore, Curt
curt.moore at garmin.com
Thu May 31 22:14:42 UTC 2018
Hello.
We recently upgraded from Liberty to Pike and, looking ahead to the code in Queens, noticed the image download deprecation notice with instructions to post here if this interface is in use. As such, I'd like to explain our use case and see if there is a better way of accomplishing our goal, or else lobby for the "un-deprecation" of this extension point.
As with many installations, we use Ceph for both our Glance image store and VM instance disks. In the normal workflow, when both Glance and the Nova libvirt driver are configured to use Ceph, Nova reacts to the direct_url field on the Glance image and performs an in-place clone of the RAW disk image from the images pool into the vms pool, entirely within Ceph. Creating the instance disk this way is very fast and thinly provisioned, since it is a COW clone.
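For reference, the relevant pieces of that configuration look roughly like this (option names as of Pike; the pool names are ours):

glance-api.conf:

    [DEFAULT]
    show_image_direct_url = True

    [glance_store]
    stores = rbd
    default_store = rbd
    rbd_store_pool = images

nova.conf on the rbd-backed compute nodes:

    [libvirt]
    images_type = rbd
    images_rbd_pool = vms
    images_rbd_ceph_conf = /etc/ceph/ceph.conf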
This underlying workflow itself works great; the issue is the performance of the VM's disk within Ceph, especially as the number of nodes within the cluster grows. We have found, especially with Windows VMs (largely as a result of I/O for the Windows pagefile), that the performance of the Ceph cluster as a whole takes a very large hit keeping up with all of this I/O thrashing, especially when Windows is booting. This is not the case with Linux VMs, as they do not use swap as heavily as Windows nodes use their pagefiles. Windows can be run without a pagefile, but that leads to other oddities within Windows.
I should also mention that in our case the nodes themselves are ephemeral and we do not care about live migration, etc.; we just want raw performance.
As an aside on our Ceph setup, without getting into too many details: we have very fast, SSD-based Ceph nodes for this pool (separate CRUSH root, SSDs for both OSDs and journals, 2 replicas), interconnected on the same switch backplane, each with bonded 10GbE uplinks to the switch. Our Nova nodes are within the same datacenter (also with bonded 10GbE uplinks to their switches) but are distributed across different switches. We could move the Nova nodes to the same switch as the Ceph nodes, but rearranging that many servers to make space is a larger logistical challenge.
Back to our use case: in order to isolate this heavy I/O, a subset of our compute nodes have a local SSD and are set to use qcow2 images instead of rbd, so that Nova will pull the image down from Glance into the node's local image cache and run the VM from the local SSD. This allows Windows VMs to boot and perform their initial cloudbase-init setup/reboot within ~20 sec vs. 4-5 min, regardless of overall Ceph cluster load. Additionally, this prevents us from "wasting" IOPS and instead keeps them local to the Nova node, reclaiming the network bandwidth and Ceph IOPS for use by Cinder volumes. This is essentially the use case outlined in the "Do designate some non-Ceph compute hosts with low-latency local storage" section here:
https://ceph.com/planet/the-dos-and-donts-for-ceph-for-openstack/
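On that SSD-equipped subset of nodes, the only change is to switch the libvirt image backend from rbd to the file-based qcow2 backend, e.g. in nova.conf:

    [libvirt]
    images_type = qcow2

With that setting, Nova fetches the image from Glance into the local image cache under ${instances_path}/_base and boots the instance from a qcow2 overlay on the local SSD.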
The challenge is that the Glance image transfer is _glacially slow_ when using the Glance HTTP API: ~30 min for a 50GB Windows image (it's Windows, so it's huge with all of the necessary tools installed). If the image is instead pulled with an RBD export via the image download functionality, the same image downloads in ~30 sec. We have code that performs this direct download from Glance's RBD store and it works great for our use case; it is very similar to the code in this older patch:
https://review.openstack.org/#/c/44321/
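In case it helps the discussion, below is roughly the shape of that code. This is a simplified sketch, not our production module: the function signature, direct_url parsing, and rados_id are illustrative only, and the real nova.image.download interface (registered via a setuptools entry point in the nova.image.download.modules namespace) passes more context than shown here. It only assumes the standard rados/rbd Python bindings:

    import rados
    import rbd

    CHUNK_SIZE = 8 * 1024 * 1024  # read the image in 8 MiB chunks

    def get_schemes():
        # URL schemes this module claims; matches the rbd:// direct_url
        # that Glance exposes for its RBD store.
        return ['rbd']

    def download(direct_url, dst_path,
                 ceph_conf='/etc/ceph/ceph.conf', rados_id='nova'):
        # direct_url looks like rbd://<fsid>/<pool>/<image>/<snapshot>;
        # rados_id is whichever client key has read access to the pool.
        _fsid, pool, image_name, snap = direct_url[len('rbd://'):].split('/')

        cluster = rados.Rados(conffile=ceph_conf, rados_id=rados_id)
        cluster.connect()
        try:
            ioctx = cluster.open_ioctx(pool)
            try:
                src = rbd.Image(ioctx, image_name, snapshot=snap,
                                read_only=True)
                try:
                    size = src.size()
                    with open(dst_path, 'wb') as dst:
                        offset = 0
                        while offset < size:
                            length = min(CHUNK_SIZE, size - offset)
                            dst.write(src.read(offset, length))
                            offset += length
                finally:
                    src.close()
            finally:
                ioctx.close()
        finally:
            cluster.shutdown()

The point is simply that the image bytes are read straight from the OSDs over the Ceph network instead of being proxied through the glance-api service, which is where the ~30 min vs. ~30 sec difference comes from.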
We could look at attaching an additional ephemeral disk to the instance and having cloudbase-init use it as the pagefile, but it appears that if libvirt is using rbd for its images_type, _all_ disks must then come from Ceph; there is no way at present to let the VM image run from Ceph while an ephemeral disk is mapped in from node-local storage. Even so, this would still "waste" Ceph IOPS for the VM disk itself, which could be better used for other purposes.
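For completeness, the ephemeral-disk variant we considered would just be a flavor carrying ephemeral space for cloudbase-init to claim as the pagefile, something like (name and sizes are made up):

    openstack flavor create win-pagefile-test \
        --vcpus 4 --ram 8192 --disk 50 --ephemeral 16

but as noted above, with images_type = rbd that ephemeral disk also lands in the vms pool rather than on the node-local SSD.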
Based on what I have explained about our use case, is there a better/different way to accomplish the same goal without using the deprecated image download functionality? If not, can we work to "un-deprecate" the download extension point? Should I work to get the code for this RBD download into the upstream repository?
Thanks,
-Curt