[Openstack-operators] [openstack-dev] [nova][glance] Deprecation of nova.image.download.modules extension point

Matt Riedemann mriedemos at gmail.com
Mon Jun 4 15:53:43 UTC 2018


+openstack-operators to see if others have the same use case

On 5/31/2018 5:14 PM, Moore, Curt wrote:
> We recently upgraded from Liberty to Pike and, looking ahead to the code 
> in Queens, noticed the image download deprecation notice with 
> instructions to post here if this interface was in use.  As such, I’d 
> like to explain our use case and see if there is a better way of 
> accomplishing our goal, or to lobby for the "un-deprecation" of this 
> extension point.

Thanks for speaking up - this is much easier *before* code is removed.

> 
> As with many installations, we are using Ceph for both our Glance image 
> store and VM instance disks.  In a normal workflow when both Glance and 
> libvirt are configured to use Ceph, libvirt reacts to the direct_url 
> field on the Glance image and performs an in-place clone of the RAW disk 
> image from the images pool into the vms pool all within Ceph.  The 
> snapshot creation process is very fast and is thinly provisioned as it’s 
> a COW snapshot.
> 
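(For operators reading along who aren't familiar with that flow: the 
clone being described is roughly equivalent to the following, in terms 
of the python rbd bindings.  The pool names are from Curt's message; the 
client and image names here are made up.)

    # Rough sketch of the COW clone done when Glance and the compute
    # nodes share a Ceph cluster; only blocks the instance later writes
    # consume new space in the vms pool.
    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf', rados_id='nova')
    cluster.connect()
    try:
        images_ioctx = cluster.open_ioctx('images')
        vms_ioctx = cluster.open_ioctx('vms')
        try:
            # Glance keeps a protected 'snap' snapshot of the RAW image.
            rbd.RBD().clone(images_ioctx, 'GLANCE_IMAGE_ID', 'snap',
                            vms_ioctx, 'INSTANCE_UUID_disk')
        finally:
            vms_ioctx.close()
            images_ioctx.close()
    finally:
        cluster.shutdown()
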
> This underlying workflow itself works great; the issue is the 
> performance of the VM’s disk within Ceph, especially as the number of 
> nodes within the cluster grows.  We have found, especially with Windows 
> VMs (largely as a result of I/O for the Windows pagefile), that the 
> performance of the Ceph cluster as a whole takes a very large hit in 
> keeping up with all of this I/O thrashing, especially when Windows is 
> booting.  This is not the case with Linux VMs, as they do not use swap 
> as frequently as Windows nodes do with their pagefiles.  Windows can be 
> run without a pagefile, but that leads to other oddities within Windows.
> 
> I should also mention that in our case, the nodes themselves are 
> ephemeral and we do not care about live migration, etc., we just want 
> raw performance.
> 
> As an aside on our Ceph setup, without getting into too many details: we 
> have very fast SSD-based Ceph nodes for this pool (separate crush root, 
> SSDs for both OSDs and journals, 2 replicas), interconnected on the same 
> switch backplane, each with bonded 10GbE uplinks to the switch.  Our 
> Nova nodes are within the same datacenter (they also have bonded 10GbE 
> uplinks to their switches) but are distributed across different 
> switches.  We could move the Nova nodes to the same switch as the Ceph 
> nodes, but rearranging many servers to make space is a larger 
> logistical challenge.
> 
> Back to our use case, in order to isolate this heavy I/O, a subset of 
> our compute nodes have a local SSD and are set to use qcow2 images 
> instead of rbd so that libvirt will pull the image down from Glance into 
> the node’s local image cache and run the VM from the local SSD.  This 
> allows Windows VMs to boot and perform their initial cloudbase-init 
> setup/reboot within ~20 sec vs 4-5 min, regardless of overall Ceph 
> cluster load.  Additionally, this prevents us from "wasting" IOPS and 
> instead keeps them local to the Nova node, reclaiming the network 
> bandwidth and Ceph IOPS for use by Cinder volumes.  This is essentially 
> the use case outlined here in the "Do designate some non-Ceph compute 
> hosts with low-latency local storage" section:
> 
> https://ceph.com/planet/the-dos-and-donts-for-ceph-for-openstack/
> 
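(Side note: on those local-SSD hosts, the relevant nova.conf bits would 
presumably be something along these lines; the path is just an example.)

    [DEFAULT]
    # instance files, including the _base image cache, on the local SSD
    instances_path = /var/lib/nova/instances

    [libvirt]
    images_type = qcow2
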
> The challenge is that the Glance image transfer is _glacially slow_ 
> when using the Glance HTTP API (~30 min for a 50GB Windows image; it’s 
> Windows, so it’s huge with all of the necessary tools installed).  If 
> the compute node can instead perform an RBD export on the image using 
> the image download functionality, it is able to download the same image 
> in ~30 sec.  We have code that performs the direct download from Glance 
> over RBD, and it works great in our use case.  It is very similar to 
> the code in this older patch:
> 
> https://review.openstack.org/#/c/44321/

It looks like at the time this had general approval (i.e. it wasn't 
considered crazy) but was blocked simply due to the Havana feature 
freeze. That's good to know.
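
For reference, stripped of nova's plugin plumbing, the direct RBD fetch 
being described is presumably something along these lines using the 
python rados/rbd bindings (the ceph user and chunk size below are just 
placeholders):

    # Sketch only: copy a Glance image out of Ceph over RBD, given the
    # image's direct_url (rbd://<fsid>/<pool>/<image>/<snap>).
    import rados
    import rbd

    def fetch_image_via_rbd(direct_url, dest_path,
                            conffile='/etc/ceph/ceph.conf',
                            user='glance-download'):
        _fsid, pool, image, snap = direct_url[len('rbd://'):].split('/')

        cluster = rados.Rados(conffile=conffile, rados_id=user)
        cluster.connect()
        try:
            with cluster.open_ioctx(pool) as ioctx:
                with rbd.Image(ioctx, image, snapshot=snap,
                               read_only=True) as src:
                    size = src.size()
                    chunk = 8 * 1024 * 1024  # read in 8 MiB pieces
                    with open(dest_path, 'wb') as dest:
                        offset = 0
                        while offset < size:
                            data = src.read(offset,
                                            min(chunk, size - offset))
                            dest.write(data)
                            offset += len(data)
        finally:
            cluster.shutdown()

That's obviously missing error handling and verification, but it shows 
why reading straight from the OSDs is so much faster than pulling the 
bits through glance-api over HTTP.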

> 
> We could look at attaching an additional ephemeral disk to the instance 
> and have cloudbase-init use it as the pagefile, but it appears that if 
> libvirt is using rbd for its images_type, _all_ disks must then come 
> from Ceph; there is no way at present to allow the VM image to run from 
> Ceph and have an ephemeral disk mapped in from node-local storage.  Even 
> then, this would have the effect of "wasting" Ceph IOPS for the VM disk 
> itself, which could be better used for other purposes.

When you mentioned the swap above I was thinking of something similar, 
i.e. attaching a swap device, but as you've pointed out, all disks local 
to the compute host are going to use the same image type backend, so you 
can't have the root disk and swap/ephemeral disks using different image 
backends.

> 
> Based on what I have explained about our use case, is there a 
> better/different way to accomplish the same goal without using the 
> deprecated image download functionality?  If not, can we work to 
> "un-deprecate" the download extension point? Should I work to get the 
> code for this RBD download into the upstream repository?
> 

I think you should propose your changes upstream with a blueprint; the 
docs for the blueprint process are here:

https://docs.openstack.org/nova/latest/contributor/blueprints.html

Since it's not an API change, this might just be a specless blueprint, 
but you'd need to write up the blueprint and probably post the PoC code 
to Gerrit and then bring it up during the "Open Discussion" section of 
the weekly nova meeting.

Once we can take a look at the code change, we can go from there on 
whether or not to add that in-tree or go some alternative route.

Until that happens, I think we'll just say we won't remove that 
deprecated image download extension code, but that offer won't stand for 
an unlimited amount of time if you don't propose your changes upstream.

Is there going to be anything blocking or slowing you down on your end 
with regard to contributing this change, like legal approval, license 
agreements, etc.? If so, please be up front about that.

-- 

Thanks,

Matt


