[Openstack-operators] Experience with Cinder volumes as root disks?

Mike Lowe jomlowe at iu.edu
Tue Aug 1 20:32:44 UTC 2017


Two things. First, rbd info does not show how much disk is actually used; rbd du does.  Second, the semantics matter: copy is different from clone and flatten.  Clone and flatten, which is what should happen if you have things working correctly, is much faster than copy.  If you are using copy you may be limited by the number of management ops in flight, which is a setting in more recent versions of Ceph.  I don’t know if copy skips zero-byte objects, but clone and flatten certainly do.  You also need to be sure that you have the proper settings in nova.conf for discard/unmap, as well as hw_scsi_model=virtio-scsi and hw_disk_bus=scsi in the image properties.  Once discard is working and you have the qemu guest agent running in your instances, you can force them to do an fstrim to reclaim space as an additional benefit.
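
For example, rbd du against the image you posted below should show actual usage rather than the provisioned 80GB:

rbd du --pool images d5404709-cb86-4743-b3d5-1dc7fba836c1

For the discard side, the pieces look something like this; the option and property names are the standard ones, but treat the exact values, image ID, and domain name as examples to adapt:

# nova.conf on the compute nodes
[libvirt]
hw_disk_discard = unmap

# image properties (IMAGE_ID is a placeholder)
openstack image set --property hw_scsi_model=virtio-scsi --property hw_disk_bus=scsi IMAGE_ID

# with the guest agent running, a trim can be kicked off from the hypervisor
virsh qemu-agent-command instance-00000042 '{"execute":"guest-fstrim"}'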

> On Aug 1, 2017, at 3:50 PM, John Petrini <jpetrini at coredial.com> wrote:
> 
> Maybe I'm just not understanding, but when I create a nova snapshot, the snapshot happens at the RBD level in the ephemeral pool and then it's copied to the images pool. This results in a full-sized image rather than a snapshot with a reference to the parent.
> 
> For example, below is a snapshot of an ephemeral instance from our images pool. It's 80GB, the size of the instance, so rather than just capturing the state of the parent image I end up with a brand-new image of the same size. Creating this copy takes a long time and causes high IO during the snapshot.
> 
> rbd --pool images info d5404709-cb86-4743-b3d5-1dc7fba836c1
> rbd image 'd5404709-cb86-4743-b3d5-1dc7fba836c1':
> 	size 81920 MB in 20480 objects
> 	order 22 (4096 kB objects)
> 	block_name_prefix: rbd_data.93cdd43ca5efa8
> 	format: 2
> 	features: layering, striping
> 	flags: 
> 	stripe unit: 4096 kB
> 	stripe count: 1
> 
> 
> John Petrini
> 
> 
> On Tue, Aug 1, 2017 at 3:24 PM, Mike Lowe <jomlowe at iu.edu> wrote:
> There is no upload if you use Ceph to back your glance (like you should); the snapshot is cloned from the ephemeral pool into the images pool, then flatten is run as a background task.  Net result is that creating a 128GB image vs an 8GB one is, if anything, slightly faster on my cloud, and not at all what I’d call painful.
> 
> Running nova image-create for an 8GB image:
> 
> real	0m2.712s
> user	0m0.761s
> sys	0m0.225s
> 
> Running nova image-create for a 128GB image:
> 
> real	0m2.436s
> user	0m0.774s
> sys	0m0.225s
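> 
> If you want to see it happening on the Ceph side, something like this works (the pool names depend on how your nova and glance pools are set up, so treat these as placeholders):
> 
> rbd children ephemeral-vms/<instance-uuid>_disk@<snap-name>   # clones hanging off the nova snapshot
> rbd info images/<image-uuid>                                  # shows a parent: line while the flatten is still running
> 
> Once the background flatten finishes, the parent reference disappears and the glance image stands on its own.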
> 
> 
> 
> 
>> On Aug 1, 2017, at 3:07 PM, John Petrini <jpetrini at coredial.com> wrote:
>> 
>> Yes, from Mitaka onward the snapshot happens at the RBD level, which is fast. It's the flattening and uploading of the image to glance that's the major pain point. Still, it's worlds better than the qemu snapshots to local disk that happened prior to Mitaka.
>> 
>> John Petrini
>> 
>> 
>> On Tue, Aug 1, 2017 at 2:53 PM, Mike Lowe <jomlowe at iu.edu> wrote:
>> Strictly speaking, I don’t think this is the case anymore for Mitaka or later.  Snapshotting in nova does take more space because the image is flattened, but the dumb download-then-upload back into Ceph has been cut out.  With careful attention paid to discard/TRIM, I believe you can maintain the thin provisioning properties of RBD.  The workflow is explained here: https://www.sebastien-han.fr/blog/2015/10/05/openstack-nova-snapshots-on-ceph-rbd/
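>> 
>> A quick way to sanity check the discard side on a compute node is something like the following (the libvirt domain name here is just an example):
>> 
>> virsh dumpxml instance-00000042 | grep discard
>> 
>> which should show discard='unmap' on the disk driver line once the nova.conf option and image properties are in place; after that, an fstrim -av inside the guest actually hands freed space back to the pool.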
>> 
>>> On Aug 1, 2017, at 11:14 AM, John Petrini <jpetrini at coredial.com> wrote:
>>> 
>>> Just my two cents here, but we started out using mostly ephemeral storage in our builds, and looking back I wish we hadn't. Note that we're using Ceph as a backend, so my response is tailored to Ceph's behavior.
>>> 
>>> The major pain point is snapshots. When you snapshot a nova instance, an RBD snapshot occurs, which is very quick and uses very little additional storage; however, the snapshot is then copied into the images pool and in the process is converted from a snapshot to a full-size image. This takes a long time because you have to copy a lot of data, and it takes up a lot of space. It also causes a great deal of IO on the storage and means you end up with a bunch of "snapshot images" creating clutter. On the other hand, volume snapshots are near-instantaneous, without the other drawbacks I've mentioned.
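>>> 
>>> In command terms (names and IDs here are placeholders), snapshotting an ephemeral-backed instance versus a volume looks like:
>>> 
>>> nova image-create my-instance my-instance-snap          # lands in the images pool as a full-size image
>>> cinder snapshot-create --name my-vol-snap <volume-id>   # stays a thin, copy-on-write RBD snapshot
>>> 
>>> The second one is the near-instantaneous case I mean.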
>>> 
>>> On the plus side for ephemeral storage, resizing the root disk of instances works better. As long as your image is configured properly, it's just a matter of initiating a resize and letting the instance reboot to grow the root disk. When using a volume as your root disk, you instead have to shut down the instance, grow the volume, and boot it again, as sketched below.
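>>> 
>>> For the volume-backed case the dance looks roughly like this (IDs and the new size are placeholders, and exact client syntax varies by release):
>>> 
>>> nova stop <instance-id>
>>> cinder extend <volume-id> 120    # new size in GB
>>> nova start <instance-id>
>>> 
>>> versus a plain resize to a larger flavor for the ephemeral case.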
>>> 
>>> I hope this helps! If anyone on the list knows something I don't regarding these issues, please chime in. I'd love to know if there's a better way.
>>> 
>>> Regards,
>>> John Petrini
>>> 
>>> 
>>> On Tue, Aug 1, 2017 at 10:50 AM, Kimball, Conrad <conrad.kimball at boeing.com> wrote:
>>> In our process of standing up an OpenStack internal cloud we are facing the question of ephemeral storage vs. Cinder volumes for instance root disks.
>>> 
>>>  
>>> 
>>> As I look at public clouds such as AWS and Azure, the norm is to use persistent volumes for the root disk.  AWS started out with images booting onto ephemeral disk, but soon afterward they released Elastic Block Storage, and ever since the clear trend has been toward EBS-backed instances; when I look at their quick-start list of 33 AMIs today, all of them are EBS-backed.  And I’m not even sure one can have anything except persistent root disks in Azure VMs.
>>> 
>>>  
>>> 
>>> Based on this and a number of other factors, I think we want our users' normal / default behavior to be booting onto Cinder-backed volumes instead of onto ephemeral storage.  But when I look at OpenStack, its design point appears to be booting images onto ephemeral storage, and while it is possible to boot an image onto a new volume, this is clumsy (I haven’t found a way to make it the default behavior) and we are experiencing performance problems (that admittedly we have not yet run to ground).
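>>> 
>>> (For reference, the sort of incantation I mean is roughly the following; the flavor, size, and IDs are placeholders:
>>> 
>>> nova boot --flavor m1.large --block-device source=image,id=<image-id>,dest=volume,size=100,shutdown=preserve,bootindex=0 my-bfv-instance
>>> 
>>> which is more than I want every user to have to type.)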
>>> 
>>>  
>>> 
>>> So …
>>> 
>>> - Are other operators routinely booting onto Cinder volumes instead of ephemeral storage?
>>> 
>>> - What has been your experience with this; any advice?
>>> 
>>>  
>>> 
>>> Conrad Kimball
>>> 
>>> Associate Technical Fellow
>>> 
>>> Chief Architect, Enterprise Cloud Services
>>> 
>>> Application Infrastructure Services / Global IT Infrastructure / Information Technology & Data Analytics
>>> 
>>> conrad.kimball at boeing.com
>>> P.O. Box 3707, Mail Code 7M-TE
>>> 
>>> Seattle, WA  98124-2207
>>> 
>>> Bellevue 33-11 bldg, office 3A6-3.9
>>> 
>>> Mobile:  425-591-7802
>>>  
>>> 
>>> 
>>> _______________________________________________
>>> OpenStack-operators mailing list
>>> OpenStack-operators at lists.openstack.org
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>> 
>>> 
>> 
>> 
> 
> 
