[Openstack-operators] Experience with Cinder volumes as root disks?

John Petrini jpetrini at coredial.com
Tue Aug 1 19:50:35 UTC 2017


Maybe I'm just not understanding, but when I create a nova snapshot, the
snapshot happens at the RBD level in the ephemeral pool and is then copied to
the images pool. This results in a full-sized image rather than a snapshot
with a reference to the parent.

For example, below is a snapshot of an ephemeral instance from our images
pool. It's 80GB, the size of the instance, so rather than just capturing
the state of the parent image I end up with a brand new image of the same
size. It takes a long time to create this copy and causes high IO during
the snapshot.

rbd --pool images info d5404709-cb86-4743-b3d5-1dc7fba836c1
rbd image 'd5404709-cb86-4743-b3d5-1dc7fba836c1':
size 81920 MB in 20480 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.93cdd43ca5efa8
format: 2
features: layering, striping
flags:
stripe unit: 4096 kB
stripe count: 1


John Petrini

On Tue, Aug 1, 2017 at 3:24 PM, Mike Lowe <jomlowe at iu.edu> wrote:

> There is no upload if you use Ceph to back your glance (like you should);
> the snapshot is cloned from the ephemeral pool into the images pool,
> then flatten is run as a background task.  The net result is that creating a
> 120GB image vs an 8GB one is slightly faster on my cloud, but not at all what I’d
> call painful.
>
> Running nova image-create for an 8GB image:
>
> real 0m2.712s
> user 0m0.761s
> sys 0m0.225s
>
> Running nova image-create for a 128GB image:
>
> real 0m2.436s
> user 0m0.774s
> sys 0m0.225s
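>
> (Those are just wall-clock timings of the snapshot call, i.e. something
> like the following, with the server and snapshot names as placeholders:)
>
> time nova image-create test-server test-snapshot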
>
>
>
>
> On Aug 1, 2017, at 3:07 PM, John Petrini <jpetrini at coredial.com> wrote:
>
> Yes, from Mitaka onward the snapshot happens at the RBD level, which is
> fast. It's the flattening and uploading of the image to glance that's the
> major pain point. Still, it's worlds better than the qemu snapshots to
> local disk prior to Mitaka.
>
> John Petrini
>
> Platforms Engineer   //   CoreDial, LLC   //   coredial.com
> 751 Arbor Way, Hillcrest I, Suite 150, Blue Bell, PA 19422
> P: 215.297.4400 x232   //   F: 215.297.4401   //   E: jpetrini at coredial.com
>
>
>
>
> On Tue, Aug 1, 2017 at 2:53 PM, Mike Lowe <jomlowe at iu.edu> wrote:
>
>> Strictly speaking I don’t think this is the case anymore for Mitaka or
>> later.  Snapping nova does take more space as the image is flattened, but
>> the dumb download then upload back into ceph has been cut out.  With
>> careful attention paid to discard/TRIM I believe you can maintain the thin
>> provisioning properties of RBD.  The workflow is explained here.
>> https://www.sebastien-han.fr/blog/2015/10/05/openstack-nova-snapshots-on-ceph-rbd/
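>>
>> The discard/TRIM piece generally means exposing the disk to the guest via
>> virtio-scsi and enabling unmap on the compute nodes. A rough sketch of the
>> settings involved (the image ID is a placeholder, and your deployment may
>> differ):
>>
>> # nova.conf on the compute nodes
>> [libvirt]
>> hw_disk_discard = unmap
>>
>> # image properties so the guest gets a SCSI disk that honours discard
>> openstack image set --property hw_scsi_model=virtio-scsi \
>>   --property hw_disk_bus=scsi <image-id>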
>>
>> On Aug 1, 2017, at 11:14 AM, John Petrini <jpetrini at coredial.com> wrote:
>>
>> Just my two cents here, but we started out using mostly ephemeral storage
>> in our builds and looking back I wish we hadn't. Note that we're using Ceph
>> as a backend, so my response is tailored towards Ceph's behavior.
>>
>> The major pain point is snapshots. When you snapshot a nova instance, an
>> RBD snapshot occurs; it is very quick and uses very little additional
>> storage. However, the snapshot is then copied into the images pool and in
>> the process is converted from a snapshot to a full-size image. This takes a
>> long time because you have to copy a lot of data, and it takes up a lot of
>> space. It also causes a great deal of IO on the storage and means you end
>> up with a bunch of "snapshot images" creating clutter. On the other hand,
>> volume snapshots are near-instantaneous and have none of the drawbacks I've
>> mentioned.
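>>
>> For comparison, a Cinder snapshot on a Ceph backend stays an RBD snapshot
>> of the volume, so it returns almost immediately and copies no data. Roughly
>> (the snapshot name and volume ID are placeholders, and "volumes" is whatever
>> your Cinder pool is called):
>>
>> cinder snapshot-create --name quick-snap <volume-id>
>> # the snapshot appears at the RBD level with no data copied
>> rbd snap ls volumes/volume-<volume-id>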
>>
>> On the plus side for ephemeral storage, resizing the root disk of instances
>> works better. As long as your image is configured properly, it's just a
>> matter of initiating a resize and letting the instance reboot to grow the
>> root disk. When using a volume as your root disk, you instead have to
>> shut down the instance, grow the volume, and boot it again.
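>>
>> To make the difference concrete, the two workflows look roughly like this
>> (the instance, flavor and volume identifiers are placeholders):
>>
>> # ephemeral root disk: resize to a flavor with a bigger disk
>> nova resize <instance> <bigger-flavor>
>> nova resize-confirm <instance>
>>
>> # volume-backed root disk: stop, extend the volume, then start again
>> nova stop <instance>
>> cinder extend <volume-id> <new-size-in-GB>
>> nova start <instance>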
>>
>> I hope this helps! If anyone on the list knows something I don't
>> regarding these issues, please chime in. I'd love to know if there's a
>> better way.
>>
>> Regards,
>>
>> John Petrini
>>
>> On Tue, Aug 1, 2017 at 10:50 AM, Kimball, Conrad <
>> conrad.kimball at boeing.com> wrote:
>>
>>> In our process of standing up an OpenStack internal cloud we are facing
>>> the question of ephemeral storage vs. Cinder volumes for instance root
>>> disks.
>>>
>>>
>>>
>>> As I look at public clouds such as AWS and Azure, the norm is to use
>>> persistent volumes for the root disk.  AWS started out with images booting
>>> onto ephemeral disk, but soon after that they released Elastic Block Storage,
>>> and ever since, the clear trend has been toward EBS-backed instances; now when
>>> I look at their quick-start list of 33 AMIs, all of them are EBS-backed.  And
>>> I’m not even sure one can have anything except persistent root disks in
>>> Azure VMs.
>>>
>>>
>>>
>>> Based on this and a number of other factors, I think we want our users'
>>> normal / default behavior to be booting onto Cinder-backed volumes instead of
>>> onto ephemeral storage.  But then I look at OpenStack, and its design point
>>> appears to be booting images onto ephemeral storage; while it is possible to
>>> boot an image onto a new volume, this is clumsy (we haven’t found a way to
>>> make it the default behavior) and we are experiencing performance problems
>>> (that admittedly we have not yet run to ground).
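>>>
>>> (For reference, the boot-onto-a-new-volume step that feels clumsy today looks
>>> roughly like this; the image, network, size and flavor values are placeholders:)
>>>
>>> nova boot --flavor m1.medium \
>>>   --block-device source=image,id=<image-uuid>,dest=volume,size=100,bootindex=0,shutdown=preserve \
>>>   --nic net-id=<network-uuid> my-instance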
>>>
>>>
>>>
>>> So …
>>>
>>> - Are other operators routinely booting onto Cinder volumes
>>>   instead of ephemeral storage?
>>>
>>> - What has been your experience with this; any advice?
>>>
>>>
>>>
>>> *Conrad Kimball*
>>>
>>> Associate Technical Fellow
>>>
>>> Chief Architect, Enterprise Cloud Services
>>>
>>> Application Infrastructure Services / Global IT Infrastructure /
>>> Information Technology & Data Analytics
>>>
>>> conrad.kimball at boeing.com
>>>
>>> P.O. Box 3707, Mail Code 7M-TE
>>>
>>> Seattle, WA  98124-2207
>>>
>>> Bellevue 33-11 bldg, office 3A6-3.9
>>>
>>> Mobile:  425-591-7802
>>>
>>>
>>>
>>
>
>