[Openstack] [NOVA] Snapshotting may require significant disk space (in /tmp). How to properly solve disk space issues?

Justin Santa Barbara justin at fathomdb.com
Sat Mar 17 01:50:08 UTC 2012


We're creating a (huge) temp file, uploading it, and then deleting it.  So
really we should be streaming the snapshot directly to the destination
(glance?).

Checking the code, qemu-img writes the output sequentially (particularly
when writing raw):
https://github.com/qemu/QEMU/blob/master/qemu-img.c


But there's more...

> qemu-img --help
qemu-img version 1.0, Copyright (c) 2004-2008 Fabrice Bellard
...
Supported formats: vvfat vpc vmdk vdi *sheepdog* *rbd* raw host_cdrom
host_floppy host_device file qed qcow2 qcow parallels *nbd iscsi* dmg *tftp
ftps ftp https http* cow cloop bochs blkverify blkdebug


So it looks like we really want a "Supported format: glance" there
(particularly as there's already http support in block/curl.c) :-)  I guess
we could then even do crazy things like booting direct from glance?

Or, if we don't want to get back into C, we could at least optimize the
case where glance is backed by Ceph, and stream direct to a Ceph file, and
then hand that file to Glance.

Justin





On Fri, Mar 16, 2012 at 9:11 AM, Jay Pipes <jaypipes at gmail.com> wrote:

> Hi Stackers,
>
> So, in diagnosing a few things on TryStack yesterday, I ran into an
> interesting problem with snapshotting that I'm hoping to get some advice on.
>
> == The Problem ==
>
> The TryStack codebase is Diablo, however the code involved in this
> particular problem I believe is the same in Essex...
>
> The issue was that a user attempted to snapshot a tiny instance
> (512MB/1-core) through the dashboard. The dashboard returned and noted
> that a snapshot was created and was in Queued status.
>
> The snapshot never left Queued status, so I logged into the compute node
> that housed the instance in question to see if I could figure out what
> was going on.
>
> Grepping through the compute log, I found the following:
>
> (nova.rpc): TRACE: Traceback (most recent call last):
> (nova.rpc): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 628, in _process_data
> (nova.rpc): TRACE:     rval = node_func(context=ctxt, **node_args)
> (nova.rpc): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 100, in wrapped
> (nova.rpc): TRACE:     return f(*args, **kw)
> (nova.rpc): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 687, in snapshot_instance
> (nova.rpc): TRACE:     self.driver.snapshot(context, instance_ref, image_id)
> (nova.rpc): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 100, in wrapped
> (nova.rpc): TRACE:     return f(*args, **kw)
> (nova.rpc): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/connection.py", line 479, in snapshot
> (nova.rpc): TRACE:     utils.execute(*qemu_img_cmd)
> (nova.rpc): TRACE:   File "/usr/lib/python2.7/dist-packages/nova/utils.py", line 190, in execute
> (nova.rpc): TRACE:     cmd=' '.join(cmd))
> (nova.rpc): TRACE: ProcessExecutionError: Unexpected error while running command.
> (nova.rpc): TRACE: Command: qemu-img convert -f qcow2 -O raw -s e7ba4fb5f6f04f99b07d1d222ada0219 /opt/openstack/nova/instances/instance-00000548/disk /tmp/tmpIuOQo0/e7ba4fb5f6f04f99b07d1d222ada0219
> (nova.rpc): TRACE: Exit code: 1
> (nova.rpc): TRACE: Stdout: ''
> (nova.rpc): TRACE: Stderr: 'qemu-img: error while writing\n'
>
> QEMU was unhelpfully returning a vague error message of "error while
> writing".
>
> It turned out, after speaking with a couple folks on IRC (thx vishy and
> rmk!) that the snapshot process (qemu-img convert ... above) is storing the
> output of the process (the snapshot) in a temporary directory created using
> tempfile.mkdtemp() in the nova/virt/libvirt/connection.py file.
>
> As it turns out, the base operating system we install on our compute nodes
> in TryStack has a (very) small root partition -- only 2GB in size (we use
> the devstack build_pxe_env.sh script to create the base Ubuntu image that
> is netbooted on the compute nodes).
>
> Looking at the free disk space on the compute node in question, the
> problem was apparent:
>
> root@freecloud102:/var/log/nova# df -h
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/ram0             2.0G  1.4G  535M  73% /
> devtmpfs               48G  240K   48G   1% /dev
> none                   48G     0   48G   0% /dev/shm
> none                   48G  212K   48G   1% /var/run
> none                   48G     0   48G   0% /var/lock
> /dev/md0              5.4T   93G  5.1T   2% /opt/openstack
>
> There simply isn't enough free space on the root partition (which is where
> /tmp is housed) for the snapshot to be created.
>
> == Possible Solutions ==
>
> So, there are a number of solutions that we can work on here, and I'm
> wondering what the preference would be. Here are the solutions I have come
> up with, along with a no-brainer improvement to Nova that would help in
> diagnosing this problem:
>
> The no-brainer: Detect before attempting a snapshot that there is enough
> space on a device to perform the operation, and if not, throw a useful
> error message up the stack.
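
A minimal sketch of the pre-flight check described above, assuming Python 3
(for shutil.disk_usage). The exception class and the function name are
hypothetical and are not part of Nova; the caller is assumed to supply the
number of bytes the snapshot may need (for example an upper bound such as
the disk's virtual size).

```python
import shutil


class InsufficientDiskSpace(Exception):
    """Hypothetical exception for illustration; not an actual Nova class."""


def ensure_free_space(directory, required_bytes):
    """Return the free bytes on the filesystem holding `directory`.

    Raises InsufficientDiskSpace with a descriptive message when fewer
    than `required_bytes` are available.
    """
    free = shutil.disk_usage(directory).free
    if free < required_bytes:
        raise InsufficientDiskSpace(
            "snapshot needs %d bytes but only %d bytes are free on %s"
            % (required_bytes, free, directory))
    return free
```

Calling something like this before running qemu-img convert would turn the
vague "error while writing" into a message that names the directory and the
shortfall. It is only a best-effort check, since free space can change
between the check and the write.
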
>
> Solutions to the disk space problem:
>
> (1) Silly Jay, change the damn size of the root partition in your PXE base
> OS install!
>
> Now, I'm no expert in creating customized base disk images, but from
> looking at the build_pxe_env.sh script in devstack [1], it seems pretty
> trivial to change the ramdisk_size parameter in the startup options to
> something larger than 2109600. We could do this and reimage the compute
> nodes one by one.
>
> (2) Make the location in which the snapshot is made configurable.
>
> Right now, as mentioned above, tempfile.mkdtemp() is used, which creates a
> directory in the user's TMPDIR (typically /tmp, which is usually on the
> root partition).
>
> We could add an option (--libvirt-snapshot-dir?) that would allow
> nova-compute to override where that snapshot is built.
>
> (3) Change the TMPDIR setting of the user running nova-compute to
> somewhere other than /tmp on the root partition.
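
Options (2) and (3) both come down to where tempfile.mkdtemp() creates its
directory. A rough sketch, assuming a hypothetical snapshot_dir setting (the
function name and the prefix are also made up for illustration and are not
Nova code):

```python
import tempfile


def make_snapshot_dir(snapshot_dir=None):
    """Create a scratch directory for building a snapshot.

    `snapshot_dir` stands in for a hypothetical configuration option.
    When it is None, tempfile picks the default location, which honors
    the TMPDIR environment variable before falling back to /tmp.
    """
    return tempfile.mkdtemp(prefix="snapshot-", dir=snapshot_dir)
```

With option (2) the configured path would be passed in explicitly, e.g.
make_snapshot_dir("/opt/openstack/tmp") (an example path, assumed to exist).
With option (3) no code change is needed, since mkdtemp() already follows
TMPDIR, but TMPDIR has to be set before the process first asks for a
temporary directory because Python caches the chosen location.
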
>
> Thoughts?
> -jay
>
> [1] https://github.com/openstack-dev/devstack/blob/stable/diablo/tools/build_pxe_env.sh