Re: [nova][glance][cinder] How to do consistent snapshots with qemu-guest-agent

Teckelmann, Ralf, NMU-OIP ralf.teckelmann at bertelsmann.de
Mon Aug 26 06:55:42 UTC 2019


Hi Florian,

Thanks for moving our conversation to the mailing list.

From the discussion on the ceph-users mailing list, my understanding (as of today) is:

- Ephemeral boot, *without* RBD, with or without attached volumes:
freeze/thaw if hw_qemu_guest_agent=yes, resulting in consistent snapshots.

- Ephemeral boot *from* RBD, also with or without attached volumes: no
freeze/thaw, resulting in potentially inconsistent snapshots even with
hw_qemu_guest_agent=yes.

- Boot-from-volume from RBD: freeze/thaw if hw_qemu_guest_agent=yes,
resulting in consistent snapshots.

Note that "hw_qemu_guest_agent=yes" here stands for two things:
- the metadata property is set on the image, and
- the qemu-guest-agent is installed in that image or on the instances spawned from it.
In addition, Florian explains further down why it is a good idea to set os_require_quiesce=yes as well.
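
The three cases above can be written down as a small lookup. This is only a sketch that restates the summary in this message as of its date. It is not taken from Nova's code, and the names BootSource and snapshot_is_quiesced are made up for illustration.

```python
from enum import Enum


class BootSource(Enum):
    """Hypothetical labels for the three cases listed in the summary."""
    EPHEMERAL_LOCAL = "ephemeral boot, not on RBD"
    EPHEMERAL_RBD = "ephemeral boot from RBD"
    VOLUME_RBD = "boot-from-volume from RBD"


def snapshot_is_quiesced(boot_source: BootSource, guest_agent_enabled: bool) -> bool:
    """Return True if the summary says freeze/thaw happens for this case.

    guest_agent_enabled means both: the image property
    hw_qemu_guest_agent=yes is set AND the agent runs in the guest.
    """
    if not guest_agent_enabled:
        # Without the agent there is no channel to send freeze/thaw over.
        return False
    # Per the summary, only ephemeral boot from RBD skips freeze/thaw.
    return boot_source is not BootSource.EPHEMERAL_RBD


if __name__ == "__main__":
    for source in BootSource:
        print(f"{source.value}: quiesced={snapshot_is_quiesced(source, True)}")
```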

I am fine with the result of your in-depth analysis, Florian.

Thanks a lot,

Ralf T.

-----Original Message-----
From: Florian Haas <florian at citynetwork.eu>
Sent: Wednesday, 21 August 2019 21:09
To: Teckelmann, Ralf, NMU-OIP <ralf.teckelmann at bertelsmann.de>
Cc: openstack-discuss at lists.openstack.org
Subject: [nova][glance][cinder] How to do consistent snapshots with qemu-guest-agent

[apologies for the top-post]

Hi Ralf,

it looks like you've met all the necessary prerequisites. Basically,

1. The image you are booting from must have the hw_qemu_guest_agent=yes property set (this configures the Nova instance with a virtual serial device consumed by qemu-guest-agent).

2. The instance must run the qemu-guest-agent daemon.

3. The image you are booting from should have the os_require_quiesce=yes property set. This isn't strictly necessary, as libvirt should always try to send the freeze/thaw commands over the serial device if your instance is configured with hw_qemu_guest_agent — but if os_require_quiesce is set then the snapshot will actually fail if libvirt can't freeze, which is what you probably want.

4. The filesystem used within the guest must support fsfreeze. This includes btrfs, ext2/3/4, xfs, and a few others. vfat on Linux does not support being frozen, though Windows guests with the Windows QEMU Guest Agent apparently do support freezing if VSS is enabled (I am no expert on Windows guests, though).
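
As a rough illustration, the four prerequisites can be turned into a checklist. The sketch below is an assumption-laden helper, not part of any OpenStack tool: the function name, its inputs, and the split into errors and warnings are hypothetical, and the filesystem sets contain only the names mentioned above. Property values are compared against the literal string "yes" as used in this thread.

```python
# Filesystems named above as freezable; the "few others" are not listed there.
KNOWN_FREEZABLE = {"btrfs", "ext2", "ext3", "ext4", "xfs"}
# Named above as not supporting freeze on Linux.
KNOWN_NOT_FREEZABLE = {"vfat"}


def check_prerequisites(image_properties, agent_running, guest_filesystems):
    """Return (errors, warnings) for the four prerequisites.

    image_properties:  dict of image metadata properties (strings)
    agent_running:     True if qemu-guest-agent runs inside the guest
    guest_filesystems: set of filesystem type names mounted in the guest
    """
    errors, warnings = [], []

    # 1. Image property that gives the instance the agent's serial device.
    if image_properties.get("hw_qemu_guest_agent") != "yes":
        errors.append("image property hw_qemu_guest_agent is not 'yes'")

    # 2. The daemon must run in the guest.
    if not agent_running:
        errors.append("qemu-guest-agent is not running in the guest")

    # 3. Not strictly necessary, so only a warning.
    if image_properties.get("os_require_quiesce") != "yes":
        warnings.append("os_require_quiesce is not 'yes': "
                        "a failed freeze will not fail the snapshot")

    # 4. Filesystems must support fsfreeze.
    for fs in sorted(guest_filesystems):
        if fs in KNOWN_NOT_FREEZABLE:
            errors.append(f"filesystem {fs} cannot be frozen")
        elif fs not in KNOWN_FREEZABLE:
            warnings.append(f"filesystem {fs}: freeze support not known here")

    return errors, warnings


if __name__ == "__main__":
    props = {"hw_qemu_guest_agent": "yes", "os_require_quiesce": "yes"}
    print(check_prerequisites(props, True, {"ext4"}))
```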

What happens under the covers is that qemu-guest-agent invokes the FIFREEZE ioctl on each mounted filesystem in the guest, as seen here:

https://git.qemu.org/?p=qemu.git;a=blob;f=qga/commands-posix.c#l1327
(the comments immediately above that line explain under which circumstances the FIFREEZE ioctl may fail).

The FIFREEZE ioctl maps to the kernel freeze_super() function, which flushes the filesystem superblock, syncs the filesystem, and then disallows any further I/O. To answer your other question, this should indeed persist all in-flight I/O to disk. Unfortunately, nothing in the code path (that I know of) issues any printk's on success, so dmesg won't tell you that the filesystem has been flushed/frozen successfully. You'd only see "VFS:Filesystem freeze failed" in your guest's kernel log on error. The same is true for FITHAW/thaw_super(), which thaws the superblock and makes the filesystem writable again.
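
To make the ioctl step concrete, here is a minimal sketch of issuing FIFREEZE and FITHAW from Python inside a guest. It assumes the generic Linux ioctl number layout (as on x86 and ARM; some architectures encode the direction bits differently) and the definitions FIFREEZE = _IOWR('X', 119, int) and FITHAW = _IOWR('X', 120, int) from linux/fs.h. The helper names are made up. freeze() and thaw() need root, and freezing a filesystem you depend on (such as /) from an interactive session can hang that session, so only try this on a scratch mount.

```python
import fcntl
import os


def _iowr(type_char: str, nr: int, size: int) -> int:
    """Build an _IOWR ioctl number using the generic Linux layout:
    direction (2 bits) | size (14 bits) | type (8 bits) | number (8 bits).
    """
    ioc_write, ioc_read = 1, 2
    return ((ioc_read | ioc_write) << 30) | (size << 16) | (ord(type_char) << 8) | nr


# sizeof(int) is assumed to be 4 bytes.
FIFREEZE = _iowr("X", 119, 4)
FITHAW = _iowr("X", 120, 4)


def _fs_ioctl(mountpoint: str, request: int) -> None:
    """Open the mount point and issue the given ioctl on it."""
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        # The argument is ignored by the kernel for these two requests.
        fcntl.ioctl(fd, request, 0)
    finally:
        os.close(fd)


def freeze(mountpoint: str) -> None:
    """Freeze the filesystem; raises OSError if the kernel refuses."""
    _fs_ioctl(mountpoint, FIFREEZE)


def thaw(mountpoint: str) -> None:
    """Thaw a previously frozen filesystem."""
    _fs_ioctl(mountpoint, FITHAW)


if __name__ == "__main__":
    # Only prints the request numbers; it does not freeze anything.
    print(hex(FIFREEZE), hex(FITHAW))
```

In practice the fsfreeze(8) utility from util-linux does the same job from a shell; the sketch only shows what happens one layer down.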

However, you can (at least on an Ubuntu guest) create a file named /etc/default/qemu-guest-agent, in which you can define DAEMON_ARGS like this:

DAEMON_ARGS="--logfile /var/log/qemu-ga.log --verbose"

Then, while you are creating a snapshot with "nova image-create" or "openstack server image create", /var/log/qemu-ga.log should be populated with log entries related to the fsfreeze events. The same should be true for creating a snapshot from Horizon.
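
Besides the agent log, the freeze state can also be queried from the compute node while the snapshot runs, using the same virsh qemu-agent-command mechanism as in the original message quoted below. The following is a sketch under assumptions: the helper names are invented, the "error" branch is purely defensive (virsh normally reports agent errors itself and exits non-zero), and guest-fsfreeze-status is the guest agent command that reports "thawed" or "frozen".

```python
import json
import subprocess


def agent_command_argv(domain: str, execute: str) -> list:
    """Build the argv for: virsh qemu-agent-command DOMAIN '{"execute": ...}'"""
    return ["virsh", "qemu-agent-command", domain, json.dumps({"execute": execute})]


def parse_agent_reply(raw: str):
    """Extract the "return" value from a guest agent JSON reply."""
    reply = json.loads(raw)
    if "error" in reply:
        # Defensive only; see the note above.
        raise RuntimeError(f"guest agent error: {reply['error']}")
    return reply["return"]


def fsfreeze_status(domain: str):
    """Ask the guest agent whether the guest's filesystems are frozen.

    Must run on the compute node hosting the domain, with virsh installed.
    """
    result = subprocess.run(
        agent_command_argv(domain, "guest-fsfreeze-status"),
        check=True, capture_output=True, text=True,
    )
    return parse_agent_reply(result.stdout)


if __name__ == "__main__":
    # Shows the command that would be run; the domain name is the one
    # from the original message and only serves as an example.
    print(agent_command_argv("instance-0000248e", "guest-fsfreeze-status"))
```

Polling fsfreeze_status() in a loop while "openstack server image create" runs would be one way to observe the frozen state, if the freeze window is long enough to catch.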

On Ubuntu bionic, you should also make sure that you are running qemu-guest-agent from bionic-security (or a recent daily build of an Ubuntu cloud image), because at least in the initial bionic release qemu-guest-agent suffered from a packaging issue, described in https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1820291.

For RBD-backed Nova/libvirt, things are a bit more complicated still, due to what appears to be somewhat inconsistent/unexpected behavior in Nova. See the discussion in:

https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/3YQCRO4JP56EDJN5KX5DWW5N2CSBHRHZ/

Does this give you enough information so you can verify whether or not freeze/thaw is working as expected for you?

Cheers,
Florian


On 14/08/2019 10:41, Teckelmann, Ralf, NMU-OIP wrote:
> Hello,
> 
> 
> Having worked my way through documentation and articles, I am totally 
> lost on the matter.
> 
> All I want to know is:
> 
> - if issuing "openstack snapshot create ...."
> 
> - if clicking "Create Snapshot" in Horizon for an instance
> 
> will secure a consistent snapshot (of all volumes in question).
> With "consistent", I mean that all the data in memory is written to 
> disk before the snapshot starts.
> 
> I hope someone can clarify whether the setup described below is 
> sufficient to achieve this goal, or whether I have to do 
> something in addition.
> 
> 
> If you have any questions, I am happy to answer as quickly as possible.
> 
> 
> Setup:
> 
> 
> We have a Stein-based OpenStack deployment with cinder backed by ceph.
> 
> Instances are created with cinder volumes. Boot volumes are based on 
> an image having the properties:
> 
> - hw_qemu_guest_agent='yes'
> - os_require_quiesce='yes'
> 
> 
> The image is Ubuntu 16.04 or 18.04 with the qemu-guest-agent package 
> installed and service running (no additional configuration besides
> distro-default):
> 
> 
> qemu-guest-agent.service - LSB: QEMU Guest Agent startup script
>    Loaded: loaded (/etc/init.d/qemu-guest-agent; bad; vendor preset:
> enabled)
>    Active: active (running) since Wed 2019-08-14 07:42:21 UTC; 9min 
> ago
>      Docs: man:systemd-sysv-generator(8)
>    CGroup: /system.slice/qemu-guest-agent.service
>            └─2300 /usr/sbin/qemu-ga --daemonize -m virtio-serial -p
> /dev/virtio-ports/org.qemu.guest_agent.0
> 
> Aug 14 07:42:21 ulthwe systemd[1]: Starting LSB: QEMU Guest Agent 
> startup script...
> Aug 14 07:42:21 ulthwe systemd[1]: Started LSB: QEMU Guest Agent 
> startup script.
> 
> I can see the socket on the compute node and send pings successfully:
> 
> ~# ls /var/lib/libvirt/qemu/*.sock
> /var/lib/libvirt/qemu/org.qemu.guest_agent.0.instance-0000248e.sock
> root@pcevh2404:~# virsh qemu-agent-command instance-0000248e 
> '{"execute":"guest-ping"}'
> {"return":{}}
> 
> 
> I can also send freeze and thaw successfully:
> 
> ~# virsh qemu-agent-command instance-0000248e 
> '{"execute":"guest-fsfreeze-freeze"}'
> {"return":1}
> 
> ~# virsh qemu-agent-command instance-0000248e 
> '{"execute":"guest-fsfreeze-thaw"}'
> {"return":1}
> 
> A simple write (echo "bla" > blub.file) issued in the "frozen" state 
> blocks until "thaw", as expected.
> 
> Best regards
> 
> 
> Ralf T.

