Ceph outage causes filesystem errors on VMs
Folks,
I am running a small lab with 3 combined compute/controller nodes and 3 Ceph storage nodes. Yesterday a power outage took all of my nodes down. After rebooting all nodes, Ceph appears healthy and reports no errors (in ceph -s).
When I started using the existing VMs I noticed the following errors, which look like data loss. This is a lab with essentially zero activity on the VMs, yet they still lost data and the filesystems are corrupt. Is this normal?
I am not using erasure coding; would that help in this case?
blk_update_request: I/O error, dev sda, sector 233000 op 0x1: (WRITE) flags 0x800 phys_seg 8 prio class 0
On Thu, 2023-02-16 at 09:56 -0500, Satish Patel wrote:
> Folks,
> I am running a small lab with 3 combined compute/controller nodes and 3 Ceph storage nodes. Yesterday a power outage took all of my nodes down. After rebooting all nodes, Ceph appears healthy and reports no errors (in ceph -s).
> When I started using the existing VMs I noticed the following errors, which look like data loss. This is a lab with essentially zero activity on the VMs, yet they still lost data and the filesystems are corrupt. Is this normal?
Yes, it can happen if the VM or cluster hard-crashes because of the power cut. Personally, I have hit this more often with XFS than with ext4, but I have seen it with both.
> I am not using erasure coding; would that help in this case?
> blk_update_request: I/O error, dev sda, sector 233000 op 0x1: (WRITE) flags 0x800 phys_seg 8 prio class 0
You will probably need to rescue the instance and repair the filesystem of each VM with fsck or similar: boot with a rescue image -> repair the filesystem -> unrescue -> hard reboot/start the VM if needed.
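The rescue cycle described above could look roughly like the following. This is only a sketch: it prints the commands instead of running them, `print_rescue_plan` is a made-up helper name, and the device path inside the rescue environment (`/dev/vdb1` by default here) is an assumption that depends on the rescue image and disk bus.

```shell
#!/bin/sh
# Dry-run sketch: print the rescue -> repair -> unrescue sequence for one instance.
# Nothing is executed against OpenStack; review the lines and run them by hand.
print_rescue_plan() {
    server=$1                  # instance name or UUID
    device=${2:-/dev/vdb1}     # ASSUMPTION: where the damaged root disk appears in rescue mode
    echo "openstack server rescue $server"
    echo "# inside the rescue instance (use xfs_repair instead of fsck for XFS):"
    echo "fsck -y $device"
    echo "openstack server unrescue $server"
    echo "openstack server reboot --hard $server"
}

print_rescue_plan my-broken-vm
```

Printing the plan first makes it easy to check the device name before anything touches the filesystem.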
You might be able to mitigate this somewhat by disabling disk caching at the QEMU level, but that will reduce performance. Ceph recommends that you use virtio-scsi for the device model and writeback cache mode. We generally recommend that too; however, you can use the disk_cachemodes option to change that. https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.dis...
[libvirt]
disk_cachemodes=file=none,block=none,network=none
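Before changing the cache mode it may help to see what a compute node is currently configured with. A small sketch, assuming the config lives at /etc/nova/nova.conf (the path and the helper name `get_libvirt_option` are assumptions, not something from this thread):

```shell
#!/bin/sh
# Sketch: print the value of one option from the [libvirt] section of an
# INI-style nova.conf. Prints nothing if the option is not set there.
get_libvirt_option() {
    conf_file=$1
    option=$2
    awk -F'=' -v opt="$option" '
        /^\[/ { in_libvirt = ($0 == "[libvirt]"); next }   # track the current section
        in_libvirt {
            key = $1
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            if (key == opt) {
                # Everything after the first "=" is the value (it contains "=" itself).
                val = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", val)
                print val
            }
        }' "$conf_file"
}

# ASSUMPTION: default config path on the compute node.
if [ -r /etc/nova/nova.conf ]; then
    get_libvirt_option /etc/nova/nova.conf disk_cachemodes
fi
```

If nothing is printed, the option is not set in that file.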
This corruption may also have happened on the Ceph cluster side. They have some options that can help prevent that by journaling writes.
If you can afford it, I would get even a small UPS to allow a graceful shutdown during future power cuts and avoid data-loss issues.
In addition to Sean's response, this has been asked multiple times, e.g. [1]. You could check whether your hypervisors gave up the lock on the RBDs or whether they are still locked (rbd status <pool>/<image>); if they are, you might need to blacklist the clients and see if that resolves anything. Do you have regular snapshots (or backups) so you can roll back in case of corruption?
[1] https://www.spinics.net/lists/ceph-users/msg45937.html
I usually look for the lock doing something like:
rbd lock ls vms/37d52c81-e78d-4237-b357-db62b820db04_disk
Then remove it doing something like:
rbd lock rm vms/37d52c81-e78d-4237-b357-db62b820db04_disk 'auto 94276942759680' client.56157074
If you have a very large number of VMs, you can gather a list of VM UUIDs with the OpenStack client, and then do some awk or similar voodoo to gather the info from Ceph and nuke the locks. After that you should be able to boot the instances normally.
Maybe there's some more graceful way that's blessed by Ceph to do this, but this has worked for me.
-Erik
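For the many-VMs case mentioned above, the first half of the "voodoo" might be sketched like this. It only prints the `rbd lock ls` commands to run; the pool name `vms` and the `<uuid>_disk` image naming are taken from the examples in this thread and may differ on other deployments, and `lock_ls_cmds` is a made-up helper name.

```shell
#!/bin/sh
# Dry-run sketch: turn instance UUIDs (one per line on stdin) into the
# "rbd lock ls" command for each instance disk. Nothing is executed.
# ASSUMPTIONS: pool "vms" and "<uuid>_disk" image names, as seen in this thread.
lock_ls_cmds() {
    pool=${1:-vms}
    # Keep only lines that look like UUIDs so stray client output is ignored.
    grep -E '^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$' |
    while read -r uuid; do
        echo "rbd lock ls ${pool}/${uuid}_disk"
    done
}

# Example with the UUID from above. On a real deployment the input could come from:
#   openstack server list --all-projects -f value -c ID
echo "37d52c81-e78d-4237-b357-db62b820db04" | lock_ls_cmds vms
```

Reviewing the printed commands before running them keeps the bulk operation from touching images that were never locked.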
Hi Eugen,
This is what I did, let me know if I missed anything.
root@ceph1:~# ceph osd blacklist ls
192.168.3.12:0/0 2023-02-17T04:48:54.381763+0000
192.168.3.22:0/753370860 2023-02-17T04:47:08.185434+0000
192.168.3.22:0/2833179066 2023-02-17T04:47:08.185434+0000
192.168.3.22:0/1812968936 2023-02-17T04:47:08.185434+0000
192.168.3.22:6824/2057987683 2023-02-17T04:47:08.185434+0000
192.168.3.21:0/2756666482 2023-02-17T05:16:23.939511+0000
192.168.3.21:0/1646520197 2023-02-17T05:16:23.939511+0000
192.168.3.22:6825/2057987683 2023-02-17T04:47:08.185434+0000
192.168.3.21:0/526748613 2023-02-17T05:16:23.939511+0000
192.168.3.21:6815/2454821797 2023-02-17T05:16:23.939511+0000
192.168.3.22:0/288537807 2023-02-17T04:47:08.185434+0000
192.168.3.21:0/4161448504 2023-02-17T05:16:23.939511+0000
192.168.3.21:6824/2454821797 2023-02-17T05:16:23.939511+0000
listed 13 entries
root@ceph1:~# rbd lock list --image 55dbf40b-0a6a-4bab-b3a5-b4bb74e963af_disk -p vms
There is 1 exclusive lock on this image.
Locker         ID                    Address
client.268212  auto 139971105131968  192.168.3.12:0/1649312807
root@ceph1:~# ceph osd blacklist rm 192.168.3.12:0/1649312807
192.168.3.12:0/1649312807 isn't blocklisted
How do I create a lock?
Do you think this is my issue? https://bugs.launchpad.net/ceph/+bug/1968369
The lock is acquired automatically; you don't need to create one. I'm curious why you have that many blacklist entries; maybe that is indeed the issue here (locks are not removed). I would shut down the corrupted VM and see if the compute node still has a lock on that image, because after shutdown it should remove the lock automatically. If there's still a watcher or lock on that image after shutdown (rbd status vms/55dbf40b-0a6a-4bab-b3a5-b4bb74e963af_disk), you can try to blacklist the client with:
# ceph osd blacklist add client.<ID>
Then check the status again; if no watchers are present, boot the VM.
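The check-then-blacklist step could be sketched as below. It reads `rbd status` output on stdin and only prints the blacklist command for each watcher that is still registered. The watcher line layout is an assumption about what `rbd status` prints, `blacklist_cmds` is a made-up helper name, and the sketch passes the watcher address rather than a client ID.

```shell
#!/bin/sh
# Dry-run sketch: read "rbd status <pool>/<image>" output on stdin and print
# one blacklist command per remaining watcher. Nothing is executed.
# ASSUMPTION: watcher lines look like
#   watcher=<addr> client.<id> cookie=<n>
blacklist_cmds() {
    awk '$1 ~ /^watcher=/ {
        addr = $1
        sub(/^watcher=/, "", addr)
        # Newer Ceph releases spell this "ceph osd blocklist add".
        printf "ceph osd blacklist add %s\n", addr
    }'
}

# Example input shaped like the assumed "rbd status" output.
printf 'Watchers:\n\twatcher=192.168.3.12:0/2259335316 client.1211875 cookie=139643345791728\n' |
    blacklist_cmds
```

An empty result means no watcher is left on the image, which is the state to look for before booting the VM.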
Hi Eugen,
I am experimenting with a less important machine, and I did the following.
I shut down the VM, but the following lock is still there:
root@ceph1:~# rbd lock list --image ec6044e6-2231-4906-9e30-1e2e72573e64_disk -p vms
There is 1 exclusive lock on this image.
Locker          ID                    Address
client.1211875  auto 139643345791728  192.168.3.12:0/2259335316
root@ceph1:~# ceph osd blacklist add 192.168.3.12:0/2259335316
blocklisting 192.168.3.12:0/2259335316 until 2023-02-17T16:00:59.399775+0000 (3600 sec)
I can still see it in the lock list. Am I missing something?
root@ceph1:~# rbd lock list --image ec6044e6-2231-4906-9e30-1e2e72573e64_disk -p vms
There is 1 exclusive lock on this image.
Locker          ID                    Address
client.1211875  auto 139643345791728  192.168.3.12:0/2259335316
Can you try to blacklist "client.1211875", or is the result the same? Depending on the distro, you could try restarting the nova-compute service, but this could shut down all VMs on that hypervisor. You could also try to remove the lock manually with rbd commands.
Hi Eugen,
You saved my life! All my VMs are up without any filesystem errors :)
This is the correct command to remove the lock.
$ rbd lock rm -p vms ec6044e6-2231-4906-9e30-1e2e72573e64_disk "auto 139643345791728" client.1211875
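To avoid copying the locker and lock ID by hand, the command above can be generated from the `rbd lock list` output. A sketch that only prints the commands, assuming the three-column layout shown earlier in this thread (`lock_rm_cmds` is a made-up helper name):

```shell
#!/bin/sh
# Dry-run sketch: read "rbd lock list" output on stdin and print the matching
# "rbd lock rm" command for every lock line. Nothing is executed.
# ASSUMPTION: lock lines look like "<locker> <id words...> <address>".
lock_rm_cmds() {
    pool=$1
    image=$2
    awk -v pool="$pool" -v image="$image" '
        $1 ~ /^client\./ {
            # The lock ID is every field between the locker and the address.
            id = $2
            for (i = 3; i < NF; i++) id = id " " $i
            printf "rbd lock rm -p %s %s \"%s\" %s\n", pool, image, id, $1
        }'
}

# Example using the lock listing from this thread.
printf '%s\n' \
    'There is 1 exclusive lock on this image.' \
    'Locker          ID                    Address' \
    'client.1211875  auto 139643345791728  192.168.3.12:0/2259335316' |
    lock_rm_cmds vms ec6044e6-2231-4906-9e30-1e2e72573e64_disk
# prints: rbd lock rm -p vms ec6044e6-2231-4906-9e30-1e2e72573e64_disk "auto 139643345791728" client.1211875
```

Only remove a lock once the VM that held it is definitely shut down; removing the lock of a running client risks two writers on the same image.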
On Fri, Feb 17, 2023 at 10:06 AM Satish Patel satish.txt@gmail.com wrote:
Hi Eugen,
I am playing with less important machine and i did following
I shutdown VM but still down following lock
root@ceph1:~# rbd lock list --image ec6044e6-2231-4906-9e30-1e2e72573e64_disk -p vms There is 1 exclusive lock on this image. Locker ID Address client.1211875 auto 139643345791728 192.168.3.12:0/2259335316
root@ceph1:~# ceph osd blacklist add 192.168.3.12:0/2259335316 blocklisting 192.168.3.12:0/2259335316 until 2023-02-17T16:00:59.399775+0000 (3600 sec)
Still I can see it in the following lock list. Am I missing something?
root@ceph1:~# rbd lock list --image ec6044e6-2231-4906-9e30-1e2e72573e64_disk -p vms There is 1 exclusive lock on this image. Locker ID Address client.1211875 auto 139643345791728 192.168.3.12:0/2259335316
On Fri, Feb 17, 2023 at 2:39 AM Eugen Block eblock@nde.ag wrote:
The lock is aquired automatically, you don't need to create one. I'm curious why you have that many blacklist entries, maybe that is indeed the issue here (locks are not removed). I would shutdown the corrupted VM and see if the compute node still has a lock on that image, because after shutdown it should remove the lock (automatically). If there's still a watcher or lock on that image after shutdown (rbd status vms/55dbf40b-0a6a-4bab-b3a5-b4bb74e963af_disk) you can try to blacklist the client with:
# ceph osd blacklist add client.<ID>
Then check the status again, if no watchers are present, boot the VM.
Zitat von Satish Patel satish.txt@gmail.com:
Hi Eugen,
This is what I did, let me know if I missed anything.
root@ceph1:~# ceph osd blacklist ls 192.168.3.12:0/0 2023-02-17T04:48:54.381763+0000 192.168.3.22:0/753370860 2023-02-17T04:47:08.185434+0000 192.168.3.22:0/2833179066 2023-02-17T04:47:08.185434+0000 192.168.3.22:0/1812968936 2023-02-17T04:47:08.185434+0000 192.168.3.22:6824/2057987683 2023-02-17T04:47:08.185434+0000 192.168.3.21:0/2756666482 2023-02-17T05:16:23.939511+0000 192.168.3.21:0/1646520197 2023-02-17T05:16:23.939511+0000 192.168.3.22:6825/2057987683 2023-02-17T04:47:08.185434+0000 192.168.3.21:0/526748613 2023-02-17T05:16:23.939511+0000 192.168.3.21:6815/2454821797 2023-02-17T05:16:23.939511+0000 192.168.3.22:0/288537807 2023-02-17T04:47:08.185434+0000 192.168.3.21:0/4161448504 2023-02-17T05:16:23.939511+0000 192.168.3.21:6824/2454821797 2023-02-17T05:16:23.939511+0000 listed 13 entries
root@ceph1:~# rbd lock list --image 55dbf40b-0a6a-4bab-b3a5-b4bb74e963af_disk -p vms There is 1 exclusive lock on this image. Locker ID Address client.268212 auto 139971105131968 192.168.3.12:0/1649312807
root@ceph1:~# ceph osd blacklist rm 192.168.3.12:0/1649312807 192.168.3.12:0/1649312807 isn't blocklisted
How do I create a lock?
On Thu, Feb 16, 2023 at 10:45 AM Eugen Block eblock@nde.ag wrote:
In addition to Sean's response, this has been asked multiple times, e.g. [1]. You could check if your hypervisors gave up the lock on the RBDs or if they are still locked (rbd status <pool>/<image>), in that case you might need to blacklist the clients and see if that resolves anything. Do you have regular snapshots (or backups) to be able to rollback in case of a curruption?
[1] https://www.spinics.net/lists/ceph-users/msg45937.html
Quoting Sean Mooney smooney@redhat.com:
On Thu, 2023-02-16 at 09:56 -0500, Satish Patel wrote:
> Folks,
> I am running a small 3 node compute/controller with 3 node ceph storage in my lab. Yesterday, because of a power outage all my nodes went down. After reboot of all nodes ceph seems to show good health and no error (in ceph -s).
> When I started using the existing VM I noticed the following errors. Seems like data loss. This is a lab machine and has zero activity on vms but still loses data and the file system corrupt. Is this normal ?
If the VM/cluster hard crashes due to the power cut, yes, it can. Personally I have hit this more often with XFS than ext4, but I have seen it with both.
> I am not using erasure coding, does that help in this matter?
> blk_update_request: I/O error, dev sda, sector 233000 op 0x1: (WRITE) flags 0x800 phys_seg 8 prio class 0
You will probably need to rescue the instance and repair the filesystem of each VM with fsck or similar. So: boot with rescue image -> repair filesystem -> unrescue -> hard reboot/start VM if needed.
You might be able to mitigate this somewhat by disabling disk caching at the QEMU level, but that will reduce performance. Ceph recommends that you use virtio-scsi for the device model and writeback cache mode. We generally recommend that too; however, you can use the disk_cachemodes option to change that.
https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.dis...
[libvirt]
disk_cachemodes=file=none,block=none,network=none
This corruption may also have happened on the Ceph cluster side. They have some options that can help prevent that via journaling writes.
If you can afford it, I would get even a small UPS to allow a graceful shutdown if you have future power cuts, to avoid data loss issues.
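The rescue sequence Sean outlines could be written down as a dry run like the one below. This is a sketch under assumptions: the function name `rescue_plan` and the instance name `my-instance` are hypothetical, `xfs_repair` is named only as the usual counterpart to fsck on XFS, and the helper prints the steps without running anything.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the rescue/repair sequence for one
# instance instead of executing it, so each step can be reviewed first.
rescue_plan() {
    server=$1
    printf 'openstack server rescue %s\n' "$server"
    printf '# inside the rescue instance: run fsck (or xfs_repair for XFS) on the original disk\n'
    printf 'openstack server unrescue %s\n' "$server"
    printf 'openstack server reboot --hard %s\n' "$server"
}

# Example: show the plan for a placeholder instance name.
rescue_plan my-instance
```

The filesystem repair itself has to be run by hand inside the rescue instance, which is why it appears as a comment line rather than a command.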
Hi Eugen,
I have a few questions before we close this thread.
- Is it normal that Ceph locks images during a power failure or disaster?
- Shouldn't Ceph release locks automatically when VMs shut down?
- Is this a bug or natural behavior of Ceph? I am worried about what happens if I have 100s of VMs and have to remove the locks on all of them.
On Fri, Feb 17, 2023 at 10:28 AM Satish Patel satish.txt@gmail.com wrote:
Hi Eugen,
You saved my life! All my VMs are up without any filesystem errors :)
This is the correct command to remove the lock.
$ rbd lock rm -p vms ec6044e6-2231-4906-9e30-1e2e72573e64_disk "auto 139643345791728" client.1211875
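The arguments of that command come straight from the `rbd lock list` output: the two-word ID column ("auto <number>") and the Locker column. A minimal sketch of assembling it, assuming the three-column layout shown in this thread; the helper name `lock_rm_cmd` is hypothetical and it only prints the command.

```shell
#!/bin/sh
# Hypothetical helper: read `rbd lock list` output on stdin and print the
# matching `rbd lock rm` command for each lock. It prints only; nothing is
# executed against the cluster.
# Assumes data lines look like: client.<n> auto <cookie> <address>
lock_rm_cmd() {
    pool=$1
    image=$2
    awk -v pool="$pool" -v image="$image" '
        $1 ~ /^client\./ && NF >= 4 {
            # $1 = locker, $2 " " $3 = lock ID, $4 = address (unused)
            printf "rbd lock rm -p %s %s \"%s %s\" %s\n", pool, image, $2, $3, $1
        }'
}

# Example using the lock list output quoted in this thread.
printf '%s\n' \
    'There is 1 exclusive lock on this image.' \
    'Locker ID Address' \
    'client.1211875 auto 139643345791728 192.168.3.12:0/2259335316' |
    lock_rm_cmd vms ec6044e6-2231-4906-9e30-1e2e72573e64_disk
```

For the sample input this prints the same command that is given above.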
On Fri, Feb 17, 2023 at 10:06 AM Satish Patel satish.txt@gmail.com wrote:
Hi Eugen,
I am experimenting with a less important machine, and I did the following.
I shut down the VM, but the following lock is still there:
root@ceph1:~# rbd lock list --image ec6044e6-2231-4906-9e30-1e2e72573e64_disk -p vms
There is 1 exclusive lock on this image.
Locker          ID                    Address
client.1211875  auto 139643345791728  192.168.3.12:0/2259335316
root@ceph1:~# ceph osd blacklist add 192.168.3.12:0/2259335316
blocklisting 192.168.3.12:0/2259335316 until 2023-02-17T16:00:59.399775+0000 (3600 sec)
Still I can see it in the following lock list. Am I missing something?
root@ceph1:~# rbd lock list --image ec6044e6-2231-4906-9e30-1e2e72573e64_disk -p vms
There is 1 exclusive lock on this image.
Locker          ID                    Address
client.1211875  auto 139643345791728  192.168.3.12:0/2259335316
Hello,
I honestly don't know whether it's a bug or a safety mechanism. I have faced several power outages now, and Ceph won't remove any lock, even after a few days.
Maybe because the Ceph MONs were down too?
Anyway, your thread motivated me to finish my unlock script. It still needs improvement, but it does the job (used many times now!).
https://github.com/RomainLyon1/cephunlock
Best Regards, Romain
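For the "100s of VMs" concern, one cautious option is a dry run that only prints the unlock commands for a whole pool. The sketch below is not the script linked above; `print_pool_unlock_cmds` and the `RBD` override variable are hypothetical names, and it assumes the `rbd lock list` layout shown earlier in this thread.

```shell
#!/bin/sh
# Hypothetical dry-run sketch: list every image in a pool and print (not run)
# an `rbd lock rm` command for each lock found. Review the output and make
# sure the affected VMs are shut down before executing any of it.
# RBD can be overridden, e.g. to test against canned output.
RBD=${RBD:-rbd}

print_pool_unlock_cmds() {
    pool=$1
    "$RBD" ls -p "$pool" | while IFS= read -r image; do
        # </dev/null keeps the inner command from consuming the image list.
        "$RBD" lock list -p "$pool" --image "$image" </dev/null |
            awk -v pool="$pool" -v image="$image" '
                $1 ~ /^client\./ && NF >= 4 {
                    printf "rbd lock rm -p %s %s \"%s %s\" %s\n", pool, image, $2, $3, $1
                }'
    done
}

# Usage against a real cluster (prints commands only):
#   print_pool_unlock_cmds vms
```

Printing instead of executing keeps a human in the loop, which matters here because removing a lock from an image that a running VM still uses could cause further corruption.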
On Fri, 2023-02-17 at 10:45 -0500, Satish Patel wrote:
Hi Eugen,
I have a few questions before we close this thread.
- Is it normal that Ceph locks images during a power failure or disaster?
- Shouldn't Ceph release locks automatically when VMs shut down?
- Is this a bug or natural behavior of Ceph? I am worried about what happens if I have 100s of VMs and have to remove the locks on all of them.
Well, it’s the other way around: the compute nodes are the ones acquiring the locks as clients. If Ceph goes down, they can’t do anything with the locks until the cluster is reachable again, and sometimes a service restart or a manual intervention is required, as in this case. These things happen; the only thing that would probably help is a stretched (or geo-redundant) Ceph cluster that avoids a total failure, so the cloud keeps working if one site goes down. Do you see the same impact on that many VMs or only on some of them? Or what does the last question refer to?
Quoting Satish Patel satish.txt@gmail.com:
Hi Eugen,
I have a few questions before we close this thread.
- Is it normal that Ceph locks images during a power failure or disaster?
- Shouldn't Ceph release locks automatically when VMs shut down?
- Is this a bug or natural behavior of Ceph? I am worried about what happens if I have 100s of VMs and have to remove the locks on all of them.
This is great! I will give it a try with your script. Thanks!!!
On Fri, Feb 17, 2023 at 10:55 AM CHANU ROMAIN romain.chanu@univ-lyon1.fr wrote:
Hello,
I honestly don't know whether it's a bug or a safety mechanism. I have faced several power outages now, and Ceph won't remove any lock, even after a few days.
Maybe because the Ceph MONs were down too?
Anyway, your thread motivated me to finish my unlock script. It still needs improvement, but it does the job (used many times now!).
https://github.com/RomainLyon1/cephunlock
Best Regards, Romain
I have noticed that every single VM is impacted.
On Fri, Feb 17, 2023 at 3:14 PM Eugen Block eblock@nde.ag wrote:
Well, it’s the other way around: the compute nodes are the ones acquiring the locks as clients. If Ceph goes down, they can’t do anything with the locks until the cluster is reachable again, and sometimes a service restart or a manual intervention is required, as in this case. These things happen; the only thing that would probably help is a stretched (or geo-redundant) Ceph cluster that avoids a total failure, so the cloud keeps working if one site goes down. Do you see the same impact on that many VMs or only on some of them? Or what does the last question refer to?
participants (5)
- CHANU ROMAIN
- Erik McCormick
- Eugen Block
- Satish Patel
- Sean Mooney