<div dir="ltr">Hi Eugen,<div><br></div><div>This is what I did, let me know if I missed anything. </div><div><br></div><div>root@ceph1:~# ceph osd blacklist ls<br><a href="http://192.168.3.12:0/0" target="_blank">192.168.3.12:0/0</a> 2023-02-17T04:48:54.381763+0000<br><a href="http://192.168.3.22:0/753370860" target="_blank">192.168.3.22:0/753370860</a> 2023-02-17T04:47:08.185434+0000<br><a href="http://192.168.3.22:0/2833179066" target="_blank">192.168.3.22:0/2833179066</a> 2023-02-17T04:47:08.185434+0000<br><a href="http://192.168.3.22:0/1812968936" target="_blank">192.168.3.22:0/1812968936</a> 2023-02-17T04:47:08.185434+0000<br><a href="http://192.168.3.22:6824/2057987683" target="_blank">192.168.3.22:6824/2057987683</a> 2023-02-17T04:47:08.185434+0000<br><a href="http://192.168.3.21:0/2756666482" target="_blank">192.168.3.21:0/2756666482</a> 2023-02-17T05:16:23.939511+0000<br><a href="http://192.168.3.21:0/1646520197" target="_blank">192.168.3.21:0/1646520197</a> 2023-02-17T05:16:23.939511+0000<br><a href="http://192.168.3.22:6825/2057987683" target="_blank">192.168.3.22:6825/2057987683</a> 2023-02-17T04:47:08.185434+0000<br><a href="http://192.168.3.21:0/526748613" target="_blank">192.168.3.21:0/526748613</a> 2023-02-17T05:16:23.939511+0000<br><a href="http://192.168.3.21:6815/2454821797" target="_blank">192.168.3.21:6815/2454821797</a> 2023-02-17T05:16:23.939511+0000<br><a href="http://192.168.3.22:0/288537807" target="_blank">192.168.3.22:0/288537807</a> 2023-02-17T04:47:08.185434+0000<br><a href="http://192.168.3.21:0/4161448504" target="_blank">192.168.3.21:0/4161448504</a> 2023-02-17T05:16:23.939511+0000<br><a href="http://192.168.3.21:6824/2454821797" target="_blank">192.168.3.21:6824/2454821797</a> 2023-02-17T05:16:23.939511+0000<br>listed 13 entries<br><br></div><div>root@ceph1:~# rbd lock list --image 55dbf40b-0a6a-4bab-b3a5-b4bb74e963af_disk -p vms<br>There is 1 exclusive lock on this image.<br>Locker ID Address<br>client.268212 auto 139971105131968 <a href="http://192.168.3.12:0/1649312807" target="_blank">192.168.3.12:0/1649312807</a><br><br></div><div>root@ceph1:~# ceph osd blacklist rm <a href="http://192.168.3.12:0/1649312807" target="_blank">192.168.3.12:0/1649312807</a></div><div><a href="http://192.168.3.12:0/1649312807" target="_blank">192.168.3.12:0/1649312807</a> isn't blocklisted<br></div><div><br></div><div>How do I create a lock? </div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Feb 16, 2023 at 10:45 AM Eugen Block <<a href="mailto:eblock@nde.ag" target="_blank">eblock@nde.ag</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">In addition to Sean's response, this has been asked multiple times, <br>
e.g. [1]. You could check whether your hypervisors gave up the lock on the
RBDs or whether they are still locked (rbd status <pool>/<image>); in that
case you might need to blacklist the clients and see if that resolves
anything. Do you have regular snapshots (or backups) to be able to
roll back in case of a corruption?

[1] https://www.spinics.net/lists/ceph-users/msg45937.html
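
For example, a rough sketch of checking and clearing a stale lock on the
Ceph side (the pool, image, locker and client address below are just the
values from the output earlier in this thread; substitute your own):

  # check whether any client still watches / locks the image
  rbd status vms/55dbf40b-0a6a-4bab-b3a5-b4bb74e963af_disk
  rbd lock list vms/55dbf40b-0a6a-4bab-b3a5-b4bb74e963af_disk

  # if the watcher is gone but the lock remains, remove it explicitly;
  # lock ID and locker are taken from the "rbd lock list" output
  rbd lock rm vms/55dbf40b-0a6a-4bab-b3a5-b4bb74e963af_disk "auto 139971105131968" client.268212

  # or blacklist the stale client address so its lock can be broken
  ceph osd blacklist add 192.168.3.12:0/1649312807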


Quoting Sean Mooney <smooney@redhat.com>:

> On Thu, 2023-02-16 at 09:56 -0500, Satish Patel wrote:
>> Folks,
>>
>> I am running a small 3-node compute/controller with 3-node ceph storage in
>> my lab. Yesterday, because of a power outage, all my nodes went down. After
>> rebooting all nodes, ceph seems to show good health and no errors (in ceph
>> -s).
>>
>> When I started using the existing VM, I noticed the following errors. It seems
>> like data loss. This is a lab machine with zero activity on the VMs, but it
>> still loses data and the file system is corrupt. Is this normal?
> If the VM/cluster hard-crashes due to the power cut, yes, it can.
> Personally I have hit this more often with XFS than ext4, but I have
> seen it with both.
>>
>> I am not using erasure coding; does that help in this matter?
>>
>> blk_update_request: I/O error, dev sda, sector 233000 op 0x1: (WRITE) flags
>> 0x800 phys_seg 8 prio class 0
>
> You will probably need to rescue the instance and repair the
> filesystem of each VM with fsck or similar: boot with a rescue image
> -> repair the filesystem -> unrescue -> hard reboot/start the VM if
> needed.
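
For example, a rough sketch of that rescue flow with the OpenStack CLI
(server and rescue image names are placeholders, and the device the
original root disk shows up as inside the rescue environment varies, so
verify it with lsblk before running fsck):

  # boot the instance from a known-good rescue image
  openstack server rescue --image <rescue-image> <server>

  # inside the rescue VM, repair the original root filesystem
  # (often the second disk, e.g. /dev/sdb1 -- check with lsblk first)
  fsck -y /dev/sdb1        # ext4
  xfs_repair /dev/sdb1     # xfs

  # switch back to the original disk and start the instance again
  openstack server unrescue <server>
  openstack server reboot --hard <server>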
>
> You might be able to mitigate this somewhat by disabling disk
> caching at the qemu level, but that will reduce performance. Ceph
> recommends that you use virtio-scsi for the device model and the
> writeback cache mode. We generally recommend that too; however, you
> can use the disk_cachemodes option to change it:
> https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.disk_cachemodes
>
> [libvirt]
> disk_cachemodes=file=none,block=none,network=none
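
For comparison, a sketch of the writeback/virtio-scsi setup mentioned
above (disk_cachemodes is the nova option linked above; hw_scsi_model and
hw_disk_bus are the usual Glance image properties for selecting
virtio-scsi, and the image name is a placeholder):

  # nova.conf on the compute nodes: writeback caching for rbd-backed disks
  [libvirt]
  disk_cachemodes = network=writeback

  # image properties so instances get virtio-scsi instead of virtio-blk
  openstack image set --property hw_scsi_model=virtio-scsi \
                      --property hw_disk_bus=scsi <image-name>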
>
> This corruption may also have happened on the Ceph cluster side;
> there are some options that can help prevent that via journaling writes.
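
If client-side write caching is the concern here, one conservative knob
(this is one reading of the above; the settings themselves are standard
librbd options, set in ceph.conf on the hypervisors) is:

  [client]
  # disable the librbd writeback cache so acknowledged writes have
  # already reached the OSDs, at the cost of write performance
  rbd cache = false
  # alternatively, keep the cache but stay in writethrough mode until
  # the guest issues its first flush
  rbd cache writethrough until flush = true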
>
> If you can afford it, I would get even a small UPS to allow a
> graceful shutdown during future power cuts, to avoid data loss
> issues.