Ceph outage causes filesystem errors on VMs

CHANU ROMAIN romain.chanu at univ-lyon1.fr
Fri Feb 17 15:55:36 UTC 2023


Hello,

I honestly don't know if it's a bug or a security feature. I have been
through several power outages now and Ceph won't remove any lock, even
after a few days.

Maybe because Ceph MONs were down too?

Anyway, your thread motivated me to finish my unlock script. It still
needs improvement, but it does the job (I have used it many times now!).

https://github.com/RomainLyon1/cephunlock
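
The idea is basically just to walk the pool, list any leftover
exclusive locks and remove them, roughly like this (only a sketch of
the approach, not the actual script; the pool name "vms" is an example
and the parsing assumes the plain-text "rbd lock list" output shown in
this thread):

POOL=vms
for IMG in $(rbd ls -p "$POOL"); do
    # skip the two header lines; each remaining line is: <locker> <id...> <address>
    rbd lock list -p "$POOL" "$IMG" | tail -n +3 | \
    while read -r LOCKER ID1 ID2 ADDR; do
        # same form as: rbd lock rm -p vms <image> "<id>" <locker>
        rbd lock rm -p "$POOL" "$IMG" "$ID1 $ID2" "$LOCKER"
    done
done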

Best Regards,
Romain


On Fri, 2023-02-17 at 10:45 -0500, Satish Patel wrote:
> Hi Eugen,
> 
> I have a few questions before we close this thread. 
> 
> - Is it normal that Ceph locks images during a power failure or
> disaster?
> - Shouldn't Ceph release the locks automatically when the VMs shut
> down?
> - Is this a bug or normal Ceph behavior? I am worried about what
> happens if I have hundreds of VMs and have to remove the lock on all
> of them.
> 
> 
> 
> On Fri, Feb 17, 2023 at 10:28 AM Satish Patel <satish.txt at gmail.com>
> wrote:
> > Hi Eugen,
> > 
> > You saved my life! All my VMs are up without any filesystem
> > errors :)
> > 
> > This is the correct command to remove the lock. 
> > 
> > $ rbd lock rm -p vms ec6044e6-2231-4906-9e30-1e2e72573e64_disk "auto 139643345791728" client.1211875
> > 
> > 
> > 
> > On Fri, Feb 17, 2023 at 10:06 AM Satish Patel
> > <satish.txt at gmail.com> wrote:
> > > Hi Eugen,
> > > 
> > > I am playing with a less important machine and I did the following.
> > > 
> > > I shut down the VM, but I still see the following lock:
> > > 
> > > root@ceph1:~# rbd lock list --image ec6044e6-2231-4906-9e30-1e2e72573e64_disk -p vms
> > > There is 1 exclusive lock on this image.
> > > Locker          ID                    Address
> > > client.1211875  auto 139643345791728  192.168.3.12:0/2259335316
> > > 
> > > root@ceph1:~# ceph osd blacklist add 192.168.3.12:0/2259335316
> > > blocklisting 192.168.3.12:0/2259335316 until 2023-02-17T16:00:59.399775+0000 (3600 sec)
> > > 
> > > I can still see it in the following lock list. Am I missing
> > > something?
> > > 
> > > root@ceph1:~# rbd lock list --image ec6044e6-2231-4906-9e30-1e2e72573e64_disk -p vms
> > > There is 1 exclusive lock on this image.
> > > Locker          ID                    Address
> > > client.1211875  auto 139643345791728  192.168.3.12:0/2259335316
> > > 
> > > 
> > > 
> > > On Fri, Feb 17, 2023 at 2:39 AM Eugen Block <eblock at nde.ag>
> > > wrote:
> > > > The lock is acquired automatically, you don't need to create
> > > > one. I'm curious why you have that many blacklist entries,
> > > > maybe that is indeed the issue here (locks are not removed). I
> > > > would shut down the corrupted VM and see if the compute node
> > > > still has a lock on that image, because after shutdown it
> > > > should remove the lock (automatically). If there's still a
> > > > watcher or lock on that image after shutdown (rbd status
> > > > vms/55dbf40b-0a6a-4bab-b3a5-b4bb74e963af_disk), you can try to
> > > > blacklist the client with:
> > > > 
> > > > # ceph osd blacklist add client.<ID>
> > > > 
> > > > Then check the status again; if no watchers are present, boot
> > > > the VM.
> > > > 
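> > > > As a concrete example, with the image and client from your
> > > > output quoted below (just a sketch; if the lock is still listed
> > > > after blacklisting, it can also be removed explicitly):
> > > > 
> > > > # rbd status vms/55dbf40b-0a6a-4bab-b3a5-b4bb74e963af_disk
> > > > # ceph osd blacklist add 192.168.3.12:0/1649312807
> > > > # rbd lock rm -p vms 55dbf40b-0a6a-4bab-b3a5-b4bb74e963af_disk "auto 139971105131968" client.268212
> > > > # rbd status vms/55dbf40b-0a6a-4bab-b3a5-b4bb74e963af_disk
> > > > 
> > > > Once that last status call shows no watchers or locks, boot the
> > > > VM.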
> > > > 
> > > > Zitat von Satish Patel <satish.txt at gmail.com>:
> > > > 
> > > > > Hi Eugen,
> > > > >
> > > > > This is what I did, let me know if I missed anything.
> > > > >
> > > > > root@ceph1:~# ceph osd blacklist ls
> > > > > 192.168.3.12:0/0 2023-02-17T04:48:54.381763+0000
> > > > > 192.168.3.22:0/753370860 2023-02-17T04:47:08.185434+0000
> > > > > 192.168.3.22:0/2833179066 2023-02-17T04:47:08.185434+0000
> > > > > 192.168.3.22:0/1812968936 2023-02-17T04:47:08.185434+0000
> > > > > 192.168.3.22:6824/2057987683 2023-02-17T04:47:08.185434+0000
> > > > > 192.168.3.21:0/2756666482 2023-02-17T05:16:23.939511+0000
> > > > > 192.168.3.21:0/1646520197 2023-02-17T05:16:23.939511+0000
> > > > > 192.168.3.22:6825/2057987683 2023-02-17T04:47:08.185434+0000
> > > > > 192.168.3.21:0/526748613 2023-02-17T05:16:23.939511+0000
> > > > > 192.168.3.21:6815/2454821797 2023-02-17T05:16:23.939511+0000
> > > > > 192.168.3.22:0/288537807 2023-02-17T04:47:08.185434+0000
> > > > > 192.168.3.21:0/4161448504 2023-02-17T05:16:23.939511+0000
> > > > > 192.168.3.21:6824/2454821797 2023-02-17T05:16:23.939511+0000
> > > > > listed 13 entries
> > > > >
> > > > > root@ceph1:~# rbd lock list --image 55dbf40b-0a6a-4bab-b3a5-b4bb74e963af_disk -p vms
> > > > > There is 1 exclusive lock on this image.
> > > > > Locker         ID                    Address
> > > > > client.268212  auto 139971105131968  192.168.3.12:0/1649312807
> > > > >
> > > > > root@ceph1:~# ceph osd blacklist rm 192.168.3.12:0/1649312807
> > > > > 192.168.3.12:0/1649312807 isn't blocklisted
> > > > >
> > > > > How do I create a lock?
> > > > >
> > > > >
> > > > > On Thu, Feb 16, 2023 at 10:45 AM Eugen Block <eblock at nde.ag> wrote:
> > > > >
> > > > >> In addition to Sean's response, this has been asked multiple
> > > > >> times, e.g. [1]. You could check if your hypervisors gave up
> > > > >> the lock on the RBDs or if they are still locked (rbd status
> > > > >> <pool>/<image>); in that case you might need to blacklist the
> > > > >> clients and see if that resolves anything. Do you have regular
> > > > >> snapshots (or backups) to be able to roll back in case of a
> > > > >> corruption?
> > > > >>
> > > > >> [1] https://www.spinics.net/lists/ceph-users/msg45937.html
> > > > >>
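> > > > >> If you do, a rollback would look roughly like this
> > > > >> (pool/image/snapshot names are placeholders, and the VM has to
> > > > >> be shut off first; just a sketch):
> > > > >>
> > > > >> $ rbd snap ls vms/<image>_disk
> > > > >> $ rbd snap rollback vms/<image>_disk@<snapshot>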
> > > > >>
> > > > >> Zitat von Sean Mooney <smooney at redhat.com>:
> > > > >>
> > > > >> > On Thu, 2023-02-16 at 09:56 -0500, Satish Patel wrote:
> > > > >> >> Folks,
> > > > >> >>
> > > > >> >> I am running a small 3-node compute/controller setup with a
> > > > >> >> 3-node Ceph storage cluster in my lab. Yesterday, because of
> > > > >> >> a power outage, all my nodes went down. After rebooting all
> > > > >> >> the nodes, Ceph seems to show good health and no errors (in
> > > > >> >> ceph -s).
> > > > >> >>
> > > > >> >> When I started using the existing VMs I noticed the
> > > > >> >> following errors. It seems like data loss. This is a lab
> > > > >> >> machine with zero activity on the VMs, but it still loses
> > > > >> >> data and the file system is corrupt. Is this normal?
> > > > >> > If the VM/cluster hard crashes due to the power cut, yes, it
> > > > >> > can. Personally I have hit this more often with XFS than with
> > > > >> > ext4, but I have seen it with both.
> > > > >> >>
> > > > >> >> I am not using erasure coding; does that help in this
> > > > >> >> matter?
> > > > >> >>
> > > > >> >> blk_update_request: I/O error, dev sda, sector 233000 op 0x1: (WRITE) flags 0x800 phys_seg 8 prio class 0
> > > > >> >
> > > > >> > You will probably need to rescue the instance and repair the
> > > > >> > filesystem of each VM with fsck or similar: boot with a
> > > > >> > rescue image -> repair the filesystem -> unrescue -> hard
> > > > >> > reboot/start the VM if needed.
> > > > >> >
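> > > > >> > In OpenStack CLI terms that is roughly (the server name and
> > > > >> > the device the original disk shows up as inside the rescue
> > > > >> > VM are placeholders, just a sketch):
> > > > >> >
> > > > >> > $ openstack server rescue <server>
> > > > >> > (log into the rescue VM; the original disk usually appears as a secondary device, e.g. /dev/sdb)
> > > > >> > $ fsck -y /dev/sdb1          # or xfs_repair for XFS filesystems
> > > > >> > $ openstack server unrescue <server>
> > > > >> > $ openstack server reboot --hard <server>
> > > > >> >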
> > > > >> > You might be able to mitigate this somewhat by disabling
> > > > >> > disk caching at the QEMU level, but that will reduce
> > > > >> > performance. Ceph recommends that you use virtio-scsi for the
> > > > >> > device model and writeback cache mode. We generally recommend
> > > > >> > that too; however, you can use the disk_cachemodes option to
> > > > >> > change that:
> > > > >> >
> > > > >> > https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.disk_cachemodes
> > > > >> >
> > > > >> > [libvirt]
> > > > >> > disk_cachemodes=file=none,block=none,network=none
> > > > >> >
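> > > > >> > For the virtio-scsi part, that is typically done through
> > > > >> > image properties (the image name is a placeholder; shown only
> > > > >> > as an example):
> > > > >> >
> > > > >> > $ openstack image set --property hw_disk_bus=scsi --property hw_scsi_model=virtio-scsi <image>
> > > > >> >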
> > > > >> > This corruption may also have happened on the Ceph cluster
> > > > >> > side; Ceph has some options that can help prevent that via
> > > > >> > journaling writes.
> > > > >> >
> > > > >> > If you can afford it, I would get even a small UPS to allow
> > > > >> > a graceful shutdown during future power cuts and avoid data
> > > > >> > loss issues.
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > 
> > > > 
> > > > 
