Ceph outage caused filesystem errors on VMs

Satish Patel satish.txt at gmail.com
Fri Feb 17 21:47:16 UTC 2023


I have noticed that every single VM is impacted.

On Fri, Feb 17, 2023 at 3:14 PM Eugen Block <eblock at nde.ag> wrote:

> Well, it’s the other way around: the compute nodes are the ones
> acquiring the locks as clients. If ceph goes down they can’t do
> anything with the locks until the cluster is reachable again, and
> sometimes a service restart is required, or a manual intervention as
> in this case. These things happen, the only thing that would help
> would probably be a stretched (or geo-redundant) ceph cluster to avoid
> a total failure so the cloud keeps working if one site goes down.
> Do you see the same impact on that many VMs or only on some of them?
> Or what does the last question refer to?
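>
> (If it helps to gauge the impact, a quick read-only check over the pool
> would be something like the loop below. It assumes the images live in the
> "vms" pool as in your earlier output; it only lists locks, it doesn't
> change anything.)
>
> for img in $(rbd ls -p vms); do
>   echo "== vms/$img"
>   rbd lock list -p vms --image "$img"
> done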
>
> Zitat von Satish Patel <satish.txt at gmail.com>:
>
> > Hi Eugen,
> >
> > I have a few questions before we close this thread.
> >
> > - Is it normal that Ceph keeps locks on images during a power failure or
> > disaster?
> > - Shouldn't Ceph release the locks automatically when the VMs shut down?
> > - Is this a bug or normal Ceph behavior? I am worried about what happens
> > if I have 100s of VMs and need to remove the locks on all of them (see
> > the rough sketch below).
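> >
> > For the 100s-of-VMs worry, this is roughly the bulk cleanup I would
> > otherwise have to do by hand. It is an untested sketch based on the lock
> > output format shown above, and it blindly removes every leftover lock in
> > the vms pool, which is only safe once the old client hosts are really gone:
> >
> > for img in $(rbd ls -p vms); do
> >   # skip the two header lines of "rbd lock list" output, then read the
> >   # locker, the two-word lock ID ("auto <number>") and the address
> >   rbd lock list -p vms --image "$img" | tail -n +3 | \
> >   while read -r locker lock_word lock_num addr; do
> >     echo "vms/$img: removing lock '$lock_word $lock_num' held by $locker"
> >     rbd lock rm -p vms "$img" "$lock_word $lock_num" "$locker"
> >   done
> > done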
> >
> >
> >
> > On Fri, Feb 17, 2023 at 10:28 AM Satish Patel <satish.txt at gmail.com>
> wrote:
> >
> >> Hi Eugen,
> >>
> >> You saved my life!!! All my VMs are up without any filesystem errors :)
> >>
> >> This is the correct command to remove the lock.
> >>
> >> $ rbd lock rm -p vms ec6044e6-2231-4906-9e30-1e2e72573e64_disk "auto
> >> 139643345791728" client.1211875
> >>
> >>
> >>
> >> On Fri, Feb 17, 2023 at 10:06 AM Satish Patel <satish.txt at gmail.com>
> >> wrote:
> >>
> >>> Hi Eugen,
> >>>
> >>> I am testing with a less important machine and did the following.
> >>>
> >>> I shut down the VM, but it still shows the following lock:
> >>>
> >>> root at ceph1:~# rbd lock list --image
> >>> ec6044e6-2231-4906-9e30-1e2e72573e64_disk -p vms
> >>> There is 1 exclusive lock on this image.
> >>> Locker          ID                    Address
> >>> client.1211875  auto 139643345791728  192.168.3.12:0/2259335316
> >>>
> >>> root at ceph1:~# ceph osd blacklist add 192.168.3.12:0/2259335316
> >>> blocklisting 192.168.3.12:0/2259335316 until
> >>> 2023-02-17T16:00:59.399775+0000 (3600 sec)
> >>>
> >>> I can still see it in the lock list below. Am I missing something?
> >>>
> >>> root at ceph1:~# rbd lock list --image
> >>> ec6044e6-2231-4906-9e30-1e2e72573e64_disk -p vms
> >>> There is 1 exclusive lock on this image.
> >>> Locker          ID                    Address
> >>> client.1211875  auto 139643345791728  192.168.3.12:0/2259335316
> >>>
> >>>
> >>>
> >>> On Fri, Feb 17, 2023 at 2:39 AM Eugen Block <eblock at nde.ag> wrote:
> >>>
> >>>> The lock is acquired automatically, you don't need to create one. I'm
> >>>> curious why you have that many blacklist entries; maybe that is indeed
> >>>> the issue here (locks are not removed). I would shut down the corrupted
> >>>> VM and see if the compute node still has a lock on that image, because
> >>>> after shutdown it should remove the lock (automatically). If there is
> >>>> still a watcher or lock on that image after shutdown (rbd status
> >>>> vms/55dbf40b-0a6a-4bab-b3a5-b4bb74e963af_disk) you can try to
> >>>> blacklist the client with its address (as shown by rbd lock list):
> >>>>
> >>>> # ceph osd blacklist add <client address>
> >>>>
> >>>> Then check the status again; if no watchers are present, boot the VM.
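> >>>>
> >>>> A condensed version of that sequence (just a sketch; the address to
> >>>> blacklist is whatever rbd reports for your stale client, not a fixed
> >>>> value):
> >>>>
> >>>> # rbd status vms/55dbf40b-0a6a-4bab-b3a5-b4bb74e963af_disk
> >>>> # rbd lock list --image 55dbf40b-0a6a-4bab-b3a5-b4bb74e963af_disk -p vms
> >>>> # ceph osd blacklist add <address from the Locker/Watchers output>
> >>>> # rbd status vms/55dbf40b-0a6a-4bab-b3a5-b4bb74e963af_disk
> >>>>
> >>>> The last status check should no longer show any watchers before you
> >>>> boot the VM.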
> >>>>
> >>>>
> >>>> Zitat von Satish Patel <satish.txt at gmail.com>:
> >>>>
> >>>> > Hi Eugen,
> >>>> >
> >>>> > This is what I did, let me know if I missed anything.
> >>>> >
> >>>> > root at ceph1:~# ceph osd blacklist ls
> >>>> > 192.168.3.12:0/0 2023-02-17T04:48:54.381763+0000
> >>>> > 192.168.3.22:0/753370860 2023-02-17T04:47:08.185434+0000
> >>>> > 192.168.3.22:0/2833179066 2023-02-17T04:47:08.185434+0000
> >>>> > 192.168.3.22:0/1812968936 2023-02-17T04:47:08.185434+0000
> >>>> > 192.168.3.22:6824/2057987683 2023-02-17T04:47:08.185434+0000
> >>>> > 192.168.3.21:0/2756666482 2023-02-17T05:16:23.939511+0000
> >>>> > 192.168.3.21:0/1646520197 2023-02-17T05:16:23.939511+0000
> >>>> > 192.168.3.22:6825/2057987683 2023-02-17T04:47:08.185434+0000
> >>>> > 192.168.3.21:0/526748613 2023-02-17T05:16:23.939511+0000
> >>>> > 192.168.3.21:6815/2454821797 2023-02-17T05:16:23.939511+0000
> >>>> > 192.168.3.22:0/288537807 2023-02-17T04:47:08.185434+0000
> >>>> > 192.168.3.21:0/4161448504 2023-02-17T05:16:23.939511+0000
> >>>> > 192.168.3.21:6824/2454821797 2023-02-17T05:16:23.939511+0000
> >>>> > listed 13 entries
> >>>> >
> >>>> > root at ceph1:~# rbd lock list --image
> >>>> > 55dbf40b-0a6a-4bab-b3a5-b4bb74e963af_disk -p vms
> >>>> > There is 1 exclusive lock on this image.
> >>>> > Locker         ID                    Address
> >>>> > client.268212  auto 139971105131968  192.168.3.12:0/1649312807
> >>>> >
> >>>> > root at ceph1:~# ceph osd blacklist rm 192.168.3.12:0/1649312807
> >>>> > 192.168.3.12:0/1649312807 isn't blocklisted
> >>>> >
> >>>> > How do I create a lock?
> >>>> >
> >>>> >
> >>>> > On Thu, Feb 16, 2023 at 10:45 AM Eugen Block <eblock at nde.ag> wrote:
> >>>> >
> >>>> >> In addition to Sean's response, this has been asked multiple times,
> >>>> >> e.g. [1]. You could check if your hypervisors gave up the lock on the
> >>>> >> RBDs or if they are still locked (rbd status <pool>/<image>); in that
> >>>> >> case you might need to blacklist the clients and see if that resolves
> >>>> >> anything. Do you have regular snapshots (or backups) to be able to
> >>>> >> roll back in case of corruption?
> >>>> >>
> >>>> >> [1] https://www.spinics.net/lists/ceph-users/msg45937.html
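> >>>> >>
> >>>> >> For the snapshot part, a minimal sketch (the snapshot name "pre-fsck"
> >>>> >> is just an example, and the VM should be stopped before rolling back):
> >>>> >>
> >>>> >> # take a snapshot of the disk before attempting any repair
> >>>> >> rbd snap create vms/55dbf40b-0a6a-4bab-b3a5-b4bb74e963af_disk@pre-fsck
> >>>> >> # if the repair makes things worse, roll the image back to it
> >>>> >> rbd snap rollback vms/55dbf40b-0a6a-4bab-b3a5-b4bb74e963af_disk@pre-fsck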
> >>>> >>
> >>>> >>
> >>>> >> Zitat von Sean Mooney <smooney at redhat.com>:
> >>>> >>
> >>>> >> > On Thu, 2023-02-16 at 09:56 -0500, Satish Patel wrote:
> >>>> >> >> Folks,
> >>>> >> >>
> >>>> >> >> I am running a small 3-node compute/controller setup with 3-node
> >>>> >> >> ceph storage in my lab. Yesterday, because of a power outage, all
> >>>> >> >> my nodes went down. After rebooting all nodes, ceph seems to show
> >>>> >> >> good health and no errors (in ceph -s).
> >>>> >> >>
> >>>> >> >> When I started using the existing VMs I noticed the following
> >>>> >> >> errors. It seems like data loss. This is a lab environment with
> >>>> >> >> zero activity on the VMs, but they still lost data and the
> >>>> >> >> filesystems are corrupt. Is this normal?
> >>>> >> > If the VMs/cluster hard-crash due to the power cut, yes, it can
> >>>> >> > happen. Personally I have hit this more often with XFS than ext4,
> >>>> >> > but I have seen it with both.
> >>>> >> >>
> >>>> >> >> I am not using erasure coding; does that help in this matter?
> >>>> >> >>
> >>>> >> >> blk_update_request: I/O error, dev sda, sector 233000 op 0x1:(WRITE)
> >>>> >> >> flags 0x800 phys_seg 8 prio class 0
> >>>> >> >
> >>>> >> > You will probably need to rescue the instance and repair the
> >>>> >> > filesystem of each VM with fsck or similar: boot with a rescue
> >>>> >> > image -> repair the filesystem -> unrescue -> hard reboot/start
> >>>> >> > the VM if needed.
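> >>>> >> >
> >>>> >> > Roughly, per instance (a sketch; the rescue image name and the
> >>>> >> > device/partition seen inside the rescue environment are
> >>>> >> > placeholders you would have to adapt):
> >>>> >> >
> >>>> >> > $ openstack server rescue --image <rescue-image> <server>
> >>>> >> > # then, from inside the rescue instance, fsck the original disk,
> >>>> >> > # e.g. something like:
> >>>> >> > # fsck -y /dev/vdb1
> >>>> >> > $ openstack server unrescue <server>
> >>>> >> > $ openstack server reboot --hard <server>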
> >>>> >> >
> >>>> >> > You might be able to mitigate this somewhat by disabling disk
> >>>> >> > caching at the qemu level, but that will reduce performance. Ceph
> >>>> >> > recommends that you use virtio-scsi for the device model and
> >>>> >> > writeback cache mode. We generally recommend that too; however,
> >>>> >> > you can use the disk_cachemodes option to change that.
> >>>> >> >
> >>>> >> > https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.disk_cachemodes
> >>>> >> >
> >>>> >> > [libvirt]
> >>>> >> > disk_cachemodes=file=none,block=none,network=none
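> >>>> >> >
> >>>> >> > (That goes in the [libvirt] section of nova.conf on the compute
> >>>> >> > nodes; as far as I know the nova-compute service has to be
> >>>> >> > restarted for it to take effect, and it only applies to guests
> >>>> >> > started or migrated after the change.)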
> >>>> >> >
> >>>> >> > This corruption may also have happened on the Ceph cluster side;
> >>>> >> > Ceph has some options that can help prevent that by journaling
> >>>> >> > writes.
> >>>> >> >
> >>>> >> > If you can afford it, I would get even a small UPS to allow a
> >>>> >> > graceful shutdown during future power cuts, to avoid data-loss
> >>>> >> > issues.
> >>>> >>
> >>>> >>
> >>>> >>
> >>>> >>
> >>>> >>
> >>>>
> >>>>
> >>>>
> >>>>
>
>
>
>

