Read Only FS after ceph issue

Grant Morley grant at absolutedevops.io
Mon Jan 21 23:18:16 UTC 2019


Hi,

Thanks for the email. We have managed to fix this by upgrading to the 
latest point release of Ceph Jewel and restarting the OSDs. It seems 
there may have been some write-lock issues that were not being 
reported by Ceph.
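
In case it helps anyone else, the rough sequence was along these lines 
(the image spec and OSD id below are placeholders, not our actual 
ones): check which OSDs are still on the old point release, look for 
stale client locks on an affected instance disk, then restart the OSDs 
one at a time with noout set.

    # show the version each OSD daemon is running
    ceph tell osd.* version

    # list any client locks held on an affected RBD image
    # ("vms/example-disk" is just an example image spec)
    rbd lock list vms/example-disk

    # restart OSDs one at a time without triggering rebalancing
    ceph osd set noout
    systemctl restart ceph-osd@<id>
    ceph -s          # wait for HEALTH_OK before moving to the next OSD
    ceph osd unset noout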

Many Thanks,

On 21/01/2019 20:49, melanie witt wrote:
> On Mon, 21 Jan 2019 17:40:52 +0000, Grant Morley 
> <grant at absolutedevops.io> wrote:
>> Hi all,
>>
>> We are in the process of retiring one of our old platforms, and last 
>> night our Ceph cluster briefly went into an "Error" state because one 
>> of the OSDs got close to full. The data was rebalanced fine and the 
>> cluster health is now "OK"; however, about 40% of our instances now 
>> have corrupt disks, which is odd.
>>
>> Even more strange is that we cannot get them into rescue mode. As 
>> soon as we try, the instances seem to hang during the boot process 
>> while they are trying to mount "/dev/vdb1", and we eventually get a 
>> kernel hung-task timeout as below:
>>
>> Warning: fsck not present, so skipping root file system
>> [    5.644526] EXT4-fs (vdb1): INFO: recovery required on readonly 
>> filesystem
>> [    5.645583] EXT4-fs (vdb1): write access will be enabled during 
>> recovery
>> [  240.504873] INFO: task exe:332 blocked for more than 120 seconds.
>> [  240.506986]       Not tainted 4.4.0-66-generic #87-Ubuntu
>> [  240.508782] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
>> disables this message.
>> [  240.511438] exe             D ffff88003714b878     0 332      1 
>> 0x00000000
>> [  240.513809]  ffff88003714b878 ffff88007c18e358 ffffffff81e11500 
>> ffff88007be81c00
>> [  240.516665]  ffff88003714c000 ffff88007fc16dc0 7fffffffffffffff 
>> ffffffff81838cd0
>> [  240.519546]  ffff88003714b9d0 ffff88003714b890 ffffffff818384d5 
>> 0000000000000000
>> [  240.522399] Call Trace:
>>
>> I have even tried using a different image for nova rescue and we are 
>> getting the same results. Has anyone come across this before?
>>
>> This system is running OpenStack Mitaka with Ceph Jewel.
>>
>> Any help or suggestions will be much appreciated.
>
> I don't know whether this is related, but what you describe reminded 
> me of issues I have seen in the past:
>
> https://bugs.launchpad.net/nova/+bug/1781878
>
> See my comment #1 on the bug ^ for links to additional information on 
> the same root issue.
>
> Hope this helps in some way,
> -melanie
-- 
Grant Morley
Cloud Lead
Absolute DevOps Ltd
Units H, J & K, Gateway 1000, Whittle Way, Stevenage, Herts, SG1 2FP
www.absolutedevops.io
grant at absolutedevops.io  0845 874 0580

