[Openstack-operators] I/O errors on RBD after hypervisor crash.

Evan Bollig PhD boll0107 at umn.edu
Mon Apr 30 17:44:02 UTC 2018


Good tips. Thanks for following up. We'll be on the lookout for this too.

Cheers,
-E
--
Evan F. Bollig, PhD
Senior Scientific Computing Consultant, Application Developer |
Scientific Computing Solutions (SCS)
Minnesota Supercomputing Institute | msi.umn.edu
University of Minnesota | umn.edu
boll0107 at umn.edu | 612-624-1447 | Walter Lib Rm 556


On Mon, Apr 30, 2018 at 12:37 PM, Mike Lowe <jomlowe at iu.edu> wrote:
> Sometimes I’ve had similar problems that can be fixed by running fsck against the RBD device on bare metal, out of band, via rbd-nbd. I’ve been thinking it’s related to trim/discard and some sort of disk geometry mismatch.
>
>> On Apr 30, 2018, at 1:22 PM, Jonathan Proulx <jon at csail.mit.edu> wrote:
>>
>>
>> In Proulx's Corollary to Murphy's Law, just after hitting send I tried
>> something that "worked".
>>
>> I noticed the volume shared nothing with the image it was based on
>> so I tried "flattening" it just to try something.
>>
>> Oddly, that worked; either that or just having sat powered off for
>> an hour while I was at lunch.
>>
>> Still have no theory on why it broke or how that could be a fix...if
>> anyone else does please do tell :)
>>
>> Thanks,
>> -Jon
>>
>> On Mon, Apr 30, 2018 at 12:58:16PM -0400, Jonathan Proulx wrote:
>> :Hi All,
>> :
>> :I have a VM with ephemeral root on RBD spewing I/O errors on boot after
>> :hypervisor crash.  I've (unfortunately) seen a lot of hypervisors go
>> :down badly with lots of VMs on them and this is a new one on me.
>> :
>> :I can 'rbd export' the volume and I get a clean filesystem.
>> :
>> :version details
>> :
>> :OpenStack: Mitaka
>> :Host OS:   Ubuntu 16.04
>> :Ceph:      Luminous (12.2.4)
>> :
>> :After booting to the initrd, the VM shows:
>> :
>> :end_request: I/O error, dev vda, sector <lots of sectors>
>> :
>> :I tried a hard reboot, tried rescue (in which case vdb shows the same
>> :issue), and tried migrating to a different hypervisor; all fail
>> :consistently.
>> :
>> :I do have writeback caching enabled on the crashed hypervisor, so I can
>> :imagine filesystem corruption, but not this type of I/O error.
>> :
>> :Also, the RBD volume doesn't seem to be damaged, since I could dump
>> :it to an image and see correct partitioning and filesystems.
>> :
>> :Anyone seen this before? I have the bits since the export worked, but I'm
>> :concerned about the possibility of recurrence.
>> :
>> :Thanks,
>> :-Jon
>> :
>> :--
>>
>> --
>>
>> _______________________________________________
>> OpenStack-operators mailing list
>> OpenStack-operators at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
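A minimal Python sketch of the out-of-band check Mike describes: map the image on a bare-metal host with rbd-nbd, run fsck against it, and always unmap afterwards. Assumptions not stated in the thread: the pool and image names are placeholders (Nova's RBD backend typically names ephemeral disks `<instance-uuid>_disk` in the pool set by `images_rbd_pool`), the guest root is ext4 on the first partition (`-f` is passed through to e2fsck to force a check), and the host's nbd module has partition support (`max_part` > 0) so partitions appear as `/dev/nbd0p1`. Only run it while the instance is shut off.

```python
import subprocess
from typing import List


def map_cmd(pool: str, image: str) -> List[str]:
    # rbd-nbd prints the attached device path (e.g. /dev/nbd0) on stdout
    return ["rbd-nbd", "map", f"{pool}/{image}"]


def fsck_cmd(device: str, partition: int) -> List[str]:
    # Assumption: nbd exposes partitions as <device>p<N> (needs max_part > 0);
    # "-f" is forwarded to e2fsck, assuming an ext4 guest filesystem
    return ["fsck", "-f", f"{device}p{partition}"]


def unmap_cmd(device: str) -> List[str]:
    return ["rbd-nbd", "unmap", device]


def fsck_rbd_image(pool: str, image: str, partition: int = 1) -> int:
    """Map an RBD image, fsck one partition, and always unmap it again.

    The instance owning the image must be shut off: two writers on the
    same image will corrupt it.
    """
    mapped = subprocess.run(map_cmd(pool, image), check=True,
                            capture_output=True, text=True)
    device = mapped.stdout.strip()
    try:
        # fsck exit codes are a bitmask: 0 = clean, 1 = errors corrected
        return subprocess.run(fsck_cmd(device, partition)).returncode
    finally:
        subprocess.run(unmap_cmd(device), check=True)
```

Called as `fsck_rbd_image("vms", "<instance-uuid>_disk")`, it returns fsck's exit status; `vms` is a hypothetical pool name. Keeping the command builders separate from the runner makes it easy to print the commands for review before touching a production image.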
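Jon's workaround was to flatten the clone. A small sketch, assuming the image's metadata comes from `rbd info --format json`, which includes a `parent` object (with `pool`, `image` and `snapshot` fields) only when the image is still a clone. The helper names are hypothetical, and only the presence of that object is relied on.

```python
import json
from typing import List, Optional


def parent_of(info_json: str) -> Optional[str]:
    """Return 'pool/image@snap' of the clone parent, or None if not a clone.

    `info_json` is assumed to be the stdout of
    `rbd info --format json <pool>/<image>`.
    """
    info = json.loads(info_json)
    parent = info.get("parent")
    if not parent:
        return None
    return f'{parent["pool"]}/{parent["image"]}@{parent["snapshot"]}'


def flatten_cmd(pool: str, image: str) -> List[str]:
    # Copies the parent's data into the clone and detaches it from the parent
    return ["rbd", "flatten", f"{pool}/{image}"]
```

If `parent_of` reports a parent, `flatten_cmd` gives the equivalent of what Jon ran. Flattening copies all shared data into the clone, so it can take a while on large images, and the thread does not establish that it, rather than the hour spent powered off, was the actual fix.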


