[Openstack] [Swift] Drive failure detection and recovery using swift-drive-audit

Shrinand Javadekar shrinand at maginatics.com
Mon Dec 8 18:36:00 UTC 2014


Thanks Clay!

On Fri, Dec 5, 2014 at 1:02 PM, Clay Gerrard <clay.gerrard at gmail.com> wrote:
> On Fri, Dec 5, 2014 at 11:47 AM, Shrinand Javadekar
> <shrinand at maginatics.com> wrote:
>>
>>
>> If it is less than N, the swift-drive-audit tool could potentially
>> unmount an already recovered drive.
>>
>> If it is > N, it is possible to miss some messages in the log file.
>>
>> Is the above analysis correct?
>
>
> You're probably not too far off, but maybe in practice the frequency and
> depth of the lookback are still lower than the minimum amount of time it
> takes a dc tech to physically walk out to a server and swap out a failing
> disk that got unmounted?
>
> Once it's unmounted it stops generating errors, so maybe it's safer to pick
> a frequency that's generally lower than the cycle time on a drive swap. In
> the worst case you risk a replaced drive getting unmounted again for old
> errors when the dc techs are super on the ball for some reason.  At least
> an extra unmount can be fixed remotely.
>
> -Clay
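
To make the trade-off above concrete, here is a rough sketch in Python. It
is not swift-drive-audit code. It assumes that "it" in my question is the
interval between audit runs and that N is the lookback depth in minutes;
the function and parameter names are made up for illustration.

```python
def window_coverage(run_interval_min, lookback_min):
    """Return (gap, overlap) in minutes between consecutive audit windows.

    Hypothetical model: each run scans the last `lookback_min` minutes of
    the log, and runs happen every `run_interval_min` minutes.
    """
    # Log minutes that no run ever scans (messages can be missed).
    gap = max(0, run_interval_min - lookback_min)
    # Log minutes scanned by two consecutive runs (old errors seen again).
    overlap = max(0, lookback_min - run_interval_min)
    return gap, overlap


def stale_errors_visible(last_error_min, audit_run_min, lookback_min):
    """True if a run at `audit_run_min` still sees an error logged at
    `last_error_min`, i.e. the error falls inside the lookback window."""
    return audit_run_min - lookback_min <= last_error_min <= audit_run_min


if __name__ == "__main__":
    print(window_coverage(60, 30))          # (30, 0): 30 minutes never scanned
    print(window_coverage(30, 60))          # (0, 30): 30 minutes scanned twice
    # Drive swapped and remounted 45 min after its last error, lookback 60:
    print(stale_errors_visible(0, 45, 60))  # True: could be unmounted again
    # Swap takes 90 min instead: the old errors have aged out of the window.
    print(stale_errors_visible(0, 90, 60))  # False
```

If that model is right, keeping the lookback shorter than a realistic
drive-swap time is what avoids the extra unmount, as Clay suggests.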



