[Openstack] [Swift] Drive failure detection and recovery using swift-drive-audit

Clay Gerrard clay.gerrard at gmail.com
Fri Dec 5 21:02:24 UTC 2014


On Fri, Dec 5, 2014 at 11:47 AM, Shrinand Javadekar <shrinand at maginatics.com> wrote:

>
> If it is less than N, the swift-drive-audit tool could potentially
> unmount an already recovered drive.
>
> If it is > N, it is possible to miss some messages in the log file.
>
> Is the above analysis correct?
>

You're probably not too far off, but maybe in practice the frequency and
depth of the lookback are still lower than the minimum amount of time it
takes a DC tech to physically walk out to a server and swap out a failing
disk that got unmounted?

Once it's unmounted it stops generating errors, so maybe it's safer to pick
a frequency that's generally lower than the cycle time on a drive swap.
Worst case, you risk a replaced drive getting unmounted again for old errors
when the DC techs are super on the ball for some reason.  At least an extra
unmount can be fixed remotely.
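
To make the timing argument concrete, here is a rough sketch of the window
check. It is not the actual swift-drive-audit code: the function names, the
60 minute lookback, and the "unmount once the error count reaches the limit"
rule are all assumptions made up for illustration.

```python
from datetime import datetime, timedelta


def errors_in_window(error_times, now, lookback):
    """Return the error timestamps that fall inside [now - lookback, now]."""
    return [t for t in error_times if now - lookback <= t <= now]


def would_unmount(error_times, now, lookback, error_limit=1):
    """Hypothetical rule: unmount once the window holds enough errors."""
    return len(errors_in_window(error_times, now, lookback)) >= error_limit


# The old drive logged its last error at 12:00 and was unmounted right after.
old_errors = [datetime(2014, 12, 5, 12, 0)]
lookback = timedelta(minutes=60)  # assumed lookback depth

# Fast swap: new drive mounted at 12:45, next audit runs at 12:50.
fast_swap_audit = datetime(2014, 12, 5, 12, 50)
# Slower swap: new drive mounted at 14:00, next audit runs at 14:05.
slow_swap_audit = datetime(2014, 12, 5, 14, 5)

print(would_unmount(old_errors, fast_swap_audit, lookback))  # True
print(would_unmount(old_errors, slow_swap_audit, lookback))  # False
```

In the fast case the old error is still inside the lookback, so the fresh
drive would get unmounted again. In the slower case the error has aged out
of the window and the new drive is left alone.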

-Clay