[openstack-dev] [swift] what does swift do if the auditor finds that all 3 replicas are corrupt?
abblly.daniel at gmail.com
Sat Nov 9 11:45:53 UTC 2013
Got it, thanks very much.
On Fri, Nov 8, 2013 at 2:32 AM, Samuel Merritt <sam at swiftstack.com> wrote:
> On 11/7/13 5:59 AM, Daniel Li wrote:
>> Thanks very much for your help; please see my inline comments below.
>> On Thu, Nov 7, 2013 at 2:30 AM, Samuel Merritt <sam at swiftstack.com> wrote:
>> On 11/6/13 7:12 AM, Daniel Li wrote:
>> I have a question about Swift: what does Swift do if the auditor
>> finds that all 3 replicas are corrupt?
>> Will it notify the owner of the object (e.g. by emailing the account owner)?
>> What will happen on a GET request for the corrupted object?
>> Will it return a special error saying that all the replicas are corrupt?
>> Or will it just say that the object does not exist?
>> Or will it just return one of the corrupted replicas?
>> Or something else?
>> If all 3 (or N) replicas are corrupt, then the auditors will
>> eventually quarantine all of them, and subsequent GET requests will
>> receive 404 responses.
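A minimal sketch of why the client ends up with a 404, under simplifying assumptions: each storage node is modeled as an in-memory dict, a quarantined replica is simply absent from its node's live store, and the hypothetical proxy_get function asks the replica nodes in order. This is not Swift's actual proxy code, which is considerably more involved (handoff nodes, error handling, timeouts).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: each storage node maps object name -> object bytes.
# A quarantined replica has been moved out of the live store, so it is absent.
NodeStore = Dict[str, bytes]


def proxy_get(nodes: List[NodeStore], name: str) -> Tuple[int, Optional[bytes]]:
    """Ask each replica node in turn and return the first copy found."""
    for store in nodes:
        body = store.get(name)
        if body is not None:
            return 200, body
    # Every node answered "not found", so the client gets a 404.
    return 404, None


if __name__ == "__main__":
    # All three replicas have been quarantined by their local auditors.
    print(proxy_get([{}, {}, {}], "photo.jpg"))  # (404, None)
    # One node still holds a live copy.
    print(proxy_get([{}, {"photo.jpg": b"jpeg bytes"}, {}], "photo.jpg"))
```

Note that the sketch cannot tell "quarantined everywhere" apart from "never uploaded": both look like an absent object, which matches the behaviour described above.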
>> No notifications are sent, nor is it really feasible to start
>> sending them. "The auditor" is not a single process; there is one
>> Swift auditor process running on each node in a cluster. Therefore,
>> when an object is quarantined, there's no way for its auditor to
>> know if the other copies are okay or not.
>> Note that this is highly unlikely to ever happen, at least with the
>> default of 3 replicas. When an auditor finds a corrupt object, it
>> quarantines it (moves it to a "quarantines" directory).
>> Did you mean that when the auditor finds the corruption, it does not
>> copy a good replica from another object server to overwrite the corrupted
>> one, but just moves it to a quarantines directory?
> That is correct. The object auditors don't perform any network IO, and in
> fact do not use the ring at all. All they do is scan the filesystems and
> quarantine bad objects in an infinite loop.
> (Of course, there are also container and account auditors that do similar
> things, but for container and account databases.)
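The auditor behaviour described above (local disk only, no ring, no network I/O) can be sketched as a single audit pass. Assumptions, all hypothetical: each object lives as a <name>.data file with its expected MD5 in a sidecar <name>.md5 file, and audit_pass is an invented name. Real Swift keeps the checksum (the ETag) in the object's metadata rather than in a sidecar file, and runs such passes over and over rather than once.

```python
import hashlib
import os
import shutil
from typing import List


def audit_pass(data_dir: str, quarantine_dir: str) -> List[str]:
    """Run one hypothetical audit pass over a local data directory.

    Assumes every <name>.data file has a matching <name>.md5 file holding
    the expected hex digest. Only local files are read; no ring lookups
    and no network calls. Returns the names of quarantined objects.
    """
    os.makedirs(quarantine_dir, exist_ok=True)
    quarantined = []
    for entry in sorted(os.listdir(data_dir)):
        if not entry.endswith(".data"):
            continue
        name = entry[: -len(".data")]
        data_path = os.path.join(data_dir, entry)
        md5_path = os.path.join(data_dir, name + ".md5")
        with open(data_path, "rb") as f:
            actual = hashlib.md5(f.read()).hexdigest()
        with open(md5_path) as f:
            expected = f.read().strip()
        if actual != expected:
            # Move (not copy) the bad replica and its checksum out of the
            # live directory; nothing is fetched from other nodes here.
            shutil.move(data_path, os.path.join(quarantine_dir, entry))
            shutil.move(md5_path, os.path.join(quarantine_dir, name + ".md5"))
            quarantined.append(name)
    return quarantined
```

The design point the sketch illustrates is that the auditor only removes bad data; restoring a good copy is left entirely to replication.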
>> Then, since that object is missing, the replication processes will
>> recreate the object by copying it from a node with a good copy.
>> When do the replication processes recreate the object by copying it
>> from a node with a good copy? Does the auditor send a message to
>> replication so that the replication will do the copy immediately? And what is
>> a 'good' copy? Is the good copy's MD5 value checked before copying?
> It'll happen whenever the other replicators, which are running on other
> nodes, get around to it.
> Replication in Swift is push-based, not pull-based; there is no receiver
> here to which a message could be sent.
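To make the push-based model concrete, here is a toy sketch in which a node's replicator copies what it holds to peers that lack it. Assumptions: in-memory dicts stand in for disks, a plain assignment stands in for rsync, and push_replicate is a hypothetical name. Because the sender drives everything, the node that just quarantined its replica simply waits for a peer's next pass.

```python
from typing import Dict, List

# Hypothetical model: each node's disk is an in-memory dict of name -> bytes.
NodeStore = Dict[str, bytes]


def push_replicate(local: NodeStore, peers: List[NodeStore]) -> int:
    """Push each locally held object to every peer that is missing it.

    The sending node decides what to send; the receiving node runs no
    replication logic of its own. No checksum is verified: whatever the
    sender holds is what gets copied, as with rsync.
    Returns the number of copies made.
    """
    copies = 0
    for name, body in local.items():
        for peer in peers:
            if name not in peer:
                peer[name] = body
                copies += 1
    return copies


if __name__ == "__main__":
    node_a = {"obj": b"good"}
    node_b = {}  # node B's auditor quarantined its replica
    node_c = {"obj": b"good"}
    # Nothing happens on B until A (or C) runs its next replication pass.
    print(push_replicate(node_a, [node_b, node_c]))  # 1
    print(node_b)  # {'obj': b'good'}
```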
> Currently, a "good" copy is one that hasn't been quarantined. Since
> replication uses rsync to push files around the network, there's no
> checking of MD5 at copy time. However, there is work underway to develop a
> replication protocol that avoids rsync entirely and uses the object server
> throughout the entire replication process, and that would give the object
> server a chance to check MD5 checksums on incoming writes.
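A sketch of the kind of check that an object-server-based replication protocol would make possible: the receiver recomputes the MD5 of the incoming data and refuses it on a mismatch. Assumptions: the sender transmits the object's expected MD5 (its ETag) along with the body, and receive_replica is a hypothetical name, not part of Swift.

```python
import hashlib
from typing import Dict

# Hypothetical model: the receiving node's disk is a dict of name -> bytes.
NodeStore = Dict[str, bytes]


def receive_replica(store: NodeStore, name: str, body: bytes, expected_md5: str) -> bool:
    """Store an incoming replica only if its MD5 matches the expected value.

    Returns True if the replica was stored, False if it was rejected.
    """
    if hashlib.md5(body).hexdigest() != expected_md5:
        # A bit-rotted copy is refused instead of being written to disk.
        return False
    store[name] = body
    return True


if __name__ == "__main__":
    etag = hashlib.md5(b"payload").hexdigest()
    store = {}
    print(receive_replica(store, "obj", b"payl0ad", etag))  # False
    print(receive_replica(store, "obj", b"payload", etag))  # True
```

With a check like this in place, a corrupt copy on the sending side could no longer overwrite a missing replica elsewhere, which is exactly the gap in the rsync-based approach.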
> Note that this is only important should 2 replicas experience
> near-simultaneous bitrot; in that case, there is a chance that bad-copy A
> will get quarantined and replaced with bad-copy B. Eventually, though, a
> bad copy will get quarantined and replaced with a good copy, and then
> you've got 2 good copies and 1 bad one, which reduces to a
> previously-discussed scenario.
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org