[openstack-dev] [Openstack] [Swift] Erasure coding reconstructor doesn't work
clay.gerrard at gmail.com
Wed Jul 22 20:19:26 UTC 2015
On Wed, Jul 22, 2015 at 12:24 PM, Changbin Liu <changbin.liu at gmail.com> wrote:
> But now I wonder: is it "by design" that EC does not handle an accidental
> deletion of just the data file?
Well, the design goal was not "do not handle the accidental deletion of
just the data file" - it was "make replication fast enough that it works" -
and that required not listing all the dirs all the time.
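For readers less familiar with the replication index, here is a toy Python model of the idea. It is a sketch, not Swift's code. Each partition keeps a cached hash per suffix dir, in the spirit of hashes.pkl, and a pass re-lists only the suffixes that were explicitly invalidated. An occasional full listdir discovers suffix dirs the index doesn't know about yet. The names `SuffixHashIndex`, `hash_suffix`, `invalidate`, `get_hashes` and `do_listdir` loosely echo Swift's, but their signatures and behavior here are assumptions, and the real suffix hash is computed differently.

```python
import hashlib
import os


def hash_suffix(suffix_dir):
    """Toy suffix hash: MD5 over the sorted file names in every object dir under the suffix."""
    md5 = hashlib.md5()
    for obj_dir in sorted(os.listdir(suffix_dir)):
        for name in sorted(os.listdir(os.path.join(suffix_dir, obj_dir))):
            md5.update(name.encode("utf-8"))
    return md5.hexdigest()


class SuffixHashIndex:
    """Hypothetical per-partition index in the spirit of hashes.pkl (not Swift's real code)."""

    def __init__(self, partition_dir):
        self.partition_dir = partition_dir
        self.hashes = {}    # suffix -> cached hash, or None when marked dirty
        self.rehashed = []  # suffixes actually listed on the last call

    def invalidate(self, suffix):
        # The write/delete/replicate paths would call this whenever they touch a suffix.
        self.hashes[suffix] = None

    def get_hashes(self, do_listdir=False):
        self.rehashed = []
        if do_listdir:
            # Occasional full scan to discover suffix dirs the index doesn't know about.
            for suffix in os.listdir(self.partition_dir):
                self.hashes.setdefault(suffix, None)
        for suffix in list(self.hashes):
            if self.hashes[suffix] is not None:
                continue  # trust the cached value: no filesystem access at all
            path = os.path.join(self.partition_dir, suffix)
            if os.path.isdir(path):
                self.hashes[suffix] = hash_suffix(path)
                self.rehashed.append(suffix)
            else:
                del self.hashes[suffix]  # the suffix dir is gone
        return dict(self.hashes)
```

On a second call with nothing invalidated, no suffix is re-listed. That is the point: an unchanged suffix costs no filesystem work. The flip side is that a file removed behind the index's back leaves the cached hash stale until something invalidates that suffix.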
> Deleting both data file and hashes.pkl file is more like a
> deliberately-created failure case instead of a normal one.
To me, deleting a file that Swift wrote to disk, without updating (or
removing) the index that Swift normally updates during write/delete/replicate
to accelerate replication, seems like a deliberately created failure case? You
could try flipping a bit in a data file, or truncating one, and let the auditor
pick it up. Or rm a suffix dir and wait for the every-so-often suffix dir
listdir to catch it, or remove an entire partition, or wipe the disk and put a
new filesystem on it. Or shut down a node and do a PUT, then shut down the
handoff node, and run the reconstructor. Any of the "normal" failure conditions
like that (and plenty more!) are all detected and handled efficiently.
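The "flip a bit or truncate and let the auditor pick it up" case works because the auditor checks file content rather than trusting the index. Here is a rough sketch of that idea. It assumes the checksum recorded at write time is available as `expected_md5`. The functions `md5_of_file` and `audit_file` and the flat quarantine directory are hypothetical and do not reproduce Swift's auditor.

```python
import hashlib
import os
import shutil


def md5_of_file(path, chunk_size=65536):
    """Stream the file and return its MD5 hex digest."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def audit_file(path, expected_md5, quarantine_dir):
    """Toy auditor pass: move the file aside if its content no longer matches.

    expected_md5 stands in for the checksum recorded when the file was written
    (an assumption for this sketch). Returns True if the file was quarantined.
    """
    if md5_of_file(path) == expected_md5:
        return False
    os.makedirs(quarantine_dir, exist_ok=True)
    shutil.move(path, os.path.join(quarantine_dir, os.path.basename(path)))
    return True
```

Both a flipped bit and a truncation change the digest, so either one gets caught the next time the file is read end to end. The email counts quarantine among the activities in a suffix that let the replication side notice, which is why this failure mode gets repaired.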
> To me Swift EC repairing seems different from the triple-replication mode,
> where you delete any data file copy, it will be restored.
Well, replication and reconstruction are different in lots of ways - but
not this part. If you rm a .data file without updating the index, you'll
need some activity (POST/COPY/PUT/quarantine) in that suffix before the
replication engine can notice.
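The same point as a small in-memory simulation. `ToyNode`, `rm_behind_index` and `suffixes_to_sync` are hypothetical and are not Swift's replicator or reconstructor. Two nodes compare cached suffix hashes, and only suffixes whose hashes differ get examined. Removing a file behind the index changes nothing visible until something invalidates that suffix.

```python
import hashlib


class ToyNode:
    """In-memory toy node: suffix -> set of file names, plus a cached suffix-hash index.

    Hypothetical model for illustration only; not Swift's data structures.
    """

    def __init__(self, files):
        self.files = {s: set(names) for s, names in files.items()}
        self.index = {s: self._hash(s) for s in self.files}

    def _hash(self, suffix):
        md5 = hashlib.md5()
        for name in sorted(self.files.get(suffix, ())):
            md5.update(name.encode("utf-8"))
        return md5.hexdigest()

    def rm_behind_index(self, suffix, name):
        # Like rm-ing a .data file by hand: the file goes, the index is untouched.
        self.files[suffix].discard(name)

    def invalidate(self, suffix):
        # Stands in for any activity in the suffix (POST/COPY/PUT/quarantine).
        self.index[suffix] = None

    def get_hashes(self):
        # Recompute only the suffixes marked dirty; trust every other cached hash.
        for suffix, cached in list(self.index.items()):
            if cached is None:
                self.index[suffix] = self._hash(suffix)
        return dict(self.index)


def suffixes_to_sync(local, peer):
    """Suffixes whose hashes differ between two nodes: the only ones replication would examine."""
    mine, theirs = local.get_hashes(), peer.get_hashes()
    return sorted(s for s in set(mine) | set(theirs) if mine.get(s) != theirs.get(s))
```

For example, with `files = {"abc": {"1.data"}, "def": {"2.data"}}` on both nodes, calling `local.rm_behind_index("abc", "1.data")` still leaves `suffixes_to_sync(local, peer)` empty, because neither cached hash changed. After `local.invalidate("abc")`, the rehash exposes the missing file and `abc` becomes the only suffix worth syncing.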
Luckily (?) people don't often go under the covers into the middle of the
storage system and rm data like that?