[openstack-dev] [Openstack] [Swift] Erasure coding reconstructor doesn't work

Luse, Paul E paul.e.luse at intel.com
Wed Jul 22 19:37:45 UTC 2015


Correct, it's by design.  Swift doesn’t expect people to delete things “under the covers”.  When the auditor finds a corrupted file, it’s the one that quarantines it and knows that it also needs to invalidate the hashes.pkl file.  This mechanism is there to minimize extra ‘stuff’ going on both at the node and on the cluster when it comes to making sure there is durability in the system.

Wrt why the replication code seems to work if you delete just a .data file (again, you shouldn’t do this, as files don’t just disappear; the intention is that the auditor is in charge here): that is because of some code in the replicator that I didn’t ‘mimic’ in the reconstructor, and it doesn’t look like Clay did either when he worked on it.  I'm not really sure why it was there; it forces a listing every 10 passes for some reason.  Clay? (See do_listdir in update() in the replicator.)

Thx
Paul

From: Changbin Liu [mailto:changbin.liu at gmail.com]
Sent: Wednesday, July 22, 2015 12:24 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Openstack] [Swift] Erasure coding reconstructor doesn't work

Thanks, Paul and Clay.

By "deleted one data fragment" I meant I "rm" only the data file. I did not delete the hashes.pkl file in the outer directory.

I tried it again. This time deleting both the data file and the hashes.pkl file. The reconstructor is able to restore the data file correctly.

But now I wonder: is it "by design" that EC does not handle an accidental deletion of just the data file? Deleting both the data file and the hashes.pkl file is more like a deliberately created failure case than a normal one.  To me, Swift EC repair seems different from triple-replication mode, where if you delete any copy of a data file, it will be restored.



Thanks

Changbin

On Tue, Jul 21, 2015 at 5:28 PM, Luse, Paul E <paul.e.luse at intel.com> wrote:
I was about to ask that very same thing and, at the same time, if you can indicate if you’ve seen errors in any logs and if so please provide those as well.  I’m hoping you just didn’t delete the hashes.pkl file though ☺

-Paul

From: Clay Gerrard [mailto:clay.gerrard at gmail.com]
Sent: Tuesday, July 21, 2015 2:22 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Openstack] [Swift] Erasure coding reconstructor doesn't work

How did you "deleted one data fragment"?

Like replication, the EC consistency engine uses some subdirectory hashing to accelerate replication requests in a consistent system, so if you just rm a file down in a hashdir somewhere, you also need to delete the hashes.pkl up in the part dir (or call the invalidate_hash method like PUT, DELETE, POST, and quarantine do).
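
To make the effect of the cached hashes concrete, here is a minimal toy model, not Swift's actual code. It assumes the usual on-disk layout (part dir / suffix dir / hashdir / file) and uses hypothetical helpers hash_suffix() and get_hashes() that only mimic the idea of serving suffix hashes from hashes.pkl until that file is removed. The hash function, partition number, and file name are illustrative assumptions.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def hash_suffix(suffix_dir: Path) -> str:
    """Toy suffix hash: digest of the sorted file names in each hashdir."""
    digest = hashlib.sha256()
    for hashdir in sorted(p for p in suffix_dir.iterdir() if p.is_dir()):
        for name in sorted(f.name for f in hashdir.iterdir()):
            digest.update(name.encode())
    return digest.hexdigest()


def get_hashes(part_dir: Path) -> dict:
    """Serve suffix hashes from hashes.pkl if present, else recompute."""
    cache = part_dir / "hashes.pkl"
    if cache.exists():
        with cache.open("rb") as fh:
            return pickle.load(fh)
    hashes = {d.name: hash_suffix(d) for d in part_dir.iterdir() if d.is_dir()}
    with cache.open("wb") as fh:
        pickle.dump(hashes, fh)
    return hashes


def demo() -> tuple:
    """Delete a fragment by hand, then show the cache hiding the change."""
    with tempfile.TemporaryDirectory() as tmp:
        part_dir = Path(tmp) / "objects-1" / "1234"
        hashdir = part_dir / "27e" / "d41d8cd98f00b204e9800998ecf8427e"
        hashdir.mkdir(parents=True)
        frag = hashdir / "1437500000.00000#0.data"
        frag.write_bytes(b"fragment archive")

        before = get_hashes(part_dir)       # computed from disk, then cached
        frag.unlink()                       # "rm" under the covers
        stale = get_hashes(part_dir)        # still answered from hashes.pkl
        (part_dir / "hashes.pkl").unlink()  # manual invalidation
        fresh = get_hashes(part_dir)        # recomputed, now reflects the rm
        return before, stale, fresh


before, stale, fresh = demo()
print("cache hid the deletion:", before == stale)
print("recompute noticed it:", fresh != before)
```

In this toy, the suffix hash only changes once hashes.pkl is gone, which is the same reason a hand-deleted file goes unnoticed until the hash is invalidated.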

Every so often someone discusses the idea of having the auditor invalidate a hash after "long enough" or take some action on empty hashdirs (mind the races!) - but it's really only an issue when someone deletes something by hand, so we normally manage to get distracted with other things.

-Clay

On Tue, Jul 21, 2015 at 1:38 PM, Changbin Liu <changbin.liu at gmail.com> wrote:
Folks,

To test the latest feature of Swift erasure coding, I followed this document (http://docs.openstack.org/developer/swift/overview_erasure_code.html) to deploy a simple cluster. I used Swift 2.3.0.

I am glad that operations like object PUT/GET/DELETE worked fine. I can see that objects were correctly encoded/uploaded and downloaded at proxy and object servers.

However, I noticed that swift-object-reconstructor did not seem to work as expected. Here is my setup: my cluster has three object servers, and I use this policy:

[storage-policy:1]
policy_type = erasure_coding
name = jerasure-rs-vand-2-1
ec_type = jerasure_rs_vand
ec_num_data_fragments = 2
ec_num_parity_fragments = 1
ec_object_segment_size = 1048576

After I uploaded one object, I verified that: there was one data fragment on each of two object servers, and one parity fragment on the third object server. However, when I deleted one data fragment, no matter how long I waited, it never got repaired, i.e., the deleted data fragment was never regenerated by the swift-object-reconstructor process.
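
For reference, the arithmetic behind a 2+1 policy can be sketched as follows. This is only an illustration under the assumption of a Reed-Solomon style (MDS) code, where any ec_num_data_fragments of the fragments suffice to rebuild the object; ec_summary() is a hypothetical helper, not part of Swift.

```python
def ec_summary(num_data: int, num_parity: int) -> dict:
    """Summarize fragment count, loss tolerance, and storage overhead."""
    total = num_data + num_parity
    return {
        "fragments_per_object": total,
        # Assumes an MDS code: any num_data fragments can rebuild the object.
        "max_lost_fragments": num_parity,
        # Bytes stored per byte of user data (ignoring metadata).
        "storage_overhead": total / num_data,
    }


summary = ec_summary(num_data=2, num_parity=1)
print(summary)
```

With these numbers, losing a single fragment (as in the test described above) leaves exactly the two fragments needed for reconstruction.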

My question: is swift-object-reconstructor supposed to be "NOT WORKING" given the current implementation status? Or, is there any configuration I missed in setting up swift-object-reconstructor?

Thanks

Changbin

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe<http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe>
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


