[Openstack] [Swift] Delete handling with md5 collisions

Samuel Merritt sam at swiftstack.com
Wed Aug 26 21:55:19 UTC 2015


On 8/26/15 1:37 PM, Shrinand Javadekar wrote:
> Hi,
>
> I have a question about how object deletes are handled with md5
> collisions. I looked at the code and here's my understanding of how
> things will work.
>
> If I have two objects that have the same md5 hash, they will go to the
> same hash directory. Say, they go to
> /srv/node/r1/object/1024/eef/deadbeef/t1.data and
> /srv/node/r1/object/1024/eef/deadbeef/t2.data.

That's two objects whose *names* have the same MD5 hash. The objects' 
contents are irrelevant when determining placement.
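
To make that concrete, here is a rough sketch of how the on-disk location 
falls out of the object's name alone. It is a simplified illustration, not 
Swift's actual code: the function names, the salt values, and the 
part_power of 10 are all assumptions made up for the example.

```python
import hashlib


def name_digest(account, container, obj, prefix=b"", suffix=b"changeme"):
    # Hash the object's full name, salted with cluster-wide values.
    # The prefix/suffix defaults here are placeholders, not real settings.
    path = "/%s/%s/%s" % (account, container, obj)
    return hashlib.md5(prefix + path.encode("utf-8") + suffix).hexdigest()


def hash_dir(device_path, digest, part_power=10):
    # The partition is taken from the top part_power bits of the digest.
    partition = int(digest, 16) >> (128 - part_power)
    # Layout: <device>/objects/<partition>/<last 3 hex chars>/<digest>
    return "%s/objects/%d/%s/%s" % (device_path, partition, digest[-3:], digest)


digest = name_digest("AUTH_test", "photos", "cat.jpg")
print(hash_dir("/srv/node/r1", digest))
```

Note that the object's body never appears as an input, so two uploads with 
different contents but the same name always land in the same directory.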

> Now, if I delete object t1, Swift will create a new file called t3.ts
> and put it in the hash directory.
> /srv/node/r1/object/1024/eef/deadbeef/t3.ts.
>
> When the replicator runs, it will delete all files with timestamp less
> than t3. So will it delete both t1 and t2?

Correct. Two objects whose names have the same MD5 hash are considered 
equivalent by Swift.
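
A toy model of that cleanup, assuming files in a hash directory are named 
<timestamp>.data or <timestamp>.ts and that the newest file wins. The 
helper names are hypothetical, and the real logic handles more cases (for 
example other file types) than this sketch does.

```python
def timestamp_of(filename):
    # "1440600000.00000.data" -> 1440600000.0
    return float(filename.rsplit(".", 1)[0])


def obsolete_files(filenames):
    # Everything older than the newest file in the directory is obsolete,
    # regardless of which object name originally produced it.
    newest = max(timestamp_of(f) for f in filenames)
    return sorted(f for f in filenames if timestamp_of(f) < newest)


files = [
    "1440600000.00000.data",  # t1: first object
    "1440600100.00000.data",  # t2: second object, colliding name hash
    "1440600200.00000.ts",    # t3: tombstone written by the delete
]
print(obsolete_files(files))
```

Both .data files are older than the tombstone, so both are reported as 
obsolete; nothing in the directory records which name each file came from.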

If I remember correctly, since MD5 has a 128-bit output, the birthday 
bound means you have roughly a 50% probability of a name-hash collision 
once your cluster holds on the order of 2^64 objects.
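
For anyone who wants to check the arithmetic, here is a small sketch using 
the standard birthday approximation p = 1 - exp(-n^2 / (2N)), with N = 2^128 
possible hashes. It assumes MD5 output is uniformly distributed; the 
function names are just for illustration.

```python
import math

HASH_SPACE = 2 ** 128  # number of possible MD5 digests


def collision_probability(n_objects):
    # Birthday approximation: p = 1 - exp(-n^2 / (2N))
    return -math.expm1(-(n_objects ** 2) / (2 * HASH_SPACE))


def objects_for_probability(p):
    # Inverse of the approximation: n = sqrt(2N * ln(1 / (1 - p)))
    return math.sqrt(2 * HASH_SPACE * math.log(1 / (1 - p)))


print(collision_probability(2 ** 64))
print(objects_for_probability(0.5) / 2 ** 64)
```

Under that approximation, exactly 2^64 objects gives about a 39% chance of 
a collision, and the 50% mark sits at about 1.18 * 2^64 objects, so 2^64 is 
the right order of magnitude.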



