[Openstack] Enabling data deduplication on Swift

Caitlin Bestler Caitlin.Bestler at nexenta.com
Mon Mar 12 16:46:22 UTC 2012



Andi abes asked: 

> Doesn't that depend on the ratios of read vs write?
> In a read tilted environment (e.g. CDN's, image stores etc), being able to dedup at the block level in the
> relatively rare write case seems a boon. The simplification this could allow - performing localized dedup
> (i.e. each object server deduping just its local storage) seems worth while.

For the most part, deduplication has no impact on read performance: the same chunks are fetched
whether or not they were deduplicated.
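
To make that concrete, here is a rough Python sketch (the chunk store, put_chunk and
read_object are invented for illustration, not Swift's actual API): a read simply walks an
object's manifest of chunk fingerprints and fetches each chunk, exactly the same way whether
a given chunk was written once or deduplicated across many objects.

    import hashlib

    # Hypothetical chunk store: fingerprint -> chunk bytes.
    # Deduplication only affects how many times a chunk is *written*;
    # the store holds exactly one copy per fingerprint either way.
    CHUNK_STORE = {}

    def put_chunk(data):
        """Store a chunk under its content fingerprint (idempotent)."""
        fp = hashlib.sha256(data).hexdigest()
        CHUNK_STORE[fp] = data          # writing a duplicate is a no-op
        return fp

    def read_object(manifest):
        """Reassemble an object from its ordered list of chunk fingerprints."""
        return b"".join(CHUNK_STORE[fp] for fp in manifest)

    # Two objects sharing a chunk: the shared chunk is stored once,
    # but both reads fetch it the same way.
    manifest_a = [put_chunk(b"header"), put_chunk(b"common payload")]
    manifest_b = [put_chunk(b"common payload"), put_chunk(b"trailer")]
    assert read_object(manifest_a) == b"headercommon payload"
    assert read_object(manifest_b) == b"common payloadtrailer"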

If you have a central metadata system (like GFS or HDFS), deduplication can interfere with optimizing the
placement of chunks for streaming reads. But with hash-driven algorithms you either place the entire object
on one server, which precludes parallelizing the fetch, or you distribute the object's chunks across multiple
servers, which hurts the efficiency of a slow streaming read. A toy sketch of that tradeoff follows below.
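
The sketch below assumes a simple stable hash over a handful of object servers (the server
list and pick_server are illustrative, not Swift's ring): hashing the object name lands every
chunk on one server, while hashing each chunk fingerprint spreads the chunks out.

    import hashlib

    SERVERS = ["server-0", "server-1", "server-2", "server-3"]

    def pick_server(key):
        """Map a key onto a server with a stable hash (illustrative only)."""
        digest = hashlib.sha256(key.encode()).digest()
        return SERVERS[int.from_bytes(digest[:4], "big") % len(SERVERS)]

    chunk_fingerprints = ["fp-%02d" % i for i in range(8)]

    # Placement keyed by object name: every chunk lands on one server,
    # so a read cannot be parallelized across servers.
    by_object = {fp: pick_server("my/object") for fp in chunk_fingerprints}

    # Placement keyed by chunk fingerprint: chunks spread across servers,
    # which parallelizes a bulk fetch but makes a slow streaming read
    # touch many servers for little benefit.
    by_chunk = {fp: pick_server(fp) for fp in chunk_fingerprints}

    print("by object name:     ", sorted(set(by_object.values())))
    print("by chunk fingerprint:", sorted(set(by_chunk.values())))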

Because distributed deduplication relies on fingerprinted chunks, it has the advantage of allowing unrestricted
chunk caching, which is the real solution to optimizing reads of extremely popular data.
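
A fingerprint names immutable content, so a chunk cache never needs invalidation and any tier
can hold a copy of a popular chunk. A minimal sketch, again with invented names (the backing
store and get_chunk are not part of Swift):

    import hashlib
    from functools import lru_cache

    BACKEND = {}  # hypothetical stand-in for the object servers

    def store(data):
        """Write a chunk to the backing store under its fingerprint."""
        fp = hashlib.sha256(data).hexdigest()
        BACKEND[fp] = data
        return fp

    @lru_cache(maxsize=4096)
    def get_chunk(fp):
        """Cache chunks by fingerprint.  The content behind a fingerprint
        never changes, so cached entries never need invalidation and any
        number of caches can safely hold the same chunk."""
        return BACKEND[fp]

    fp = store(b"an extremely popular chunk")
    assert get_chunk(fp) == get_chunk(fp)   # second call served from the cache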
 



