[Openstack] ask for comments - Light weight Erasure code framework for swift

Hao Wang hao.1.wang at gmail.com
Thu Oct 18 03:28:28 UTC 2012


Hi Sam,

Your mail gave me some things to think about. EC seems more useful for a
centralized storage solution than for a distributed one, since in a
distributed system it puts a heavy load on internal network traffic.

From a network-performance point of view, the 3-copy scheme is itself a
compromise. There should be a way to combine the two approaches and tune
the parameters for different scenarios; the different combinations would,
however, offer different reliability and performance trade-offs.

Regards,
Howard
On Thu, Oct 18, 2012 at 7:30 AM, Eugene Kirpichov <ekirpichov at gmail.com> wrote:

> Hi Sam,
>
> My five cents.
>
> Using Fountain codes, which are also a class of EC, one can make all
> the blocks equivalent in role (no separation into data and parity
> blocks).
> http://en.wikipedia.org/wiki/Fountain_code
>
> They resolve a few of the issues that you raised; however, they may
> raise others - e.g. it's more difficult to determine how many blocks
> you need to fetch to reconstruct the data.
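>
> To make that last point concrete, here is a tiny simulation sketch
> (illustrative only) of the simplest fountain construction, a dense random
> linear fountain over GF(2). Real LT/Raptor codes use sparser combinations,
> but the point is the same: the number of blocks you must fetch is random.
>
>     import random
>
>     def fetches_to_decode(k, trials=2000):
>         """Average number of random GF(2)-coded blocks a decoder must
>         pull before it can recover k source blocks (i.e. until the
>         coefficient vectors it has seen reach rank k)."""
>         total = 0
>         for _ in range(trials):
>             basis = {}      # highest set bit -> reduced coefficient vector
>             fetched = 0
>             while len(basis) < k:            # rank < k: keep fetching
>                 vec = random.getrandbits(k)  # XOR of a random source subset
>                 fetched += 1
>                 while vec:
>                     top = vec.bit_length() - 1
>                     if top in basis:
>                         vec ^= basis[top]    # already covered: keep reducing
>                     else:
>                         basis[top] = vec     # independent: rank grows by one
>                         break
>                 # vec reduced to 0 means the block was redundant
>             total += fetched
>         return total / float(trials)
>
>     # For k = 10 this averages a little under 12 fetches, but any single
>     # request may need more; with a 10:2 MDS code, any 10 of the 12
>     # stored blocks always suffice.
>     print(fetches_to_decode(10))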
>
> On Wed, Oct 17, 2012 at 4:24 PM, Samuel Merritt <sam at swiftstack.com>
> wrote:
> > On 10/15/12 5:36 PM, Duan, Jiangang wrote:
> >>
> >> Some of our customers are more interested in erasure coding than in
> >> tri-replication, as a way to save disk space.
> >> We propose a BP, "Light weight Erasure code framework for swift", which
> >> can be found here: https://blueprints.launchpad.net/swift/+spec/swift-ec
> >> The general idea is to have a daemon on each storage node do an offline
> >> scan and select objects large enough to be worth erasure coding.
> >>
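> >> A rough sketch of the kind of selection pass we have in mind follows;
> >> the names and thresholds are made up for illustration and this is not
> >> actual Swift code.
> >>
> >>     import os
> >>     import time
> >>
> >>     # Illustrative thresholds, not real configuration options.
> >>     MIN_SIZE = 64 * 1024 * 1024  # only objects of 64 MB or more
> >>     MIN_IDLE = 7 * 24 * 3600     # untouched for a week ("cold")
> >>
> >>     def ec_candidates(devices_dir):
> >>         """Walk the object files on a storage node and yield the ones
> >>         big and cold enough to convert from replicas to EC."""
> >>         now = time.time()
> >>         for dirpath, _dirs, files in os.walk(devices_dir):
> >>             for name in files:
> >>                 if not name.endswith('.data'):
> >>                     continue
> >>                 path = os.path.join(dirpath, name)
> >>                 st = os.stat(path)
> >>                 if (st.st_size >= MIN_SIZE and
> >>                         now - st.st_atime >= MIN_IDLE):
> >>                     yield path
> >>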
> >> We would be glad to hear any feedback on this.
> >
> >
> > Here, in no particular order, are some thoughts I have.
> >
> > - Object blocks (both data blocks and parity blocks) will need to be
> > marked somehow so that 3 replicas of each block aren't kept. This is a
> > pretty fundamental change to Swift; up until now, all objects are
> > treated the same. It's essentially introducing the notion of tiered
> > storage into Swift.
> >
> > - Who's responsible for ensuring the presence of all the blocks? That
> > is, assume you have an object that's been split into ten data blocks
> > (D1, D2, ..., D10) and 2 parity blocks (P1, P2). The drive with D7 on it
> > dies. Which replicator(s) is (are) responsible for rebuilding D7 and
> > storing it on a handoff node?
> >
> > If you have the replicators on each block's machine checking for
> > failures, then you'll wind up with more replicators checking each block.
> > Here, it would be 11 replicators ensuring that each block is present.
> > Compare that to the full-replication case, where there are 2 replicators
> > checking on each replica. That's going to result in more traffic on the
> > internal network.
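> >
> > Roughly, assuming each node holding a piece of an object checks on every
> > other piece once per replication pass:
> >
> >     def checks_per_pass(holders):
> >         # every holder of a piece checks on all the other pieces
> >         return holders * (holders - 1)
> >
> >     print(checks_per_pass(3))   # 3-replica object:  6 (2 per replica)
> >     print(checks_per_pass(12))  # 10+2 EC object:  132 (11 per block)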
> >
> > - There will need to be throttles on the transformation daemons (replica
> > -> EC and vice versa), as that's very IO intensive. If a big bunch of
> > data is uploaded at one time and then not accessed (think large
> > backups), then that could be a ticking time bomb for my cluster
> > performance. After those objects become "cold", the transformation
> > daemons will thrash my disks and network turning them into EC-type
> > objects.
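> >
> > Even something as simple as a byte-rate cap on the converter would help;
> > a minimal token-bucket sketch (the class and knob names are made up, not
> > real Swift options):
> >
> >     import time
> >
> >     class ByteThrottle(object):
> >         """Cap the bytes/sec the replica->EC converter may move, so a
> >         backlog of newly-cold objects can't saturate disks/network."""
> >         def __init__(self, bytes_per_sec):
> >             self.rate = float(bytes_per_sec)
> >             self.allowance = self.rate
> >             self.last = time.time()
> >
> >         def wait_for(self, nbytes):
> >             """Block until it is OK to move another nbytes bytes."""
> >             now = time.time()
> >             self.allowance = min(
> >                 self.rate,
> >                 self.allowance + (now - self.last) * self.rate)
> >             self.last = now
> >             if nbytes > self.allowance:
> >                 time.sleep((nbytes - self.allowance) / self.rate)
> >                 self.allowance = 0.0
> >             else:
> >                 self.allowance -= nbytes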
> >
> > - Does this open up a Swift cluster to a DoS attack? If my objects are
> > stored w/EC, then can someone go through and request a few bytes from
> > each object in my cluster a few times and cause all my objects to get
> > "hot"? Under the proposed scheme, this would turn my objects from
> > EC-storage to replica-storage, filling up my disks and killing my
> > cluster. To mitigate that, I'd have to keep enough disk around to hold 3
> > replicas of everything, and at that point, I may as well just keep the 3
> > replicas.
> >
> > - Another thought for a resource-consumption attack: can someone slowly
> > walk my objects and make a large fraction (say, 5%) of them hot each
> > day? That seems like it would make the transformation daemons run at
> > maximum capacity all the time trying to keep up.
> >
> > - Retrieval of EC-stored objects becomes more failure-prone. With
> > replica-stored objects, 1 out of 3 object servers has to be available
> > for a GET request to work. With EC-stored objects and a 10:2 coding, 10
> > out of 12 object servers have to be available. That makes network
> > partitions much worse for data availability.
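> >
> > To put rough numbers on it, assuming independent object-server failures
> > with per-server availability p (a simplification, but it shows the
> > direction):
> >
> >     from math import factorial
> >
> >     def comb(n, k):
> >         return factorial(n) // (factorial(k) * factorial(n - k))
> >
> >     def replica_get_ok(p, copies=3):
> >         # a replica GET works if at least one copy is reachable
> >         return 1 - (1 - p) ** copies
> >
> >     def ec_get_ok(p, data=10, parity=2):
> >         # an EC GET needs at least `data` of the data+parity blocks
> >         n = data + parity
> >         return sum(comb(n, k) * p**k * (1 - p)**(n - k)
> >                    for k in range(data, n + 1))
> >
> >     print(replica_get_ok(0.99))  # ~0.999999
> >     print(ec_get_ok(0.99))       # ~0.9998; at p = 0.95 it falls to
> >                                  # ~0.98, vs ~0.9999 for 3 replicas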
> >
> > - EC-storage is at odds with geographic replication. Of course, Swift
> > supports neither one today. However, with geographic replication, one
> > wants to have a local replica of each object in each geographic region,
> > which results in more copies for lower latency. With EC-storage, less
> > data is stored. When they're combined, the result is a whole lot of
> > traffic across slow, expensive WAN links.
> >
> > - Recombining EC-stored object chunks is going to chew up a ton more CPU
> > on either the object or proxy servers, depending on which one does it.
> > If the proxy, then it'll add more to an already CPU-heavy workload. If
> > the object server, then it'll make using big storage boxes less
> > practical (like one of the 48-drives-in-4U servers one can buy).
> >
> > - Can one change the EC-coding level? That is, if I'm using 10:2 coding
> > (so each object turns into 10 data blocks and 2 parity blocks), can I
> > change that later? Will that have massive performance impacts on my
> > cluster as more data blocks are computed?
> >
> > It may be that this is like changing the replica count, and the answer is
> > "yes, but your cluster will thrash for a long time after you do it".
> >
> > - Where's the original checksum stored? Clearly, each block will have
> > its own checksum for the auditors to use. However, if a client issues a
> > request like "HEAD /a/c/o", the response needs to contain the checksum
> > of the original file. Does that live somewhere, or will the proxy have
> > to read all the bytes and determine the checksum?
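> >
> > One possible answer (purely a sketch of the idea, not anything Swift
> > does today) is to compute the whole-object MD5 during the replica -> EC
> > conversion and store it as metadata on every block, so a HEAD never has
> > to reassemble the object:
> >
> >     import hashlib
> >
> >     def encode_with_etag(data, split_blocks):
> >         """Hypothetical encode step. split_blocks stands in for whatever
> >         routine produces the data+parity blocks; the original object's
> >         MD5 and length ride along with every block as metadata."""
> >         etag = hashlib.md5(data).hexdigest()
> >         return [{'payload': block,
> >                  'meta': {'X-Object-Orig-Etag': etag,  # made-up name
> >                           'X-Object-Orig-Length': str(len(data))}}
> >                 for block in split_blocks(data)]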
> >
> > - I wonder what effect this will have on internal-network traffic. With
> > a replica-stored object, the proxy opens one connection to an object
> > server, sends a request, gets a response, and streams the bytes out to
> > the client.
> >
> > With an EC-stored object, the proxy has to open connections to, say, 10
> > different object servers. Further, if one of the data blocks is
> > unavailable (say data block 5), then the proxy has to go ahead and
> > re-request all the data blocks plus a parity block so that it can fill
> > in the gaps. That may be a significant increase in traffic on Swift's
> > internal network. Also, by using such a large number of connections, it
> > considerably increases the probability of a connection failure, which
> > would mean more client requests would fail with truncated downloads.
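> >
> > That last risk is easy to quantify under the simplifying assumption that
> > each backend connection drops independently with probability q during a
> > transfer:
> >
> >     def truncation_risk(q, connections):
> >         # the download is truncated if ANY backend connection drops
> >         return 1 - (1 - q) ** connections
> >
> >     q = 0.001                      # assumed per-connection failure rate
> >     print(truncation_risk(q, 1))   # replica GET:      0.001
> >     print(truncation_risk(q, 10))  # 10-block EC GET: ~0.00996, ~10x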
> >
> >
> > Those are all the thoughts I have right now that are coherent enough to
> > put into text. Clearly, adding erasure coding (or any other form of
> > tiered storage) to Swift is not something undertaken lightly.
> >
> > Hope this helps.
> >
>
>
>
> --
> Eugene Kirpichov
> http://www.linkedin.com/in/eugenekirpichov
> We're hiring! http://tinyurl.com/mirantis-openstack-engineer
>
>