Keyun Cheng

RAIDP: replication with intra-disk parity

Download

EuroSys, 2020

Summary

Note: I can’t find a public PDF of the paper online. I refer to the slides from EuroSys’20 and the slides from Jingwei’s GitHub.

This work shows how to quickly recover from two simultaneous disk failures, the failure case that 3-way replication covers, without resorting to a third replica for warm data. It introduces RAIDP, which adds intra-disk parity. It saves 33% of the storage space compared with 3-way replication, has better write performance, and saves network bandwidth on writes. Read performance is not really affected.

Details

Superchunk distribution: each disk is divided into N-1 superchunks. Any two disks share a copy of one superchunk, and no disk stores the same superchunk twice. An additional device, the Lstor, stores the parity of the superchunks.
Each superchunk is 2-replicated. The Lstor fails separately from the disk, and needs to be cheap and fast.
Assuming one or two disks fail, the remaining superchunks are used for recovery, since the surviving nodes should hold at least one replica.
Implementation: about 3K LoC in Hadoop; Lstors are simulated in memory.
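
The layout and the double-failure case above can be sketched in a few lines of Python. This is only my own illustration, not the authors' code. It assumes that each Lstor holds the bytewise XOR of all superchunks on its disk (the notes only say "parity"), and that superchunks are identified by the pair of disks that store them. The names build_layout, xor_blocks and recover_shared are hypothetical.

```python
from functools import reduce
from itertools import combinations


def build_layout(n_disks):
    """Place superchunk (i, j) on disks i and j, one superchunk per disk pair."""
    layout = {d: [] for d in range(n_disks)}
    for i, j in combinations(range(n_disks), 2):
        layout[i].append((i, j))
        layout[j].append((i, j))
    return layout


def xor_blocks(blocks):
    """Bytewise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover_shared(layout, data, parity, failed_a, failed_b):
    """Rebuild the superchunk that lost both copies when two disks fail together."""
    lost = (min(failed_a, failed_b), max(failed_a, failed_b))
    # Every other superchunk of failed_a still has a replica on a surviving disk.
    survivors = [data[sc] for sc in layout[failed_a] if sc != lost]
    # Assumption: parity[d] is the XOR of all superchunks on disk d, so
    # XOR-ing it with the surviving superchunks leaves only the lost one.
    return xor_blocks([parity[failed_a]] + survivors)


N = 5
layout = build_layout(N)
# Toy contents: 8 identical bytes per superchunk, distinct for each disk pair.
data = {sc: bytes([sc[0] * 16 + sc[1]] * 8) for sc in combinations(range(N), 2)}
parity = {d: xor_blocks([data[sc] for sc in layout[d]]) for d in range(N)}

# Disks 1 and 3 fail: both copies of superchunk (1, 3) are gone.
expected = data.pop((1, 3))
rebuilt = recover_shared(layout, data, parity, 1, 3)
print(rebuilt == expected)
```

With N disks this pairing gives each disk exactly N-1 superchunks, which matches the distribution described above. The sketch relies on the Lstor of a failed disk surviving, which is why the Lstor has to fail separately from its disk.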

Strength

Fault tolerance similar to 3-way replication.
Up to 33% better write performance than HDFS when writing new data.
33% less storage, since each superchunk is effectively 2-way replicated.
Recovery is more efficient than with erasure coding (EC).
Memory-simulated Lstor.
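
The 33% storage figure follows from simple arithmetic on replica counts. A minimal sketch, assuming the capacity of the Lstors is ignored (the notes do not say how large an Lstor is); raw_storage is a hypothetical helper:

```python
from fractions import Fraction


def raw_storage(logical_size, replicas):
    """Raw capacity needed to store logical_size units with the given replica count."""
    return logical_size * replicas


three_way = raw_storage(Fraction(1), 3)  # 3-way replication
two_way = raw_storage(Fraction(1), 2)    # 2-way replication per superchunk
saving = (three_way - two_way) / three_way
print(saving)  # 1/3, i.e. about 33%
```

The same ratio applies to the number of replica copies sent over the network on a write, which is consistent with the bandwidth saving mentioned in the summary.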

Weakness

I need to look into the details of the thesis to find its weaknesses.