Opening the Chrysalis: On the Real Repair Performance of MSR Codes
This paper introduces Butterfly codes, a systematic code with optimal repair I/O. By carefully integrating the code into distributed storage systems, the authors show that Butterfly codes achieve the theoretically optimal repair performance of MSR codes.
The implementations build on a new construction of the code, which makes them simpler.
Encoder. The boolean sub-matrices A and B are constructed recursively and applied to the input data vectors.
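A minimal Python sketch of the encoding mechanism follows. It assumes each node stores a column of alpha bits and that each parity column is the XOR (GF(2) sum) of boolean matrices applied to the data columns. H uses identity matrices, i.e. a plain row-wise XOR, which is an assumption based on its role as the horizontal parity. The paper's recursive A and B matrices are not reproduced: `B_MATS` is a hypothetical k = 2, alpha = 2 example, and all function names are illustrative.

```python
from typing import List

Column = List[int]        # the alpha bits stored by one node
Matrix = List[List[int]]  # an alpha x alpha boolean matrix


def gf2_matvec(m: Matrix, v: Column) -> Column:
    """Multiply a boolean matrix by a bit vector over GF(2) (AND, then XOR)."""
    out = []
    for row in m:
        acc = 0
        for coef, bit in zip(row, v):
            acc ^= coef & bit
        out.append(acc)
    return out


def encode_parity(mats: List[Matrix], data: List[Column]) -> Column:
    """Parity column = XOR over j of mats[j] * data[j], all over GF(2)."""
    parity = [0] * len(data[0])
    for m, col in zip(mats, data):
        parity = [p ^ c for p, c in zip(parity, gf2_matvec(m, col))]
    return parity


def identity(alpha: int) -> Matrix:
    return [[int(i == j) for j in range(alpha)] for i in range(alpha)]


# Hypothetical toy code: k = 2 data columns, alpha = 2 bits per node.
# These matrices are illustrative only; they are NOT the paper's A and B.
B_MATS = [
    [[1, 0], [1, 1]],  # contribution of D0: (a0, a0 ^ a1)
    [[0, 1], [1, 0]],  # contribution of D1: (b1, b0), a row swap
]

D0 = [1, 0]  # a0, a1
D1 = [0, 1]  # b0, b1
H = encode_parity([identity(2), identity(2)], [D0, D1])
B = encode_parity(B_MATS, [D0, D1])
print("H =", H, "B =", B)  # H = [1, 1] B = [0, 1]
```

Expressing every parity as "boolean matrix times data column, XORed together" keeps encoding to AND/XOR operations, which is consistent with the low CPU cost the notes report for codes over GF(2).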
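Decoder. The code is proven to tolerate any two node failures. Four cases of a single lost column are illustrated:
A data column is lost, either the first column D0 or one of the remaining columns D1 to Dk-1.
The H parity column or the B parity column is lost.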
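The two-failure property can be checked mechanically on small codes. The sketch below uses a hypothetical toy code (k = 2, alpha = 2, H as a row-wise XOR, and an illustrative B parity that is not the paper's) and brute-forces every data pattern to confirm that any two surviving columns determine the data. It also shows the conventional repair of a lost data column from H, which reads every surviving data column in full. The MSR repair described in the paper reads only part of each surviving column, and that procedure is not reproduced here.

```python
from itertools import combinations, product
from typing import Callable, List, Tuple

Node = Tuple[int, ...]  # the bits stored by one node


def toy_encode(a0: int, a1: int, b0: int, b1: int) -> List[Node]:
    """Hypothetical toy code: nodes are [D0, D1, H, B] (not the paper's B)."""
    d0 = (a0, a1)
    d1 = (b0, b1)
    h = (a0 ^ b0, a1 ^ b1)       # horizontal parity: row-wise XOR
    b = (a0 ^ b1, a0 ^ a1 ^ b0)  # illustrative second parity
    return [d0, d1, h, b]


def tolerates_two_failures(encode: Callable[..., List[Node]], data_bits: int = 4) -> bool:
    """True if, for every pair of lost nodes, the survivors identify the data uniquely."""
    patterns = list(product((0, 1), repeat=data_bits))
    n = len(encode(*patterns[0]))
    for lost in combinations(range(n), 2):
        seen = {}
        for data in patterns:
            nodes = encode(*data)
            key = tuple(node for i, node in enumerate(nodes) if i not in lost)
            if seen.setdefault(key, data) != data:
                return False  # two different data patterns look identical
    return True


def repair_data_column_from_h(h: Node, surviving_data: List[Node]) -> Node:
    """Conventional repair: lost data column = H XOR all surviving data columns."""
    bits = list(h)
    for col in surviving_data:
        bits = [x ^ y for x, y in zip(bits, col)]
    return tuple(bits)


print(tolerates_two_failures(toy_encode))  # True
d0, d1, h, _ = toy_encode(1, 0, 0, 1)
print(repair_data_column_from_h(h, [d1]) == d0)  # True
```

The brute-force check scales as 2^(k·alpha) data patterns, so it only suits toy parameters; for real sizes, a rank test over GF(2) on the surviving rows of the generator matrix is the usual approach.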
HDFS implementation. Implementation details, including communication and memory management, are not covered here.
Ceph implementation, as an erasure-code plug-in.
Experiments with the HDFS and Ceph implementations on AWS EC2 show:
Higher repair throughput than Reed-Solomon (RS) codes.
MSR codes over GF(2) keep CPU usage low, but parameters such as the stripe size affect performance.
With a careful implementation, MSR codes reduce repair traffic by 2x compared with traditional erasure codes.
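A back-of-envelope sketch of the traffic and stripe-size trade-offs above, for k data nodes and two parities. Repair traffic uses the standard cut-set bound at the MSR point with all n - 1 = k + 1 survivors as helpers, each sending half of its chunk. For two parities the theoretical reduction over RS is 2k/(k + 1), which approaches 2x as k grows. The sub-chunk size assumes that each node's chunk is split into alpha = 2^(k-1) sub-chunks, which is the sub-packetization assumed here for Butterfly codes. The 1 MiB chunk size is an arbitrary example, not a value from the paper.

```python
from fractions import Fraction


def rs_repair_traffic(k: int) -> Fraction:
    """Chunks downloaded to rebuild one lost chunk with RS: k full chunks."""
    return Fraction(k)


def msr_repair_traffic(k: int, r: int = 2) -> Fraction:
    """Cut-set bound at the MSR point with d = n - 1 = k + r - 1 helpers.

    Each helper sends 1 / (d - k + 1) = 1 / r of its chunk.
    """
    d = k + r - 1
    return Fraction(d, d - k + 1)


def subchunk_bytes(k: int, chunk_bytes: int) -> Fraction:
    """Assumes alpha = 2**(k - 1) sub-chunks per node (an assumption, see text)."""
    alpha = 2 ** (k - 1)
    return Fraction(chunk_bytes, alpha)


CHUNK = 1 << 20  # 1 MiB per node: an arbitrary example size
for k in (4, 6, 10):
    rs, msr = rs_repair_traffic(k), msr_repair_traffic(k)
    print(
        f"k={k}: RS reads {rs} chunks, MSR reads {msr} chunks "
        f"({float(rs / msr):.2f}x less), sub-chunk {subchunk_bytes(k, CHUNK)} bytes"
    )
```

For k = 10 this gives 5.5 chunks of repair traffic for MSR against 10 for RS (about 1.82x less), with 2 KiB sub-chunks. The saving grows with k, but under the alpha = 2^(k-1) assumption the unit each helper reads shrinks exponentially. This is one plausible reason why parameters such as the stripe size affect performance.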