Keyun Cheng

Opening the Chrysalis: On the Real Repair Performance of MSR Codes

Download

FAST, 2016

Summary

This paper introduces Butterfly codes, systematic code with optimal repair I/O. By carefully integrating the code into districuted system, Butterfly codes achieves theoretically optimal repair performance of MSR codes.

Details

The implementation uses a newer reconstruction of the code, thus allows simpler implementation.

  1. Encoder. The sub-boolean matrix A and B are constructed recursively with input data vector codes

  2. Decoder. It proves to allows 2 nodes failure. 4 samples are illustrated.

  1. HDFS implementation. Details are of the implementation, including communication, memory management not listed here.

  2. Ceph implementation as a plug-in.

Strength

From HDFS and Ceph’s test over AWS EC2, it shows:

  1. increased repair throughput compared with RS.

  2. MSR codes over GF(2) achieve low CPU usage. But some params like stripe size affects the performance.

  3. With careful implementation, MSR codes reduces the repair traffic by 2x to traditional erasure codes.

Weakness

N/A