Repair Pipelining for Erasure-Coded Storage
This paper presents repair pipelining, a repair approach for erasure-coded (EC) storage. It pipelines the repair of a failed block in small-sized units (slices) across storage nodes. This reduces the repair time to approximately the normal read time for the same amount of data in a similar environment.
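Comparison (one timeslot = the time to transfer one block over one link): conventional RS repair takes k timeslots, and PPR takes ⌈log2(k+1)⌉ timeslots. In conventional RS repair, the helpers' bandwidth is left largely unused because all k blocks converge on the requestor. PPR spreads the traffic, but the load is still not balanced between the requestor and the helpers.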
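The two repair times can be compared directly under a simplified timeslot model. This is an assumption for illustration: links are homogeneous, one timeslot is the time to send one block over one link, and coding computation is free. The function names are hypothetical. PPR is modeled as combining partial results pairwise, which halves the number of senders each round.

```python
def conventional_timeslots(k: int) -> int:
    # All k helpers send a full block to the requestor, whose downlink
    # receives one block per timeslot, so the transfers serialize.
    return k


def ppr_timeslots(k: int) -> int:
    # Simplified PPR model: the k helpers plus the requestor (k + 1 nodes)
    # combine partial results pairwise, halving the senders each round,
    # which takes ceil(log2(k + 1)) rounds. For k >= 1, k.bit_length()
    # equals ceil(log2(k + 1)) and avoids floating-point rounding.
    return k.bit_length()


for k in (6, 10, 12):
    print(f"k={k}: conventional={conventional_timeslots(k)}, PPR={ppr_timeslots(k)}")
```

For k = 6 this gives 6 versus 3 timeslots. The gap widens as k grows, because one cost is linear in k and the other is logarithmic.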
Goal: further minimize the repair time for transient failures. The design targets a single block failure per stripe. When multiple blocks in a stripe fail, it falls back to conventional repair.
Approach: break the repair of a block into the repair of a set of slices, and pipeline the slice repairs so that different slices travel over different links at the same time. The paper discusses the overhead introduced by issuing many slices per block. The overall pipeline performance is bounded by the slowest link or helper.
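The slice pipeline, and its bound by the slowest link, can be illustrated with a small timing model. This is a sketch under assumptions that are not taken from the paper. The helpers form a chain H1 → H2 → … → Hk → requestor. `link_times[i]` (a hypothetical name) is the time to send a whole block over link i. A block is split into s equal slices. Helpers combine slices instantly. The optional `per_slice_overhead` is a hypothetical fixed cost per slice transfer that stands in for the overhead of issuing many slices.

```python
def pipeline_finish(link_times, s, per_slice_overhead=0.0):
    """Finish time of a pipelined chain repair under a simple model (assumption).

    link_times[i]: time to send one whole block over the i-th link of the
    chain H1 -> H2 -> ... -> Hk -> requestor (k links for k helpers).
    The block is split into s equal slices; a slice crosses link i in
    link_times[i] / s plus an optional fixed per-slice overhead.
    """
    if s < 1 or not link_times:
        raise ValueError("need s >= 1 and at least one link")
    # done[i]: time at which link i finished sending its most recent slice.
    done = [0.0] * len(link_times)
    for _ in range(s):
        ready = 0.0  # H1 has its own slice available from time 0
        for i, t in enumerate(link_times):
            # A link starts a slice once that slice has arrived from upstream
            # and the link has finished sending the previous slice.
            done[i] = max(ready, done[i]) + t / s + per_slice_overhead
            ready = done[i]
    return done[-1]


k = 6
print(pipeline_finish([1.0] * k, 1))  # no slicing: 6.0, i.e. k timeslots
print(pipeline_finish([1.0] * k, 4))  # 1 + (k - 1) / s = 2.25
print(pipeline_finish([1.0, 1.0, 2.0, 1.0, 1.0, 1.0], 1000))  # close to 2.0
```

With homogeneous links the model gives 1 + (k − 1)/s timeslots, which approaches one timeslot (a normal read) as s grows. With one link twice as slow, the result approaches that link's per-block time instead. A nonzero `per_slice_overhead` adds a cost that grows with s, so very fine slicing eventually stops paying off.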
Full-node recovery: a greedy approach selects the helpers. For each stripe, it chooses the helpers that have been selected least often so far, which balances the repair load across nodes.
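The greedy rule can be sketched as follows. This is a minimal illustration, not the paper's implementation. It assumes each stripe under repair comes with the list of surviving nodes that hold one of its blocks, and that any k of them can serve as helpers. `select_helpers` is a hypothetical name, and ties are broken by node id only to make the result deterministic.

```python
from collections import Counter


def select_helpers(stripes, k):
    """Greedy helper selection for full-node recovery (illustrative sketch).

    stripes: for each stripe being repaired, the surviving node ids that
    hold a block of that stripe. Returns (plan, load): plan holds one list
    of k helpers per stripe, and load counts how often each node was chosen.
    """
    load = Counter()  # number of times each node has been selected so far
    plan = []
    for candidates in stripes:
        if len(candidates) < k:
            raise ValueError("stripe has fewer than k surviving blocks")
        # Prefer the least-selected nodes; break ties by node id.
        chosen = sorted(candidates, key=lambda n: (load[n], n))[:k]
        load.update(chosen)
        plan.append(chosen)
    return plan, load


plan, load = select_helpers([["A", "B", "C", "D"]] * 3, 2)
print(plan)        # [['A', 'B'], ['C', 'D'], ['A', 'B']]
print(dict(load))  # {'A': 2, 'B': 2, 'C': 1, 'D': 1}
```

Because earlier choices raise a node's count, later stripes drift toward nodes that have done less work. No single node's uplink ends up serving most of the stripes.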
Heterogeneity: when link bandwidths differ, the repair path (which helpers to use and in what order) is selected optimally. Algorithm introduced: weighted path selection.
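The notes name weighted path selection without giving its details. The sketch below is a brute-force stand-in, not the paper's algorithm. It assumes a hypothetical `weight(u, v)` that gives the time to send one block from u to v. It takes its objective from the observation that pipeline performance is bounded by the slowest link: it enumerates every ordered choice of k helpers and keeps the chain whose slowest link is fastest. Exhaustive search is only practical for a small number of candidate helpers.

```python
from itertools import permutations


def weighted_path(helpers, requestor, k, weight):
    """Brute-force repair-path choice (illustrative, not the paper's algorithm).

    weight(u, v): time to send one block from u to v. Returns the chain
    [h1, ..., hk, requestor] that minimizes the slowest link, and that cost.
    """
    best_path, best_cost = None, float("inf")
    for order in permutations(helpers, k):
        path = list(order) + [requestor]
        # A pipelined repair is limited by its slowest link.
        cost = max(weight(u, v) for u, v in zip(path, path[1:]))
        if cost < best_cost:
            best_path, best_cost = path, cost
    return best_path, best_cost


# Hypothetical cluster: links are 1.0 by default, with two slow links.
slow = {frozenset({"A", "R"}): 5.0, frozenset({"B", "C"}): 4.0}


def w(u, v):
    return slow.get(frozenset({u, v}), 1.0)


print(weighted_path(["A", "B", "C"], "R", 3, w))  # (['B', 'A', 'C', 'R'], 1.0)
```

In this example, the chosen order avoids both slow links by keeping B and C apart and by not ending the chain at A.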