Fast Predictive Repair in Erasure-Coded Storage
DSN, 2019
Summary
This paper presents FastPR, a predictive repair pipelining technique for distributed storage systems that use erasure coding. It carefully couples the migration and the reconstruction of the chunks stored on an STF (Soon-To-Fail) node. Two repair scenarios are mainly addressed: scatter repair and hot-standby repair. Running migration and reconstruction in parallel yields a 30% repair speed improvement over reactive repair.
Details
- Collect disk status from SMART
- Workflow
  - Migration: copy chunks from the STF node to new healthy nodes.
  - Reconstruction: follows reactive repair; rebuild the STF node's chunks from chunks stored on other healthy nodes.
- Theoretical proof of the speed improvement.
- Find the reconstruction sets and schedule the repair.
- System architecture: Coordinator over Agents.
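
To see why coupling migration with reconstruction can beat either one alone, here is a minimal sketch under a deliberately simplified cost model. It is not the paper's analysis: the function `split_chunks` and the per-chunk costs `t_migrate` and `t_reconstruct` are hypothetical names, and the model assumes every chunk costs the same and that the two activities do not contend for bandwidth.

```python
def split_chunks(total_chunks, t_migrate, t_reconstruct):
    """Pick how many chunks to migrate vs. reconstruct.

    Assumed toy model (not from the paper): migration and reconstruction
    run in parallel, so the repair finishes when the slower of the two
    finishes. Returns (migrated, reconstructed, repair_time).
    """
    best = None
    for migrated in range(total_chunks + 1):
        reconstructed = total_chunks - migrated
        # Parallel execution: total time is the maximum of the two parts.
        repair_time = max(migrated * t_migrate, reconstructed * t_reconstruct)
        if best is None or repair_time < best[2]:
            best = (migrated, reconstructed, repair_time)
    return best


if __name__ == "__main__":
    # Hypothetical numbers: migrating a chunk costs twice as much as
    # reconstructing one (amortized over many helper nodes).
    print(split_chunks(90, t_migrate=2.0, t_reconstruct=1.0))
```

With these made-up costs, migration only would take 180 time units and reconstruction only would take 90, while the best split (30 chunks migrated, 60 reconstructed) finishes in 60. The real scheduler also has to choose reconstruction sets, which this sketch ignores.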
Strength
- 30% speed improvement over reactive repair.
- Easily deployable on HDFS without changing the HDFS code base.
Weakness