Partial-Parallel-Repair (PPR): A Distributed Technique for Repairing Erasure Coded Storage
PPR (partial parallel repair) divides a reconstruction into sub-operations and schedules them on multiple nodes. A protocol then combines the partial results. This approach reduces network pressure, and the experiments show that it significantly reduces repair and degraded-read time. In this paper, PPR is overlaid on LRC and Rotated RS.
PPR divides reconstruction into several partial parallel repair operations that are performed simultaneously at multiple servers.
PPR can be overlaid on top of almost all erasure codes (ECs).
Main PPR algorithm:
PPR takes a few logical timesteps to complete the reconstruction operation. In each timestep, a set of servers performs partial reconstructions. Together, these steps form a tree-like reconstruction structure.
PPR distributes the reconstruction to multiple servers in parallel.
In-memory chunk caching. Chunks that are already cached in memory can be used without hitting the disk.
The larger the chunk size, the greater the performance gain of the main PPR algorithm.
Performance improvement for degraded reads.
Computationally efficient through parallelism.
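
The tree-like reconstruction described in the main algorithm can be illustrated with a small sketch. It is a simplification, not the paper's implementation: it assumes a single XOR parity, so each surviving chunk is itself the partial result. For a Reed-Solomon code, each server would first multiply its chunk by a decoding coefficient over a Galois field. The names xor_bytes and ppr_tree_repair are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings (addition in GF(2^w))."""
    return bytes(x ^ y for x, y in zip(a, b))


def ppr_tree_repair(partials: list[bytes]) -> tuple[bytes, int]:
    """Combine per-server partial results pairwise, one tree level per timestep.

    Returns the combined result and the number of logical timesteps used.
    Assumption: each element of `partials` lives on a different server.
    """
    if not partials:
        raise ValueError("need at least one partial result")
    level = list(partials)
    timesteps = 0
    while len(level) > 1:
        nxt = []
        # Disjoint pairs could transfer and combine in parallel in one timestep;
        # here they are simulated sequentially.
        for i in range(0, len(level) - 1, 2):
            nxt.append(xor_bytes(level[i], level[i + 1]))
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # the unpaired server waits for the next timestep
        level = nxt
        timesteps += 1
    return level[0], timesteps


# Illustrative data: four data chunks protected by one XOR parity chunk.
data = [bytes([v] * 4) for v in (1, 2, 3, 4)]
parity = reduce(xor_bytes, data)

# Chunk 2 is lost; the three surviving data chunks and the parity repair it.
survivors = [data[0], data[1], data[3], parity]
repaired, steps = ppr_tree_repair(survivors)
```

In this sketch each server sends or receives at most one chunk per timestep, so no single link carries all surviving chunks at once. That is the intuition behind the reduced network pressure, under the simplifying assumptions stated above.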