Title: Explore Data Placement Algorithm for Balanced Recovery Load Distribution
Conference (ATC’23): Link
Journal (): Link
This paper formulates the EC full-node recovery load balancing problem with a weighted repair load graph, and proves that solving the problem (finding an optimal data placement with minimum recovery load) is NP-hard by reducing the problem to a maximum independent set problem in polynomial time. It then proposes a greedy algorithm to generate data placement with balanced recovery load distribution. Evaluation compares the greedy data placement versus the random data placement and shows the effectiveness of the solutions.
When new data comes, Algorithm 1 is repeatedly called for each new stripe to reduce the maximum load of full-node recovery
The most important weakness is that the recovery load matrix for each stripe is fixed. Even though it’s OK for RS code (with n - k = 1) and Clay code (with d = n - 1), in general, it’s not reasonable for a lot of codes, where there are numerous repair load matrices. The assumption is that it picks one of the candidate repair load matrices, and uses it for all stripes. Actually this assumption overlooks the MDS property, where collecting any k of n blocks are able to reconstruct the data.
It considers the data are iteratively added to the system. Whether the algorithm is capable of adjusting existing data placement to a recovery load -balanced placement worth exploring.