A “Hitchhiker’s” Guide to Fast and Efficient Data Reconstruction in Erasure-coded Data Centers
This paper introduces Hitchhiker, an erasure-coded (EC) storage system that needs no additional storage yet reduces both network traffic and disk I/O during reconstruction by roughly 25% to 45%. The implementation in HDFS shows a 35% reduction in network traffic, a 36% reduction in computation time, and a 32% reduction in data read time during reconstruction.
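Hitchhiker-XOR.
Encode: split each stripe into two substripes, a and b. The parities are computed differently for each: substripe a gets the plain RS parities, while some parities of substripe b are additionally XORed with data symbols of a.
Decode: three steps that first recover the missing symbol of substripe b and then use it to recover the missing symbol of substripe a.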
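The encode and decode steps can be sketched in Python as follows. This is a toy illustration under loud assumptions, not the paper's implementation: the parity rows in `COEFF` are hypothetical Vandermonde-style coefficients over GF(2^8) rather than a production RS code, the first parity is assumed to be a plain XOR so that step 1 reduces to an XOR, and the grouping in `GROUPS` (three, three, and four symbols of a) is assumed for a stripe with 10 data and 4 parity units. All function names are made up for this sketch.

```python
from functools import reduce

def gf_mul(x, y):
    """Multiply two bytes in GF(2^8) using the polynomial 0x11d."""
    result = 0
    while y:
        if y & 1:
            result ^= x
        x <<= 1
        if x & 0x100:
            x ^= 0x11d
        y >>= 1
    return result

def gf_pow(base, exp):
    result = 1
    for _ in range(exp):
        result = gf_mul(result, base)
    return result

def xor_all(values):
    return reduce(lambda p, q: p ^ q, values, 0)

K, R = 10, 4
# Hypothetical parity coefficients: row 0 is all ones (plain XOR parity).
COEFF = [[gf_pow(j + 1, i) for j in range(K)] for i in range(R)]
# Assumed piggyback groups: which symbols of a ride on parities 2..4 of b.
GROUPS = [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]

def parity(i, data):
    """i-th parity of one substripe (one byte per data unit)."""
    return xor_all(gf_mul(COEFF[i][j], data[j]) for j in range(K))

def encode(a, b):
    """Return (parities of a, piggybacked parities of b)."""
    pa = [parity(i, a) for i in range(R)]
    pb = [parity(0, b)]
    for g, group in enumerate(GROUPS):
        pb.append(parity(g + 1, b) ^ xor_all(a[j] for j in group))
    return pa, pb

def repair(lost, a, b, pb):
    """Recover (a[lost], b[lost]) without reading either of them."""
    # Step 1: recover b[lost] from the other b symbols and the XOR parity of b.
    b_lost = pb[0] ^ xor_all(b[j] for j in range(K) if j != lost)
    b_full = b[:lost] + [b_lost] + b[lost + 1:]
    # Step 2: recompute the un-piggybacked parity of b for the lost symbol's group.
    g = next(i for i, group in enumerate(GROUPS) if lost in group)
    f = parity(g + 1, b_full)
    # Step 3: strip f and the other group members of a from the piggybacked parity.
    a_lost = pb[g + 1] ^ f ^ xor_all(a[j] for j in GROUPS[g] if j != lost)
    return a_lost, b_lost
```

For example, `repair(0, a, b, pb)` touches only the nine surviving symbols of b, two parity symbols of b, and two symbols of a.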
Hitchhiker-XOR+.
Encode: one additional XOR compared with Hitchhiker-XOR.
Decode: similar to Hitchhiker-XOR.
This variant requires the underlying RS code to have the all-XOR-parity property.
Hitchhiker-nonXOR.
Not purely XOR-based, but it removes the requirement that the underlying RS code have the all-XOR-parity property.
Encode: XOR operations only.
Decode: one additional multiplication in step 3.
Hop-and-couple on disk, for efficiency. This technique aims to minimize discontiguous disk reads during the reconstruction of data units. It couples each byte with another byte in the same unit that lies a fixed distance (the hop length) ahead. The hop length can be greater than 1. Coupled bytes are encoded together.
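To illustrate why a larger hop length helps, here is a minimal Python sketch. It assumes one particular pairing scheme: the unit is cut into chunks of 2 * hop bytes, and within each chunk byte j is coupled with byte j + hop. The function names are hypothetical. The sketch counts how many contiguous runs a reader must fetch to read only the second byte of every couple.

```python
def couple_pairs(unit_size, hop):
    """Return (first, second) byte offsets, each pair `hop` bytes apart."""
    if hop < 1 or unit_size % (2 * hop) != 0:
        raise ValueError("unit_size must be a positive multiple of 2 * hop")
    pairs = []
    for start in range(0, unit_size, 2 * hop):
        for offset in range(hop):
            pairs.append((start + offset, start + offset + hop))
    return pairs

def contiguous_runs(offsets):
    """Count maximal runs of consecutive byte offsets in a read pattern."""
    ordered = sorted(offsets)
    return sum(
        1 for i, off in enumerate(ordered)
        if i == 0 or off != ordered[i - 1] + 1
    )

def second_byte_runs(unit_size, hop):
    """Runs needed to read only the second byte of every couple."""
    return contiguous_runs(second for _, second in couple_pairs(unit_size, hop))
```

Under this assumed layout, a 16-byte unit with hop length 1 needs 8 separate reads, while a hop length of 8 (half the unit) needs a single contiguous read.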
Evaluation. (k, r) = (10, 4), i.e., 10 data units and 4 parity units per stripe; block size: 256 MB; buffer size: 1 MB. The metric is the time to process one block.
Computation time: for every data block, Hitchhiker reconstructs faster than RS.
Read time for degraded reads: Hitchhiker reads from more machines than RS but reads only half a block from most of them, so its read latency is lower.
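The read-volume saving can be checked with a little arithmetic. The sketch below assumes a stripe of 10 data units whose substripe-a symbols are piggybacked in groups of 3, 3, and 4; the group sizes and the function names are assumptions of this sketch. It counts reads in half-units, since each unit holds one symbol of a and one of b.

```python
K = 10                   # data units per stripe
GROUP_SIZES = (3, 3, 4)  # assumed piggyback group sizes

def half_units_read(group_size, k=K):
    """Half-units read to rebuild one data unit in a group of the given size."""
    surviving_b = k - 1            # the other symbols of substripe b
    parities = 2                   # first parity of b plus one piggybacked parity
    group_mates = group_size - 1   # other symbols of a in the same group
    return surviving_b + parities + group_mates

def saving_vs_rs(group_size, k=K):
    """Fraction of reads saved relative to RS, which reads k full units."""
    return 1 - half_units_read(group_size, k) / (2 * k)
```

With these assumptions, a unit in a group of three needs 13 half-units instead of the 20 that RS reads, and a unit in the group of four needs 14.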
Connecting to more than k machines can add read latency.
Choice of hop length: reconstructing a single byte may also force the reconstruction of coupled bytes that are not needed.
Encoding time is 72.1% higher: encoding a stripe spans its two substripes and takes nearly twice as long. Hitchhiker sacrifices encoding time to improve the other metrics.
No additional storage introduced.
Better repair performance (disk I/O and repair traffic).