Keyun Cheng

A “Hitchhiker’s” Guide to Fast and Efficient Data Reconstruction in Erasure-coded Data Centers


SIGCOMM, 2014

Summary

This paper introduces Hitchhiker, an erasure-coded (EC) storage system that needs no additional storage compared with RS codes, yet reduces both network traffic and disk I/O during reconstruction by roughly 25% to 45%. The implementation in HDFS shows a 35% reduction in network traffic, a 36% reduction in computation time, and a 32% reduction in data read time during reconstruction.

Details

  1. Three versions of Hitchhiker are introduced.

Hitchhiker-XOR

Encode: divide the stripe into two sub-stripes, a and b. The parities of each sub-stripe are computed differently.

Decode: three steps that recover sub-stripe b first and then sub-stripe a.

Hitchhiker-XOR+

Encode: one additional XOR compared with Hitchhiker-XOR.

Decode: similar to Hitchhiker-XOR.

This method requires the underlying RS code to have the all-XOR-parity property.

Hitchhiker-nonXOR

Not fully XOR-based, but removes the requirement that the underlying RS code have the all-XOR-parity property.

Encode: all XORs.

Decode: one additional multiplication in step 3.
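
The two-sub-stripe encode and three-step decode can be illustrated with a minimal sketch. It is not the paper's construction: it assumes a toy code with k = 2 data units and r = 2 parity units over GF(2^8), and the names `gf_mul`, `toy_parities`, `encode`, and `reconstruct_first_unit` are hypothetical. The point is only to show how XOR-ing a symbol of sub-stripe a into a parity of sub-stripe b lets a lost unit be rebuilt from fewer bytes.

```python
def gf_mul(x: int, y: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by the polynomial 0x11D."""
    result = 0
    while y:
        if y & 1:
            result ^= x
        x <<= 1
        if x & 0x100:
            x ^= 0x11D
        y >>= 1
    return result


def toy_parities(d1: int, d2: int) -> tuple[int, int]:
    """Toy (k=2, r=2) code: an XOR parity and a weighted parity (assumed, not the paper's RS code)."""
    return d1 ^ d2, d1 ^ gf_mul(2, d2)


def encode(a: tuple[int, int], b: tuple[int, int]) -> list[tuple[int, int]]:
    """Encode one stripe; each unit stores one byte of sub-stripe a and one of sub-stripe b."""
    a1, a2 = a
    b1, b2 = b
    pa1, pa2 = toy_parities(a1, a2)
    pb1, pb2 = toy_parities(b1, b2)
    # The second parity of sub-stripe b carries a1 as an XOR "piggyback".
    return [(a1, b1), (a2, b2), (pa1, pb1), (pa2, pb2 ^ a1)]


def reconstruct_first_unit(units: list[tuple[int, int]]) -> tuple[tuple[int, int], int]:
    """Rebuild unit 0 without reading it; returns (unit, number of bytes read)."""
    b2 = units[1][1]           # read: b-byte of the surviving data unit
    pb1 = units[2][1]          # read: first parity of sub-stripe b
    piggybacked = units[3][1]  # read: second parity of b, with a1 XOR-ed in
    b1 = pb1 ^ b2                     # step 1: decode sub-stripe b
    pb2 = b1 ^ gf_mul(2, b2)          # step 2: recompute the plain second parity of b
    a1 = piggybacked ^ pb2            # step 3: strip it off to expose a1
    return (a1, b1), 3


units = encode(a=(0x12, 0x34), b=(0x56, 0x78))
unit, bytes_read = reconstruct_first_unit(units)
print(unit == units[0], bytes_read)
```

In this toy setting the repair reads 3 bytes, whereas decoding both sub-stripes independently would read 2 bytes per sub-stripe, 4 in total.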

  2. Hop-and-couple for disk efficiency. This technique aims to minimize discontinuity in disk reads while data units are being reconstructed. It couples each byte with another byte in the same unit that lies a fixed distance (the hop distance) ahead. The hop distance can be greater than 1. Coupled bytes are encoded together.
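
A small sketch can show why the hop distance matters for disk reads. It assumes the unit is processed in blocks of 2 × hop bytes, with byte i in the first half of a block coupled with byte i + hop; that layout and the names `couple_indices` and `contiguous_runs` are assumptions made for illustration, not taken from the paper's implementation.

```python
def couple_indices(unit_len: int, hop: int) -> list[tuple[int, int]]:
    """Return (first, second) byte-index pairs where second = first + hop."""
    if unit_len % (2 * hop) != 0:
        raise ValueError("unit_len must be a multiple of 2 * hop")
    pairs = []
    for start in range(0, unit_len, 2 * hop):
        for i in range(start, start + hop):
            pairs.append((i, i + hop))
    return pairs


def contiguous_runs(indices: list[int]) -> int:
    """Count maximal runs of consecutive byte offsets, i.e. separate disk reads."""
    ordered = sorted(indices)
    runs = 1 if ordered else 0
    for prev, cur in zip(ordered, ordered[1:]):
        if cur != prev + 1:
            runs += 1
    return runs


unit_len = 8
for hop in (1, 2, 4):
    # Offsets of the second byte of every pair: the bytes read when
    # reconstruction needs only one of the two coupled sub-stripes.
    second_bytes = [j for _, j in couple_indices(unit_len, hop)]
    print(hop, second_bytes, contiguous_runs(second_bytes))
```

With a hop of 1 the needed bytes sit at every other offset, so each one is a separate read; with a hop of half the unit they form a single contiguous range.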

  3. Evaluation. The underlying RS code has (k, r) = (10, 4), i.e. 10 data units and 4 parity units per stripe. Block size: 256 MB; buffer size: 1 MB. The metric is the time to process one block.
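
As a rough sanity check on the reported savings, the amount of data read per repair can be estimated for a code with 10 data units. The sketch assumes that data units are piggybacked in groups of three or four and that a repair reads k symbols of sub-stripe b, one piggybacked parity, and the remaining symbols of the lost unit's group from sub-stripe a. The group sizes and the names `symbols_read` and `saving_percent` are assumptions of this sketch; the notes above do not state them.

```python
def symbols_read(k: int, group_size: int) -> int:
    """Symbols read to rebuild one data unit under the assumed piggyback layout."""
    from_substripe_b = k             # k - 1 surviving data symbols plus one parity
    piggybacked_parity = 1           # the parity that carries the lost unit's group
    from_substripe_a = group_size - 1  # the other members of that group
    return from_substripe_b + piggybacked_parity + from_substripe_a


def saving_percent(k: int, group_size: int) -> int:
    """Integer percentage saved relative to reading k symbols from both sub-stripes."""
    baseline = 2 * k
    return (baseline - symbols_read(k, group_size)) * 100 // baseline


for group_size in (3, 4):
    print(group_size, symbols_read(10, group_size), saving_percent(10, group_size))
```

Under these assumptions a group of three gives 13 symbols instead of 20, a 35% saving, which is in line with the network-traffic reduction quoted in the summary.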

  4. Tradeoffs

Strength

Weakness