Reading Notes: Azure-LRC
Title: Erasure Coding in Windows Azure Storage
Conference (ATC’12): Link
Journal (): Link
Summary
This paper introduces the Local Reconstruction Codes (LRC) used in Windows Azure Storage (WAS). LRC reduces the number of fragments that must be read when reconstructing a failed data fragment, while still keeping the storage overhead low. The construction is based on local groups, each of which encodes a subset of the data fragments.
Main Contributions
Details
- WAS initially stores data with 3-way replication; once the data reaches a certain size, WAS erasure-codes it. Erasure coding is performed lazily in the background.
- Problem: reconstruction takes a long time because the reconstructing node has to read many fragments from other nodes.
- An Azure LRC(k, l, r) has k data fragments, l local groups (each with one local parity), and r global parities; the total number of fragments is n = k + l + r.
- Storage overhead: 1 + (l + r) / k.
- Repair reads fewer fragments because it reads within a local group, but it introduces extra storage overhead for the additional local parity fragments.
- Single data-fragment failure in a local group: reconstruction reads only k/l fragments (the other k/l - 1 data fragments in the group plus the group's local parity). See the cost sketch below.
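To make these numbers concrete, here is a minimal sketch (a hypothetical helper, not code from the paper) that computes the storage overhead and single-failure reconstruction cost from the (k, l, r) parameters, assuming data fragments are split evenly across the local groups:

```python
def lrc_costs(k: int, l: int, r: int) -> dict:
    """Storage overhead and repair cost of an LRC(k, l, r) layout (sketch)."""
    assert k % l == 0, "this sketch assumes equally sized local groups"
    return {
        "total_fragments": k + l + r,          # n = k + l + r
        "storage_overhead": 1 + (l + r) / k,   # parity overhead relative to the data size
        "single_failure_reads": k // l,        # group peers (k/l - 1) + one local parity
    }

# LRC(6, 2, 2): 10 fragments, 1.67x overhead, 3 reads per single-fragment repair.
print(lrc_costs(6, 2, 2))
```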
- Azure's LRCs are not MDS codes: not all multi-failure patterns are recoverable. For example, in a (6,2,2) LRC, if the three data fragments of one local group and that group's local parity all fail (four failures), the data cannot be recovered: the other group's local parity is independent of the failed group, so only the two global parities remain to reconstruct three lost data fragments.
- Azure's LRCs achieve the Maximally Recoverable property: they can decode every failure pattern that is information-theoretically decodable. Concretely, LRC tolerates any (r + 1) failures but not arbitrary (l + r) failures. For example, a (6,2,2) LRC tolerates arbitrary 3-fragment failures, and it also decodes many four-failure patterns, e.g., those in which each local group loses at most one fragment; but it cannot tolerate arbitrary four-fragment failures.
- The design goal of Azure's LRC: choose coding equations (coefficients) so that all information-theoretically decodable failure patterns can actually be decoded.
- LRC coding equations: the local parities are simple sums (XOR-like) of the data fragments in each group, while the global parities use Reed-Solomon-style equations with carefully chosen coefficients (a sketch of their general form follows).
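As an illustration, the (6,2,2) coding equations have roughly the following shape; the specific coefficients α_i, β_i over a small finite field (e.g., GF(2^4)) are exactly what the paper chooses to obtain the Maximally Recoverable property (this is a paraphrase, not the paper's verbatim notation):

```latex
\begin{align*}
  p_x &= x_0 + x_1 + x_2, \qquad p_y = y_0 + y_1 + y_2 \\
  p_0 &= \alpha_0 x_0 + \alpha_1 x_1 + \alpha_2 x_2 + \beta_0 y_0 + \beta_1 y_1 + \beta_2 y_2 \\
  p_1 &= \alpha_0^2 x_0 + \alpha_1^2 x_1 + \alpha_2^2 x_2 + \beta_0^2 y_0 + \beta_1^2 y_1 + \beta_2^2 y_2
\end{align*}
```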
- Discuss decodable cases of four failures (not all). The other cases are
skipped.
- Compared with Pyramid codes, Azure's LRCs require smaller finite field sizes.
- An easy way to check whether a failure pattern is decodable (sketched below): in each local group with a surviving local parity, "swap" that local parity onto one failed data fragment (treat the data fragment as repaired and the local parity as failed), then count the remaining failures among the data fragments and global parities. If there are more than r, the pattern is not decodable.
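A minimal sketch of that check in code (the fragment naming and the helper are my own, not from the paper):

```python
def is_decodable(failed, k=6, l=2, r=2):
    """Swap-based decodability check for LRC(k, l, r) (illustrative sketch).

    Fragment names (assumed convention): d0..d{k-1} are data fragments split
    evenly into l groups, L0..L{l-1} are local parities, G0..G{r-1} are
    global parities.
    """
    failed = set(failed)
    group_size = k // l
    failed_data = {f for f in failed if f.startswith("d")}
    remaining = len({f for f in failed if f.startswith("G")})  # failed global parities
    for g in range(l):
        group = {f"d{i}" for i in range(g * group_size, (g + 1) * group_size)}
        erased = len(group & failed_data)
        if erased and f"L{g}" not in failed:
            erased -= 1        # "swap": a surviving local parity repairs one data fragment
        remaining += erased
    return remaining <= r      # decodable iff the global parities can cover the rest

# The non-decodable (6,2,2) example above: one whole local group (3 data + its local parity).
print(is_decodable({"d0", "d1", "d2", "L0"}))   # False
print(is_decodable({"d0", "d3", "G0", "G1"}))   # True: one failure per group + both global parities
```

Enumerating all failure patterns of a given size with such a check yields the decodability ratio used in the reliability analysis below.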
- Reliability Analysis
- Markov Reliability Model: the standard Markov model is augmented because, for a given number of failures, some LRC failure patterns are non-decodable. The model assumes failures are independent (a toy numerical sketch follows this list).
- Extension of the modeling to correlated failures: check OSDI’10 (GFS)
- Average single-failure repair rate.
- Multi-failure repair rate: they assume that, in practice, the repair rates for multiple failures (beyond single failures) are dominated by the time taken to detect the failures and trigger the repair; thus they set the rate to 1/T, where T is the detection-and-trigger time.
- Average repair cost (in number of fragments)
- Decodability ratio: the ratio of decodable failure patterns to all
enumerated failure patterns.
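A toy version of the augmented Markov model with made-up rates (the reduced state space and every number here are illustrative assumptions, not the paper's model): states count the currently failed fragments, and a new failure that produces a non-decodable pattern moves the chain into an absorbing data-loss state.

```python
import numpy as np

def mttf(n, lam, rho, d):
    """Mean time to data loss for a simplified Markov reliability model (sketch).

    n   : total fragments (n = k + l + r)
    lam : per-fragment failure rate
    rho : rho[i] = repair rate from state i back to state i-1 (rho[0] unused)
    d   : d[j] = fraction of j-failure patterns that are decodable, j = 0..m
          (e.g., obtained by enumerating patterns with the swap check above);
          more than m failures is treated as certain data loss.
    """
    m = len(d) - 1
    Q = np.zeros((m + 1, m + 1))            # generator restricted to transient states
    for i in range(m + 1):
        fail = (n - i) * lam                # rate at which the next fragment fails
        if i + 1 <= m:
            Q[i, i + 1] = fail * d[i + 1]   # failure, but the pattern is still decodable
        if i >= 1:
            Q[i, i - 1] = rho[i]            # repair moves the state back down
        Q[i, i] = -(fail + (rho[i] if i >= 1 else 0.0))
    # First-step analysis: expected absorption times t satisfy Q @ t = -1.
    return np.linalg.solve(Q, -np.ones(m + 1))[0]

# Placeholder rates for an LRC(6, 2, 2) layout (n = 10); none of these numbers are from the paper.
lam = 1.0 / (4 * 365 * 24)                  # per-fragment failures per hour
rho = [0.0, 2.0, 4.0, 4.0, 4.0]             # repair rates per hour for states 1..4
d   = [1.0, 1.0, 1.0, 1.0, 0.86]            # assumed decodability ratios for 0..4 failures
print(f"MTTF ~ {mttf(10, lam, rho, d):.3e} hours")
```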
- Storage Overhead vs Reconstruction Cost (Figure 4): there exists a tradeoff between the storage overhead and the reconstruction cost.
- Azure reports the number of fault domains as 20.
- Comparison: the LRC lower-bound curve vs the RS lower-bound curve for fault domains <= 20.
- LRC achieves a better cost-performance trade-off than RS.
- Comparisons: LRC vs other code constructions: HoVer, Stepped
Combination, Weaver and RS codes
- WAS chooses LRC (12, 2, 2) as the example configuration under the 20 fault domains (a rough comparison sketch follows).
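For a rough feel of the trade-off using the formulas above (single-data-fragment reconstruction modeled as k/l reads for LRC and k reads for RS; the paper's Figure 4 uses a more careful lower-bound analysis), compare LRC(12,2,2) with two RS configurations:

```python
# (storage overhead, fragments read to reconstruct one data fragment)
codes = {
    "LRC(12, 2, 2)": (1 + (2 + 2) / 12, 12 // 2),   # 1.33x overhead, 6 reads
    "RS(12, 4)":     (1 + 4 / 12,       12),        # same overhead, twice the reads
    "RS(6, 3)":      (1 + 3 / 6,        6),         # same reads, higher overhead
}
for name, (overhead, reads) in codes.items():
    print(f"{name:13s} overhead = {overhead:.2f}x, reconstruction reads = {reads}")
```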
- EC is implemented in the stream layer, on extents after they are sealed.
Strength
- LRC saves significant I/Os and bandwidth during reconstruction when compared to RS.
Weakness
- LRCs are non-MDS codes; additional storage overhead is introduced for the same fault tolerance.
- LRC is optimized for reconstructing data fragments but not (global) parity fragments. In terms of parity reconstruction, Weaver codes, HoVer codes, and Stepped Combination codes can be more efficient.