Reading Notes: OSDI’20 Pacemaker
Title: PACEMAKER: Avoiding HeART attacks in storage clusters with
disk-adaptive redundancy
Conference: OSDI’20
Summary
This paper presents Pacemaker, which reduces transition overload (in our context, redundancy transition overhead) in large-scale storage clusters by proactively (1) organizing data layouts and (2) initiating transitions without compromising redundancy. It first analyzes traces from millions of disks in large production clusters to show that transition overload severely limits cluster performance. Pacemaker is integrated into HDFS, and trace-driven experiments using production cluster logs show that transition I/O is kept to no more than 5% of cluster I/O bandwidth while providing storage savings of 14% to 20%.
Main Contributions
- Analysis of logs from millions of disks in large production clusters, showing that transition overload is the main obstacle to disk-adaptive redundancy.
- Pacemaker, which proactively organizes data layouts (Rgroups) and proactively, safely initiates transitions under explicit I/O constraints.
- An HDFS integration and trace-driven evaluation on four production clusters, keeping transition I/O below 5% of cluster bandwidth with 14-20% storage savings.
Details
- Problem setting (almost the same as our redundancy transition setting)
- Disk failure rates vary widely within and across storage clusters (by disk make/model and age) -> static, one-size-fits-all redundancy must be conservative (MTTDL analysis)
- Wide stripes providing the same fault tolerance incur high reconstruction cost
- AFR (annualized failure rate) has to be learned online from failure observations
- Transition overload is high (observed in the prior work HeART)
- Overwhelming bursts of urgent transition I/O -> periods requiring 100% of cluster I/O bandwidth
- Existing studies
- Reactive approach: leaves a window of time during which data is insufficiently protected
- Pacemaker addresses transition overload by
- proactively organizing stripe layouts
- proactively ("safely") initiating transitions without hurting redundancy
- integrating into HDFS to demonstrate feasibility
- Designs
- To meet redundancy constraints:
- Dgroups (disks of the same make/model; fixed) and Rgroups (disks with the same redundancy scheme and placement restrictions)
- Disks are allowed to transition between Rgroups
- To meet I/O constraints
- Reconstruction I/O
- Transition I/O (most important)
- Proactive-transition-initiator
- uses learned AFR curves and their rate of change to determine when to initiate a transition (see the AFR-driven sketch after this list)
- Disk health monitoring service and AFR curve learners
- Special handling for trickle-deployed and step-deployed disks
- Rgroup planner: decides which Rgroup a disk should transition to, subject to redundancy and predefined I/O constraints
- Rgroup creation and purging: satisfies the system's required placement restrictions
- Rate limiter: limits the rate of transition I/O (see the rate-limiter sketch after this list)
- Transition-executors
- transition by emptying disks (simply moving their contents to other disks), used when only a small number of disks transition
- transition by recalculating parities: the redundancy-transition technique we are focusing on (see the executor cost sketch after this list)
- Implementation on HDFS
- Feasibility and low implementation overhead
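The AFR-driven sketch below illustrates the idea behind the proactive-transition-initiator: project the learned AFR forward by the time a transition would take, and start the transition early if the current scheme would then violate a reliability (MTTDL-style) target. The MTTDL formula is the standard Markov-chain approximation, and every name, threshold, and number here is my own illustration, not Pacemaker's actual code or formulas.

```python
# Illustrative only: a minimal AFR-driven transition check. The scheme names,
# thresholds, and the simplified Markov-chain MTTDL approximation below are
# assumptions for this sketch, not Pacemaker's implementation.
from dataclasses import dataclass

HOURS_PER_YEAR = 8766.0

@dataclass
class Scheme:
    n: int  # total chunks per stripe (data + parity)
    k: int  # data chunks per stripe

def mttdl_hours(scheme: Scheme, afr: float, repair_hours: float) -> float:
    """Classic Markov-chain MTTDL approximation for an (n, k) erasure code."""
    lam = afr / HOURS_PER_YEAR              # per-disk failure rate (1/hour)
    mu = 1.0 / repair_hours                 # repair rate (1/hour)
    f = scheme.n - scheme.k                 # tolerated concurrent failures
    denom = 1.0
    for i in range(f + 1):                  # n * (n-1) * ... * (n-f) * lam^(f+1)
        denom *= (scheme.n - i) * lam
    return (mu ** f) / denom

def should_transition(current: Scheme, observed_afr: float, afr_slope_per_day: float,
                      transition_days: float, mttdl_target_hours: float,
                      repair_hours: float = 24.0) -> bool:
    """Start early enough that the transition finishes before the projected AFR
    makes the current scheme violate the reliability target."""
    projected_afr = observed_afr + afr_slope_per_day * transition_days
    return mttdl_hours(current, projected_afr, repair_hours) < mttdl_target_hours

# Example: a wide 17-of-20 scheme, AFR at 2%/yr and rising, 10-day transition window.
decide = should_transition(Scheme(n=20, k=17), observed_afr=0.02,
                           afr_slope_per_day=0.001, transition_days=10,
                           mttdl_target_hours=1e15)
print("initiate transition" if decide else "keep current Rgroup")
```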
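The rate-limiter sketch below shows one simple way to bound transition I/O to a fixed fraction of cluster bandwidth, in the spirit of Pacemaker's rate limiting; the token-bucket design, class name, and the 5% figure (taken from the summary above) are illustrative assumptions, not the paper's mechanism.

```python
# Illustrative token-bucket limiter for transition I/O; not Pacemaker's actual
# rate limiter. It simply keeps transition traffic under a configured fraction
# of the cluster's I/O bandwidth.
import time

class TransitionRateLimiter:
    def __init__(self, cluster_bw_bytes_per_s: float, fraction: float = 0.05):
        self.rate = cluster_bw_bytes_per_s * fraction  # allowed transition bytes/s
        self.capacity = self.rate                      # allow at most ~1 s of burst
        self.tokens = 0.0
        self.last = time.monotonic()

    def _refill(self) -> None:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def acquire(self, nbytes: float) -> None:
        """Block until nbytes of transition I/O fit within the budget
        (nbytes must not exceed the bucket capacity)."""
        assert nbytes <= self.capacity, "request larger than burst capacity"
        while True:
            self._refill()
            if self.tokens >= nbytes:
                self.tokens -= nbytes
                return
            time.sleep((nbytes - self.tokens) / self.rate)

# Example: cap transition traffic at 5% of a 10 GB/s cluster, then pace 64 MiB copies.
limiter = TransitionRateLimiter(cluster_bw_bytes_per_s=10e9, fraction=0.05)
limiter.acquire(64 * 1024 * 1024)  # sleeps as needed before the next chunk copy
```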
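Finally, the executor cost sketch below contrasts the two transition executors noted above with a back-of-the-envelope I/O cost model: emptying disks moves everything off the transitioning disks, while recalculating parities reads the Rgroup's data once and writes new parity chunks. The cost model and numbers are my own simplification for illustration, not the paper's analysis.

```python
# Rough I/O-cost comparison of the two transition executors; this cost model is
# an illustrative simplification, not Pacemaker's.

def emptying_io(num_disks: int, bytes_per_disk: float) -> float:
    """Empty the transitioning disks: read their contents and rewrite them elsewhere."""
    return 2.0 * num_disks * bytes_per_disk

def reencode_io(rgroup_data_bytes: float, k: int, new_parity: int) -> float:
    """Recalculate parities: read the data chunks once, write the new parity chunks."""
    return rgroup_data_bytes * (1.0 + new_parity / k)

def pick_executor(num_disks: int, bytes_per_disk: float,
                  rgroup_data_bytes: float, k: int, new_parity: int) -> str:
    """Prefer emptying when few disks transition, re-encoding otherwise."""
    if emptying_io(num_disks, bytes_per_disk) < reencode_io(rgroup_data_bytes, k, new_parity):
        return "empty-disks"
    return "recalculate-parities"

# Example: 50 transitioning 10 TB disks vs. re-encoding 5 PB of data into a
# hypothetical (k=10, p=3) scheme -> emptying wins, matching the "small number
# of disks" case in the notes.
print(pick_executor(num_disks=50, bytes_per_disk=10e12,
                    rgroup_data_bytes=5e15, k=10, new_parity=3))
```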
Strengths
- Improves on HeART: reduces the transition overhead from a systems perspective
- Strong support from real data: large production clusters and millions of disks
- Clear problem formulation and strong reasoning (analysis of disk failure rates and peak transition I/O)
- Solves the problem from a systems perspective (taking how to transition between codes in theory as given)
- Limits transition I/O with Rgroups, disk health monitoring, rate limiting, etc.
- Strong experimental evidence: traces from four production clusters
- Proof of concept via the HDFS integration
Weaknesses
- This paper focuses on solving transition overload almost entirely from a systems perspective, and it is really a good paper to read. The underlying assumption, though, is that the transition itself is already handled on the theory (coding) side, which the paper does not discuss; how the codes in Sec. 5.3 are selected may be worth mentioning.
- The experimental results are mostly based on simulation with cluster logs. I'm interested in how the proactive approach would perform in real, live clusters.