About
My reading list covers papers, articles, books, tutorials, videos, etc., collected for
research purposes. Items are organized by topic and tagged with keywords.
Table of Contents
Recent
LLM Serving
Categories
Erasure Coding
Erasure Coding (basics)
Venue |
Title |
Link / Summary |
Brief |
Summary |
Must-know concepts |
Summary |
EC basic concepts and keywords |
Manuscript |
An Introduction to Galois Fields and Reed-Solomon Coding |
Link |
Intro to finite fields and RS codes (communication) from Clemson Univ. |
Manuscript |
Reed-Solomon Codes |
Link |
Intro to RS codes from Duke Univ. |
USENIX Login'13 |
Erasure Codes for Storage Systems: A Brief Primer |
Summary |
Plank EC basics |
FAST Tutorial'13 |
Tutorial: Erasure Coding for Storage Systems |
Summary |
Plank, EC tutorial |
FAST'09 |
A Performance Evaluation and Examination of Open-Source Erasure Coding Libraries For Storage |
Summary |
Plank, EC computation evaluation |
Network Coding (basics)
Venue |
Title |
Link / Summary |
Brief |
FAST'11 Poster |
Repairing Erasure Codes |
Link |
Network coding for storage (poster) |
IEEE Survey'11 |
A Survey on Network Codes |
Summary |
Network coding for storage (survey) |
TIT'10 |
Network Coding for Distributed Storage Systems |
Summary |
Network coding for storage video, report |
PPT |
Regenerating codes for distributed storage |
Link |
Network Coding, intro |
Erasure Codes
Venue |
Title |
Link / Summary |
Brief |
FAST'23 |
Practical Design Considerations for Wide Locally Recoverable Codes (LRCs) |
Summary |
Uniform Cauchy LRC, wide stripe, LRC |
SRDS'22 |
XHR-Code: An Efficient Wide Stripe Erasure Code to Reduce Cross-Rack Overhead in Cloud Storage Systems |
Summary |
XHR-Code, repair, wide stripe, hierarchical settings, multiple failures |
MSST'19 |
AZ-Code: An Efficient Availability Zone Level Erasure Code to Provide High Fault Tolerance in Cloud Storage Systems |
Link |
AZ-Code |
ISIT'18 |
Codes with Combined Locality and Regeneration Having Optimal Rate, dmin and Linear Field Size |
Link |
Local Regenerating Codes, LRC, regenerating codes |
DSN'18 |
Alpha Entanglement Codes: Practical Erasure Codes to Archive Data in Unreliable Environments |
Link |
Alpha Entanglement Codes, multiple failures |
ATC'18 |
On Fault Tolerance, Locality, and Optimality in Locally Repairable Codes |
Summary |
LRC, comparison, Ceph |
FAST'18 |
RAID+: Deterministic and Balanced Data Distribution for Large Disk Enclosures |
Link |
RAID+, load balancing |
FAST'18 |
Clay Codes: Moulding MDS Codes to Yield an MSR Code |
Summary |
Clay codes, MSR codes |
TIT'17 |
Explicit constructions of high-rate MDS array codes with optimal repair bandwidth |
Link |
Ye-Barg codes, MSR codes |
ISIT'16 |
Double Regenerating Codes for hierarchical data centers |
Link |
DRC, MSR codes, hierarchical settings |
STOC'16 |
Repairing Reed-Solomon Codes |
Link |
RS codes, repair, sub-packetization |
FAST'16 |
Opening the Chrysalis: On the Real Repair Performance of MSR Codes |
Summary |
Butterfly codes, MSR codes |
FAST'15 |
Having Your Cake and Eating It Too: Jointly Optimal Erasure Codes for I/O, Storage, and Network-bandwidth |
Summary |
PM-RBT codes, MSR codes |
TOS'14 |
Sector-Disk (SD) Erasure Codes for Mixed Failure Modes in RAID Systems |
Link |
Sector-Disk (SD) codes, sector-disk failures |
TIT'14 |
A family of optimal locally recoverable codes |
Summary |
Optimal LRCs, LRC |
TIT'14 |
Locally Repairable Codes |
Link |
LRC |
TIT'14 |
Codes With Local Regeneration and Erasure Correction |
Summary |
Local Regenerating Codes, LRC, multiple failures |
TIT'14 |
Repair locality with multiple erasure tolerance |
Link |
LRC, multiple failures |
SIGCOMM'14 |
A “Hitchhiker’s” Guide to Fast and Efficient Data Reconstruction in Erasure-coded Data Centers |
Summary |
Hitchhiker codes, regenerating codes, piggybacking codes |
FAST'14 |
STAIR Codes: A General Family of Erasure Codes for Tolerating Device and Sector Failures in Practical Storage Systems |
Summary |
STAIR Codes, sector-disk failures |
PVLDB'13 |
XORing Elephants: Novel Erasure Codes for Big Data |
Summary |
Xorbas codes, LRC |
HotStorage'13 |
A Solution to the Network Challenges of Data Recovery in Erasure-coded Distributed Storage Systems: A Study on the Facebook Warehouse Cluster |
Link |
Piggybacking codes |
TIT'13 |
Zigzag Codes: MDS Array Codes With Optimal Rebuilding |
Link |
Zigzag Codes, regenerating codes |
ISIT'13, TIT'17 |
A Piggybacking Design Framework for Read-and Download-efficient Distributed Storage Codes |
Link |
Piggybacking codes |
TOS'12 |
Generalized X-code: An efficient RAID-6 code for arbitrary size of disk array |
Summary |
Generalized X-codes |
TIT'12 |
On the Locality of Codeword Symbols |
Link |
Theory of LRCs |
ATC'12 |
Erasure Coding in Windows Azure Storage |
Summary |
Azure-LRC |
INFOCOM'12 |
Simple regenerating codes: Network coding for cloud storage |
Link |
Simple regenerating code |
ISIT'10, TIT'11 |
Optimal Exact-Regenerating Codes for Distributed Storage at the MSR and MBR Points via a Product-Matrix Construction |
Link |
Product-Matrix Code |
TOS'09 |
GRID codes: Strip-based erasure codes with high fault tolerance for storage systems |
Summary |
GRID codes |
NCA'07 |
Pyramid Codes: Flexible Schemes to Trade Space for Access Efficiency in Reliable Data Storage Systems |
Summary, Summary (older, for TOS'13) |
Pyramid Codes, LRC |
FAST'04 |
Improving Storage System Availability with D-GRAID |
Link |
D-GRAID codes, RAID |
FAST'04 |
Row-Diagonal Parity for Double Disk Failure Correction |
Link |
RDP codes, array codes, RAID |
ATC'1996 |
AFRAID - A Frequently Redundant Array of Independent Disks |
Link |
AFRAID, RAID |
ISCA'1994, TC'1995 |
EVENODD: an optimal scheme for tolerating double disk failures in RAID architectures |
Link |
EVENODD codes, array codes, RAID |
SIGMOD'1988 |
A Case for Redundant Arrays of Inexpensive Disks (RAID) |
Link |
RAID |
SIGMETRICS Perf Eval. Review'1995 |
Striping in a RAID level 5 disk array |
Link |
RAID striping, RAID |
SIAM'1960 |
Polynomial Codes Over Certain Finite Fields |
Summary |
RS codes (the original version) |
Monograph from Prof. P. Vijay Kumar |
Codes for Distributed Storage |
Link |
EC theory basics and survey (including RS, MSR, LRC, etc.) |
Redundancy Transitioning
Venue |
Title |
Link / Summary |
Brief |
ISIT'23 |
Locally Repairable Convertible Codes: Erasure Codes for Efficient Repair and Conversion |
Summary |
LRC conversion, code conversion, LRC |
OSDI'22 |
Tiger: disk-adaptive redundancy without placement restrictions |
Summary |
Tiger, redundancy transitioning, disk heterogeneity |
ISIT'22 |
Bandwidth Cost of Code Conversions in the Split Regime |
Link |
Convertible codes: bandwidth, code conversion, theory |
ISIT'21, TIT'23 |
Bandwidth Cost of Code Conversions in Distributed Storage: Fundamental Limits and Optimal Constructions |
Link |
Convertible codes: bandwidth, code conversion, theory |
INFOCOM'22 |
Optimal Data Placement for Stripe Merging in Locally Repairable Codes |
Summary |
LRC stripe merging, code conversion, LRC |
ICDCS'21 |
StripeMerge: Efficient Wide-Stripe Generation for Large-Scale Erasure-Coded Storage |
Summary |
StripeMerge, wide stripe, code conversion |
OSDI'20 |
Pacemaker: avoiding HeART attacks in storage clusters with disk-adaptive redundancy |
Summary |
PACEMAKER, redundancy transitioning, disk heterogeneity |
SRDS'20 |
Enabling I/O-Efficient Redundancy Transitioning in Erasure-Coded KV Stores via Elastic Reed-Solomon Codes |
Summary |
Elastic Reed-Solomon (ERS) codes, redundancy transitioning |
INFOCOM'20 |
On the Optimal Repair-Scaling Trade-off in Locally Repairable Codes |
Summary |
LRC Repair-Scaling Tradeoff, redundancy transitioning, LRC |
IEEE Access'20 |
Efficient Storage Scaling for MBR and MSR Codes |
Summary |
MSR codes, scaling |
ITCS'20, TIT'22 |
Convertible Codes: New Class of Codes for Efficient Conversion of Coded Data in Distributed Storage |
Summary |
Convertible Codes: I/O, code conversion |
ISIT'20 |
Access-optimal Linear MDS Convertible Codes for All Parameters |
Summary |
Access-optimal Convertible Codes |
FAST'19 |
Cluster storage systems gotta have HeART: improving storage efficiency by exploiting disk-reliability heterogeneity |
Summary |
HeART, disk heterogeneity, redundancy transitioning |
ISIT'18 |
Generalized Optimal Storage Scaling via Network Coding |
Summary |
Network coding, scaling |
INFOCOM'18, TPDS'22 |
Toward Optimal Storage Scaling via Network Coding: From Theory to Practice |
Summary |
NCScale, scaling, network coding |
TPDS'16 |
I/O-Efficient Scaling Schemes for Distributed Storage Systems with CRS Codes |
Link |
CRS, scaling |
DSN'15, TPDS'17 |
Enabling Efficient and Reliable Transition from Replication to Erasure Coding for Clustered File Systems |
Link |
Replication to EC, redundancy transitioning |
FAST'15 |
A Tale of Two Erasure Codes in HDFS |
Summary |
HACFS, redundancy transitioning |
TC'15 |
Accelerate RDP RAID-6 Scaling by Reducing Disk I/Os and XOR Operations |
Link |
RAID, scaling |
TPDS'14 |
An Efficient Scaling Scheme for RS-Coded Storage Clusters |
Summary |
Scale-RS, scaling |
ICPP'12 |
GSR: A Global Stripe-Based Redistribution Approach to Accelerate RAID-5 Scaling |
Link |
GSR, RAID, scaling (C. Wu) |
FAST'11 |
Accelerate RAID Scaling by Minimizing Data Migration |
Link |
FastScale, RAID, scaling |
TOCS'1996 |
The HP AutoRAID Hierarchical Storage System |
Link |
AutoRAID, replication to RAID |
Erasure Coding Reliability Analysis
Venue |
Title |
Link / Summary |
Brief |
SRDS'17, TPDS'19 |
SimEDC: A Simulator for the Reliability Analysis of Erasure-Coded Data Centers |
Link |
SimEDC |
HotStorage'10 |
Mean time to meaningless: MTTDL, Markov models, and storage system reliability |
Link |
MTTDL Meaningless |
OSDI'09 |
Availability in Globally Distributed Storage Systems |
Summary |
Google Availability |
I2TS'08 |
When MTTDLs Are Not Good Enough: Providing Better Estimates of Disk Array Reliability |
Link |
Calculation of MTTDL (1) |
SNAPI'07 |
Outshining Mirrors: MTTDL of Fixed-Order SSPiRAL Layouts |
Link |
Calculation of MTTDL (2) |
Techniques for Erasure Coding
Venue |
Title |
Link / Summary |
Brief |
ATC'23 |
Explore Data Placement Algorithm for Balanced Recovery Load Distribution |
Summary |
Recovery, data placement |
IPDPS'23 |
Boosting Multi-Block Repair in Cloud Storage Systems with Wide-Stripe Erasure Coding |
Summary |
Multiple repair, wide stripe |
ICPP'23 |
Toward Optimal Repair and Load Balance in Locally Repairable Codes |
Summary |
LRC, repair, load balancing |
ICDCS'22 |
PivotRepair: Fast Pipelined Repair for Erasure-Coded Hot Storage |
Link |
repair |
ICPP'22 |
Exploiting Parallelism of Disk Failure Recovery via Partial Stripe Repair for an Erasure-Coded High-Density Storage Server |
Link |
repair, high density storage |
ICPP'22 |
Repair-Optimal Data Placement for Locally Repairable Codes with Optimal Minimum Hamming Distance |
Summary |
LRC, repair, data placement |
ATC'21 |
Boosting Full-Node Repair in Erasure-Coded Storage |
Summary |
RepairBoost, full-node recovery |
SOSP'21 |
Geometric Partitioning: Explore the Boundary of Optimal Erasure Code Repair |
Link |
Geometric Partitioning |
FAST'21 |
Exploiting Combined Locality for Wide-Stripe Erasure Coding in Distributed Storage |
Summary, Summary (earlier) |
ECWide, repair, LRC, wide stripe |
ICPP'21 |
Multi-level Forwarding and Scheduling Repair Technique in Heterogeneous Network for Erasure-coded Clusters |
Link |
repair, heterogeneous |
IWQoS'21 |
EC-Scheduler: A Load-Balanced Scheduler to Accelerate the Straggler Recovery for Erasure Coded Storage Systems |
Summary |
repair, load balancing |
IPDPS'20 |
EC-Fusion: An Efficient Hybrid Erasure Coding Framework to Improve Both Application and Recovery Performance in Cloud Storage Systems |
Link |
EC-Fusion, multiple erasure codes |
HotStorage'20 |
SelectiveEC: Selective Reconstruction in Erasure-coded Storage Systems |
Summary |
SelectiveEC, load balancing |
EuroSys'20 |
RAIDP: replication with intra-disk parity |
Summary |
RAIDP |
FAST'20 |
CRaft: An Erasure-coding-supported Version of Raft for Reducing Storage Cost and Network Cost |
Link |
CRaft |
FAST'19 |
Fast Erasure Coding for Data Storage: A Comprehensive Study of the Acceleration Techniques |
Summary |
repair acceleration |
DSN'19 |
Fast Predictive Repair in Erasure-Coded Storage |
Summary |
FastPR, repair, parallelization |
ICPP'19 |
Fast Recovery Techniques for Erasure-coded Clusters in Non-uniform Traffic Network |
Link |
multiple failure repair |
ATC'17 |
Repair Pipelining for Erasure-Coded Storage |
Summary |
ECPipe, repair, parallelization |
ATC'17 |
PARIX: Speculative Partial Writes in Erasure-Coded Systems |
Link |
Parix |
EuroSys'16 |
Partial-Parallel-Repair (PPR): A Distributed Technique for Repairing Erasure Coded Storage |
Summary |
PPR, repair, parallelization |
MSST'13 |
CORE: Augmenting Regenerating-Coding-Based Recovery for Single and Concurrent Failures in Distributed Storage Systems |
Link |
CORE, repair, multi-failure |
SYSTOR'14 |
Lazy Means Smart: Reducing Repair Bandwidth Costs in Erasure-coded Distributed Storage |
Link |
Lazy recovery |
TC'14 |
Boosting Degraded Reads in Heterogeneous Erasure-Coded Storage Systems |
Summary |
degraded read, heterogeneous network |
FAST'14 |
Parity Logging with Reserved Space: Towards Efficient Updates and Recovery in Erasure-coded Clustered Storage |
Link |
CodFS |
MSST'12 |
On the speedup of single-disk failure recovery in XOR-coded storage systems: Theory and practice |
Summary |
Zhu, replace recovery algorithms for XOR based codes |
FAST'12 |
Rethinking Erasure Codes for Cloud File Systems: Minimizing I/O for Recovery and Degraded Reads |
Summary |
Khan, RotatedRS, repair I/O improvement |
Erasure-coded Systems
Venue |
Title |
Link / Summary |
Brief |
NSDI'22 |
C2DN: How to Harness Erasure Codes at the Edge for Efficient Content Delivery |
Summary |
C2DN |
FAST'22 |
Hydra: Resilient and Highly Available Remote Memory |
Link |
Hydra, RDMA |
FAST'22 |
DEPART: Replica Decoupling for Distributed Key-Value Storage |
Link |
DEPART, distributed KVStore, EC |
NSDI'20 |
Near-Optimal Latency Versus Cost Tradeoffs in Geo-Distributed Storage |
Summary |
PANDO, consensus, EC |
SC'20 |
INEC: Fast and Coherent In-Network Erasure Coding |
Link |
INEC, RDMA |
SC'19 |
TriEC: tripartite graph based erasure coding NIC offload |
Link |
TriEC, RDMA |
SoCC'19 |
Coupling Decentralized Key-Value Stores with Erasure Coding |
Summary |
ECHash, KVStore |
HPDC'19 |
UMR-EC: A Unified and Multi-Rail Erasure Coding Library for High-Performance Distributed Storage Systems |
Link |
UMR-EC, RDMA |
FAST'19 |
OpenEC: Toward Unified and Configurable Erasure Coding Management in Distributed Storage Systems |
Summary |
OpenEC |
ICDCS'17 |
High-Performance and Resilient Key-Value Store with Online Erasure Coding for Big Data Workloads |
Link |
RDMA |
ATC'17 |
Giza: Erasure Coding Objects across Global Data Centers |
Link |
Giza, consensus |
FAST'16 |
Efficient and Available In-memory KV-Store with Hybrid Erasure Coding and Replication |
Link |
Cocytus, KVStore |
OSDI'16 |
EC-Cache: Load-Balanced, Low-Latency Cluster Caching with Online Erasure Coding |
Summary |
EC-Cache |
OSDI'14 |
Pelican: A Building Block for Exascale Cold Data Storage |
Summary |
Pelican, cold DSS |
FAST'12 |
NCCloud: A Network-Coding-Based Storage System in a Cloud-of-Clouds |
Summary |
NCCloud, network coding |
Miscellaneous
Venue |
Title |
Link / Summary |
Brief |
IPTPS'02 |
Erasure coding vs. replication: a quantitative comparison |
Link |
EC vs replication |
Storage Systems and Cloud
Venue |
Title |
Link / Summary |
Brief |
ATC'19 |
Dayu: Fast and Low-interference Data Recovery in Very-large Storage Systems |
Link |
Dayu, recovery |
SYSTOR'19 |
Kurma: Secure Geo-Distributed Multi-Cloud Storage Gateways |
Summary |
Kurma |
ATC'14 |
SCFS: A Shared Cloud-backed File System |
Summary |
SCFS, Depsky extension |
SoCC'14 |
Hybris: Robust Hybrid Cloud Storage |
Summary |
Hybris |
SOSP'13 |
SPANStore: Cost-Effective Geo-Replicated Storage Spanning Multiple Cloud Services |
Summary |
SPANStore |
OSDI'12 |
Flat Datacenter Storage |
Link |
Flat Datacenter Storage |
EuroSys'11 |
DEPSKY: A High-Availability and Integrity Layer for Cloud Storage |
Summary |
Depsky |
SoCC'10 |
RACS: a case for cloud storage diversity |
Summary |
RACS |
Blockchain
Venue |
Title |
Link / Summary |
Brief |
Bitcoin white paper |
Bitcoin: A Peer-to-Peer Electronic Cash System |
Summary |
Bitcoin white paper |
Ethereum yellow paper |
Ethereum: A secure decentralised generalised transaction ledger |
Link |
Ethereum yellow paper |
Github Repo |
Self-maintained blockchain paper list |
Repo 1, Repo 2 |
- |
Tutorial |
Blockchain tutorial from Liao Xuefeng |
Link |
- |
FAST'24 |
COLE: A Column-based Learned Storage for Blockchain Systems |
Summary |
COLE |
SIGMOD'24 |
ChainKV: A Semantics-Aware Key-Value Store for Ethereum System |
Link |
ChainKV |
Frontiers of CS |
Dynamic-EC: an efficient dynamic erasure coding method for permissioned blockchain systems |
Summary |
Dynamic-EC |
HPCA'24 |
Rapper: A Parameter-Aware Repair-in-Memory Accelerator for Blockchain Storage Platform |
Link |
Blockchain, EC |
ACM Computing Surveys'24 |
Scaling Blockchains with Error Correction Codes: A Survey on Coded Blockchains |
Link |
Blockchain, coding |
TC'24 |
BFT-DSN: A Byzantine Fault-Tolerant Decentralized Storage Network |
Link |
BFT, EC |
IOTJ'24 |
TORR: A Lightweight Blockchain for Decentralized Federated Learning |
Link |
Blockchain, EC, AI |
TKDE'23 |
PartitionChain: A Scalable and Reliable Data Storage Strategy for Permissioned Blockchain |
Summary |
PartitionChain |
TC'23 |
Efficient Integrity Auditing Mechanism With Secure Deduplication for Blockchain Storage |
Link |
Blockchain, security, deduplication |
ICPADS'23 |
DW-LRC: A Dynamic Wide-stripe LRC Codes for Blockchain Data Under Malicious Node Scenarios |
Link |
Blockchain, EC, LRC |
IOTJ'23 |
On Min–Max Storage for Resource-Restricted Clients in Coded Blockchain Systems |
Link |
Blockchain, coding |
TDSC'22 |
Enabling Secure and Efficient Decentralized Storage Auditing With Blockchain |
Link |
Blockchain, security, coding |
ISIT'22 |
Polar Coded Merkle Tree: Improved Detection of Data Availability Attacks in Blockchain Systems |
Link |
Blockchain, Merkle tree, Coding |
IOTJ'22 |
Proof of Continuous Work for Reliable Data Storage Over Permissionless Blockchain |
Link |
Permissionless blockchain, EC |
COMNET'22 |
Speeding up block propagation in Bitcoin network: Uncoded and coded designs |
Link |
Bitcoin, coding |
TCOM'22 |
Overcoming Data Availability Attacks in Blockchain Systems: Short Code-Length LDPC Code Design for Coded Merkle Tree |
Link |
Blockchain, merkle tree, coding |
SmartWorld'22 |
A Lightweight Locally Repairable Code-based Storage Architecture for Blockchains |
Link |
Blockchain, coding, LRC |
WCNC'22 |
Secure and Private Fountain Code based Architecture for Blockchains |
Link |
Blockchain, coding |
IEEE S&P (Oakland)'21 |
Red Belly: A Secure, Fair and Scalable Open Blockchain |
Link |
Red Belly |
TIFS'21 |
PolyShard: Coded Sharding Achieves Linearly Scaling Efficiency and Security Simultaneously |
Link |
Polyshard, blockchain, sharding |
TON'21 |
Coding for Scalable Blockchains via Dynamic Distributed Storage |
Link |
Blockchain, EC |
TKDE'21 |
Distributed Error Correction Coding Scheme for Low Storage Blockchain Systems |
Link |
Erasure coding, blockchain |
ISIT'21 |
Low Latency Cross-Shard Transactions in Coded Blockchain |
Link |
Blockchain, coding, sharding |
ITW'21 |
Communication-Efficient LDPC Code Design for Data Availability Oracle in Side Blockchains |
Link |
Blockchain, coding |
ICDE'20 |
BFT-Store: Storage Partition for Permissioned Blockchain via Erasure Coding |
Summary |
BFT-Store |
SIGMOD'20 Demo |
A Byzantine Fault Tolerant Storage for Permissioned Blockchain |
Link |
Erasure coding, permissioned blockchain |
JPDC'20 |
Blockchain-based verification framework for data integrity in edge-cloud storage |
Link |
Blockchain, verification, coding |
ICDCS'20 |
Towards Privacy-assured and Lightweight On-chain Auditing of Decentralized Storage |
Link |
Blockchain, verification, auditing |
Blockchain'20 |
Secure Regenerating Codes for Reducing Storage and Bootstrap Costs in Sharded Blockchains |
Link |
Blockchain, EC, regenerating codes |
IOTJ'20 |
Distributed Error Correction Coding Scheme for Low Storage Blockchain Systems |
Link |
Erasure coding, blockchain |
AFT'19 |
SoK: Sharding on Blockchain |
Link |
Sharding |
CCS'18 |
RapidChain: Scaling Blockchain via Full Sharding |
Link |
RapidChain, blockchain, sharding |
iThings'18 |
Erasure code-based low storage blockchain node |
Link |
(highly cited reference) Erasure coding, blockchain |
TrustCom'18 |
A Blockchain-based Decentralized Data Storage and Access Framework for PingER |
Link |
Bitcoin, coding |
ICPADS'18 |
Blockchain Based Data Integrity Verification in P2P Cloud Storage |
Link |
Blockchain, verification, coding |
PODC'07 |
Verifying Distributed Erasure-Coded Data |
Link |
EC, verification |
DSN'04 |
Efficient Byzantine-tolerant erasure-coded storage |
Link |
BFT, erasure coding |
Large Language Models
LLM Serving
Venue |
Title |
Link / Summary |
Brief |
OSDI'24 |
ServerlessLLM: Low-Latency Serverless Inference for Large Language Models |
Link |
LLM, serverless |
OSDI'24 |
Fairness in Serving Large Language Models |
Link |
LLM, Fairness |
Database
Indexing
Venue |
Title |
Link / Summary |
Brief |
SIGMOD'21 |
FoundationDB |
Summary |
FoundationDB, Apple |
SIGMOD'18 |
The Case for Learned Index Structures |
Link |
Learned Index |
Security
Venue |
Title |
Link / Summary |
Brief |
SYSTOR'18 |
How to Best Share a Big Secret |
Link |
Secret sharing |
Communications of the ACM'1979 |
How to Share a Secret |
Link |
Secret sharing |
Edge
Venue |
Title |
Link / Summary |
Brief |
HotEdge'20 |
Sharing and Caring of Data at the Edge |
Summary |
Edge storage survey (including a list of papers, must read) |
JPDC'20 |
EdgeKV: Decentralized, scalable, and consistent storage for the edge |
Summary |
EdgeKV |
SEC (Symposium on Edge Computing) Paper List
Venue |
Title |
Link / Summary |
Brief |
SEC’17 |
EdgeCourier: An Edge-hosted Personal Service for Low-bandwidth Document Synchronization in Mobile Cloud Storage Services |
--- |
--- |
SEC’17 |
CloudPath: A Multi-Tier Cloud Computing Framework |
--- |
--- |
SEC’17 |
LAVEA: Latency-aware Video Analytics on Edge Computing Platform |
--- |
--- |
SEC’17 |
Fast Transparent Virtual Machine Migration in Distributed Edge Clouds |
--- |
--- |
SEC’17 |
A Vehicle-based Edge Computing Platform for Transit and Human Mobility Analytics |
--- |
--- |
SEC’18 |
VideoEdge: Processing Camera Streams using Hierarchical Clusters |
--- |
--- |
SEC’18 |
Extend Cloud to Edge with KubeEdge |
--- |
--- |
SEC’19 |
Sandpaper: mitigating performance interference in CDN edge proxies |
--- |
--- |
SEC’19 |
Real-time traffic estimation at vehicular edge nodes |
--- |
--- |
SEC’19 |
Infrastructure fault detection and prediction in edge cloud environments |
--- |
--- |
SEC’19 |
Why cloud applications are not ready for the edge (yet) |
--- |
--- |
Deduplication
Venue |
Title |
Link / Summary |
Brief |
ATC'15 |
Toward Reliable, Secure, and Cost-Efficient Cloud Storage via Convergent Dispersal |
Summary |
CDStore |
Consensus
Venue |
Title |
Link / Summary |
Brief |
ATC'14 |
In Search of an Understandable Consensus Algorithm |
Summary |
Raft |
OSDI'1999 |
Practical Byzantine Fault Tolerance |
Summary |
PBFT |
Stream Processing
Venue |
Title |
Link / Summary |
Brief |
ICDCS'20 |
Toward Adaptive Disk Failure Prediction via Stream Mining |
Summary |
StreamDFP |
Graph Processing
Venue |
Title |
Link / Summary |
Brief |
OSDI'16 |
Gemini: A Computation-Centric Distributed Graph Processing System |
Summary |
Gemini |
SIGMOD'19 |
Nanosecond Indexing of Graph Data With Hash Maps and VLists |
Summary |
Nanosecond |
Scheduling
Venue |
Title |
Link / Summary |
Brief |
SOSP'1973 |
Polynomial Complete Scheduling Problems |
Summary |
Scheduling proof |
Communications of the ACM'1974 |
Scheduling independent tasks to reduce mean finishing time |
Summary |
Scheduling algorithms |
JACM'1976 |
Exact and Approximate Algorithms for Scheduling Nonidentical Processors |
Summary |
Scheduling algorithms |
JACM'1977 |
Heuristic Algorithms for Scheduling Independent Tasks on Nonidentical Processors |
Summary |
Performance analysis on scheduling heuristics |
MP'1990 |
Approximation Algorithms for Scheduling Unrelated Parallel Machines |
Summary |
Scheduling algorithms and proofs |
Graph Theory
Venue |
Title |
Link / Summary |
Brief |
JALG'06 |
Semi-matchings for bipartite graphs and load balancing |
Summary |
Semi-matching on unweighted bipartite |
IPL'06 |
An approximation algorithm for the load-balanced semi-matching problem in weighted bipartite graphs |
Summary |
Semi-matching for jobs with identical processing times |
IPL'09 |
A note on "An approximation algorithm for the load-balanced semi-matching problem in weighted bipartite graphs" |
Summary |
Corrections of bounds for IPL'06 |
IPSJ'07 |
Optimal Balanced Semi-Matchings for Weighted Bipartite Graphs |
Summary |
Optimal Semi-matching proof |
Networking
Software Defined Network (SDN)
Venue |
Title |
Link / Summary |
Brief |
Book |
Software-Defined-Networks: A Systems Approach |
Reading notes: Ch.1, Ch.2, Ch.3, Ch.4, Ch.5, Ch.6, Ch.7, Ch.8 |
SDN Book |
White paper |
Cisco SD-WAN white paper |
Link |
Cisco SD-WAN |
IEEE Communications Surveys & Tutorials'14 |
A Survey of Software-Defined Networking: Past, Present, and Future of Programmable Networks |
Link |
SDN Survey |
ICCCN'21 |
Software-Defined Wide Area Network (SD-WAN): Architecture, Advances and Opportunities |
Link |
SD-WAN Survey |
SIGCOMM'13 |
B4: Experience with a Globally-Deployed Software Defined WAN |
Link |
B4 |
NSDI'14 |
Network Virtualization in Multi-tenant Datacenters |
Link |
Network Virtualization |
SIGCOMM'13 |
Achieving High Utilization with Software-Driven WAN |
Link (Not done) |
Software-Driven WAN |
SIGCOMM CCR'08 |
OpenFlow: Enabling Innovation in Campus Networks |
Link (Not done) |
OpenFlow |
Network Measurement
Venue |
Title |
Link / Summary |
Brief |
SIGCOMM'18 |
SketchLearn: Relieving User Burdens in Approximate Measurement with Automated Statistical Inference |
Summary |
SketchLearn |
TOS (ACM Transactions on Storage) Paper List
Erasure Coding (TOS)
Venue |
Title |
Link / Summary |
Brief |
TOS'09 |
GRID codes: Strip-based erasure codes with high fault tolerance for storage systems |
Link |
--- |
TOS'12 |
Generalized X-code: An efficient RAID-6 code for arbitrary size of disk array |
Link |
--- |
TOS'13 |
Exploiting Redundancies and Deferred Writes to Conserve Energy in Erasure-Coded Storage Clusters |
Link |
--- |
TOS'13 |
Pyramid Codes: Flexible Schemes to Trade Space for Access Efficiency in Reliable Data Storage Systems |
Link |
--- |
TOS'14 |
STAIR Codes: A General Family of Erasure Codes for Tolerating Device and Sector Failures |
Link |
--- |
TOS'14 |
Sector-Disk (SD) Erasure Codes for Mixed Failure Modes in RAID Systems |
Link |
--- |
TOS'15 |
Low-Complexity Implementation of RAID Based on Reed-Solomon Codes |
Link |
--- |
TOS'17 |
High-Performance General Functional Regenerating Codes with Near-Optimal Repair Bandwidth |
Link |
--- |
TOS'17 |
Optimal Repair Layering for Erasure-Coded Data Centers: From Theory to Practice |
Link |
--- |
TOS'17 |
Systematic Erasure Codes with Optimal Repair Bandwidth and Storage |
Link |
--- |
TOS'20 |
On Fault Tolerance, Locality, and Optimality in Locally Repairable Codes |
Link |
--- |
TOS'20 |
Fast Erasure Coding for Data Storage: A Comprehensive Study of the Acceleration Techniques |
Link |
--- |
TOS'20 |
PBS: An Efficient Erasure-Coded Block Storage System Based on Speculative Partial Writes |
Link |
--- |
RAID (TOS)
Venue |
Title |
Link / Summary |
Brief |
TOS'05 |
Improving storage system availability with D-GRAID |
Link |
--- |
TOS'05 |
Reliability and security of RAID storage systems and D2D archives using SATA disk drives |
Link |
--- |
TOS'07 |
PARAID: A gear-shifting power-aware RAID |
Link |
--- |
TOS'08 |
A new intra-disk redundancy scheme for high-reliability RAID storage systems in the presence of unrecoverable errors |
Link |
--- |
TOS'09 |
Higher reliability redundant disk arrays: Organization, operation, and coding |
Link |
--- |
TOS'10 |
Differential RAID: Rethinking RAID for SSD reliability |
Link |
--- |
TOS'11 |
A Hybrid Approach to Failed Disk Recovery Using RAID-6 Codes: Algorithms and Performance Evaluation |
Link |
--- |
TOS'11 |
Minimum density RAID-6 codes |
Link |
--- |
TOS'11 |
Online availability upgrades for parity-based RAIDs through supplementary parity augmentations |
Link |
--- |
TOS'11 |
Reducing Repair Traffic in P2P Backup Systems: Exact Regenerating Codes on Hierarchical Codes |
Link |
--- |
TOS'11 |
Disk Scrubbing Versus Intradisk Redundancy for RAID Storage Systems |
Link |
--- |
TOS'14 |
Beyond MTTDL: A Closed-Form RAID 6 Reliability Equation |
Link |
--- |
TOS'15 |
RAIDShield: Characterizing, Monitoring, and Proactively Protecting Against Disk Failures |
Link |
--- |
TOS'15 |
An Energy-Efficient and Reliable Storage Mechanism for Data-Intensive Academic Archive Systems |
Link |
--- |
TOS'15 |
Rebuttal to “Beyond MTTDL: A Closed-Form RAID-6 Reliability Equation” |
Link |
--- |
TOS'16 |
LoneStar RAID: Massive Array of Offline Disks for Archival Systems |
Link |
--- |
TOS'16 |
H-Scale: A Fast Approach to Scale Disk Arrays via Hybrid Stripe Deployment |
Link |
--- |
TOS'19 |
Determining Data Distribution for Large Disk Enclosures with 3-D Data Templates |
Link |
RAID+ |
Data Placement (TOS)
Venue |
Title |
Link / Summary |
Brief |
TOS'14 |
Random Slicing: Efficient and Scalable Data Placement for Large-Scale Storage Systems |
Link |
--- |
Flash-memory (TOS)
Venue |
Title |
Link / Summary |
Brief |
TOS'18 |
An Analysis of Flash Page Reuse With WOM Codes |
Link |
--- |
Backup (TOS)
Venue |
Title |
Link / Summary |
Brief |
TOS'12 |
Efficient cooperative backup with decentralized trust management |
Link |
--- |
Storage System (TOS)
Venue |
Title |
Link / Summary |
Brief |
TOS'05 |
DISP: Practical, efficient, secure and fault-tolerant distributed data storage |
Link |
--- |
TOS'09 |
POTSHARDS—a secure, recoverable, long-term archival storage system |
Link |
--- |
TOS'11 |
PRESIDIO: A Framework for Efficient Archival Data Storage |
Link |
--- |
TOS'13 |
DepSky: Dependable and Secure Storage in a Cloud-of-Clouds |
Summary |
--- |
TOS'17 |
Hybris: Robust Hybrid Cloud Storage |
Summary |
--- |
TOS'17 |
Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to File-System Faults |
Link |
--- |
TOS'19 |
Liquid Cloud Storage |
Link |
--- |
TOS'20 |
The Case for Custom Storage Backends in Distributed Storage Systems |
Link |
--- |
KV-Store (TOS)
Venue |
Title |
Link / Summary |
Brief |
TOS'17 |
Efficient and Available In-Memory KV-Store with Hybrid Erasure Coding and Replication |
Link |
--- |
Benchmark (TOS)
Venue |
Title |
Link / Summary |
Brief |
TOS'07 |
Understanding disk failure rates: What does an MTTF of 1,000,000 hours mean to you? |
Link |
--- |
TOS'08 |
A nine year study of file system and storage benchmarking |
Link |
--- |
Techniques (TOS)
Venue |
Title |
Link / Summary |
Brief |
TOS'12 |
Efficient software implementations of large finite fields GF(2^n) for secure storage applications |
Link |
--- |
TOS'16 |
Tools for Predicting the Reliability of Large-Scale Storage Systems |
Link |
--- |
File System (TOS)
Venue |
Title |
Link / Summary |
Brief |
TOS'14 |
A Study of Linux File System Evolution |
Link |
--- |
TOS'20 |
Everyone Loves File: Oracle File Storage Service |
Link |
--- |