Pelican: A Building Block for Exascale Cold Data Storage
This paper presents Pelican, a rack-scale prototype storage unit designed as a building block for exabyte-scale cold storage in the cloud. Pelican’s design, including its hardware configuration and software stack, is tailor-made for cold data workloads. The paper makes three contributions: (i) it presents the right-provisioning of Pelican’s hardware, which lets Pelican achieve good performance while keeping only part of the disks active to reduce power consumption; (ii) it describes the Pelican software stack, which provides good performance (low latency and high throughput) under the hardware restrictions through its data layout algorithm and IO scheduling; and (iii) it describes the Pelican prototype very clearly, from hardware configuration to software stack deployment, and evaluates its performance with a rack-scale simulator using cross-validation.
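As a rough illustration of why IO scheduling matters when only part of the disks can be active, the sketch below compares serving requests in arrival order with serving them batched by disk group. It is not Pelican's actual scheduler: it assumes, for simplicity, that only one group of disks is spun up at a time and that every switch to a different group costs one spin-up. The function names, request IDs, and group labels are hypothetical.

```python
from collections import OrderedDict


def batch_by_group(requests):
    """Group queued requests so each disk group is served in one batch.

    `requests` is a list of (request_id, group_id) pairs in arrival order.
    Groups are ordered by their earliest pending request.
    """
    batches = OrderedDict()
    for request_id, group_id in requests:
        batches.setdefault(group_id, []).append(request_id)
    return list(batches.items())


def count_spin_ups(group_sequence):
    """Count group switches; each switch implies spinning up another group."""
    spin_ups = 0
    active = None  # assumption: no group is active at the start
    for group_id in group_sequence:
        if group_id != active:
            spin_ups += 1
            active = group_id
    return spin_ups


# Hypothetical arrival order alternating between two disk groups.
arrivals = [("r1", "A"), ("r2", "B"), ("r3", "A"), ("r4", "B"), ("r5", "A")]

# Serving in arrival order switches groups on every request.
fifo_spin_ups = count_spin_ups([group for _, group in arrivals])

# Serving batched by group switches once per group.
batched = batch_by_group(arrivals)
batched_spin_ups = count_spin_ups([group for group, _ in batched])

print(fifo_spin_ups, batched_spin_ups)
```

Batching reduces spin-ups at the cost of reordering requests, so a real scheduler would also need to bound how long a request can be delayed. That trade-off is not modeled here.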
Amazon Glacier, Facebook cold data storage SKU (proprietary at that time)
Modify disk IO handling to increase disk inactivity
Write-offloading to allow disk inactivity: Pergamum
Massive Arrays Of Idle Disks (MAID) systems: support peak performance
Power-proportional systems: Rabbit, Sierra
It presents the hardware configurations tailor-made for cold data storage
Extra:
Hardware and software are tightly coupled, and not easily generalized
High rejection rate as the work rate gets higher, which does not seem tolerable (Fig. 6)
Significant reduction in performance with repair traffic, even though it’s concurrent (Fig. 7 (b))
How is the metadata service maintained by the catalog?