Keyun Cheng

SCFS: A Shared Cloud-backed File System


ATC, 2014

Summary

This paper presents SCFS, a multi-cloud-backed file system. The main idea is to provide confidentiality, integrity, and availability for data and file services without fully trusting any individual cloud provider and without running custom code on the storage clouds. SCFS provides strong consistency on file accesses and a backplane that supports multiple cloud interfaces. It adopts classical ideas from file systems and cloud storage systems, including consistency-on-close semantics and the separation of metadata from data. SCFS follows design principles such as pay-per-ownership and versioning. Metadata and lock management are delegated to a separate coordination service rather than embedded in the file system itself. Client-side caching of data and metadata significantly boosts read/write performance.

Details

System Architecture

[Figure: system architecture]

  1. Implementation
    • SCFS Agent: a user-space file system based on FUSE-J (Java).
    • Coordination services: external services (e.g. DepSpace, ZooKeeper) that manage metadata and locks.
    • Storage backend: Amazon S3 (a single cloud) or DepSky (multi-cloud).
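
The interplay of the three components under consistency-on-close can be illustrated with a minimal sketch. This is not the SCFS code (which is written in Java); the class names `Coordinator` and `Agent`, the in-memory dictionaries standing in for the coordination service and the storage backend, and the use of a SHA-256 digest as the version identifier are all assumptions made for illustration.

```python
import hashlib


class Coordinator:
    """Hypothetical stand-in for an external coordination service."""

    def __init__(self):
        self.metadata = {}  # path -> content hash of the latest closed version
        self.locks = {}     # path -> id of the client holding the write lock

    def try_lock(self, path, client):
        # setdefault records `client` only if nobody holds the lock yet
        return self.locks.setdefault(path, client) == client

    def unlock(self, path, client):
        if self.locks.get(path) == client:
            del self.locks[path]


class Agent:
    """Hypothetical client-side agent that caches whole files locally."""

    def __init__(self, client_id, coordinator, cloud):
        self.client_id = client_id
        self.coordinator = coordinator
        self.cloud = cloud      # dict standing in for the storage backend
        self.cache = {}         # local cache: content hash -> bytes
        self.open_files = {}    # path -> bytearray being edited

    def open_for_write(self, path):
        if not self.coordinator.try_lock(path, self.client_id):
            raise PermissionError(f"{path} is locked by another client")
        self.open_files[path] = bytearray(self.read(path) or b"")

    def write(self, path, data):
        # Buffered locally; other clients cannot see this until close()
        self.open_files[path] += data

    def close(self, path):
        data = bytes(self.open_files.pop(path))
        digest = hashlib.sha256(data).hexdigest()
        self.cloud[digest] = data                 # 1. store the data
        self.cache[digest] = data
        self.coordinator.metadata[path] = digest  # 2. publish the metadata
        self.coordinator.unlock(path, self.client_id)

    def read(self, path):
        digest = self.coordinator.metadata.get(path)
        if digest is None:
            return None
        if digest not in self.cache:              # cache miss: fetch from cloud
            self.cache[digest] = self.cloud[digest]
        return self.cache[digest]
```

In this sketch a second agent sharing the same `Coordinator` and `cloud` sees a file's new content only after the writer calls `close()`, and a concurrent `open_for_write()` on the same path is rejected while the lock is held.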

Features

Evaluation

Strength

  1. It uses external coordination services to keep metadata and locks consistent, accepting lower read/write performance as the tradeoff

  2. The local cache of data and metadata, together with non-blocking writes, improves client response time

  3. Utilizes DepSky as the storage backend

  4. It attempts a comparison with existing public file storage solutions

  5. It targets small files (from KBs up to 10 MB)
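
The non-blocking write mentioned in strength 2 can be pictured roughly as follows. This is only a sketch of the general idea (return from close once the data is in the local cache, upload in the background); the class `NonBlockingUploader`, its methods, and the dictionary standing in for the cloud are hypothetical and do not reflect the actual SCFS implementation.

```python
import queue
import threading


class NonBlockingUploader:
    """Hypothetical sketch: close returns before the cloud upload finishes."""

    def __init__(self, cloud):
        self.cloud = cloud            # dict standing in for cloud storage
        self.local_cache = {}         # path -> bytes, updated synchronously
        self.pending = queue.Queue()  # uploads waiting for the background thread
        threading.Thread(target=self._worker, daemon=True).start()

    def close_file(self, path, data):
        self.local_cache[path] = data   # visible to the local client at once
        self.pending.put((path, data))  # the upload happens later

    def _worker(self):
        while True:
            path, data = self.pending.get()
            self.cloud[path] = data     # a real upload would take network time
            self.pending.task_done()

    def flush(self):
        # Block until every queued upload has been applied to the cloud
        self.pending.join()
```

The client pays only the cost of a local cache write on close; `flush()` is included so a caller (or a test) can wait for the queue to drain.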

Weakness

  1. Common problems for client-side only deployment (refer to those in DepSky)

  2. The two-layer cache design does not handle medium or large files well
    • The complete file must fit in the memory buffer, as partial reads/writes are not supported
  3. (Engineering) Scheduling of the non-blocking file sync is not supported
    • The implementation assumes data is written to the cloud immediately after close
  4. Metadata and lock management rely completely on the coordinator

  5. Lack of failure recovery (one possible solution is journaling and logging)
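
To make the journaling suggestion in weakness 5 concrete, here is a minimal sketch of what a client-side upload journal could look like. It is purely illustrative of the reviewer's suggestion and is not something SCFS provides; the class `UploadJournal`, the JSON-lines record format, and the `begin`/`commit` operations are assumptions.

```python
import json
import os


class UploadJournal:
    """Hypothetical append-only journal of uploads, one JSON record per line."""

    def __init__(self, path):
        self.path = path

    def _append(self, record):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # make the record durable before continuing

    def begin(self, file_path, version):
        # Logged before the upload to the cloud starts
        self._append({"op": "begin", "file": file_path, "version": version})

    def commit(self, file_path, version):
        # Logged after the upload and the metadata update both succeed
        self._append({"op": "commit", "file": file_path, "version": version})

    def pending(self):
        """Return (file, version) pairs that began but never committed."""
        if not os.path.exists(self.path):
            return []
        open_uploads = []
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                rec = json.loads(line)
                key = (rec["file"], rec["version"])
                if rec["op"] == "begin":
                    open_uploads.append(key)
                elif key in open_uploads:
                    open_uploads.remove(key)
        return open_uploads
```

After a crash, a restarted client could call `pending()` and retry those uploads from its local cache, assuming the cached data itself survived the failure.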