Storage Latency Tradeoffs in Geographically-Distributed Storage Settings
Open Access
- Author:
- Balcik, Ethan
- Area of Honors:
- Computer Science
- Degree:
- Bachelor of Science
- Document Type:
- Thesis
- Thesis Supervisors:
- Viveck Ramesh Cadambe, Thesis Supervisor
John Joseph Hannan, Thesis Honors Advisor
Bhuvan Urgaonkar, Thesis Honors Advisor - Keywords:
- erasure coding
graph theory
coding theory
information theory
chromatic number
latency
algorithm
computer science
mathematics
storagetopology
network
computer networking
data
object
linear coding
linear algebra
linear combination
average latency
worst-case latency
worst case latency
optimal
performance
storage capacity
node
edge
channel
capacity
storage - Abstract:
- Recent work in distributed data storage has demonstrated potential in the use of erasure coding for improved fault tolerance. Work by Cadambe and Lyu further indicates that partial erasure coding can be implemented in a distributed setting in a causally consistent manner by use of the CausalEC algorithm. This work intends to amend previous work by investigating the relationship between storage and latency when utilizing partial erasure coding in a distributed setting. It presents bounds on both the worst-case and average latency of a distributed data store as well as an algorithm which decides if lower-bound replicative performance can be achieved provided the network structure. This algorithm then motivates investigation into infinite subclasses of cases in which network structures may be oriented such that an erasure-coded storage scheme can outperform the optimal replicative storage scheme in terms of its worst-case and average latency. These cases are then shown to exist in practice within Amazon Web Services’ network of data centers.