Tuesday, May 11, 2010

Amazon's Dynamo distributed key-value store

Amazon's Dynamo [is] a highly available key-value storage system, [which] sacrifices consistency under certain failure scenarios. It makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.

You typically hear two rationales for NoSQL data stores. First, to selectively relax ACID properties to gain performance in distributed systems. The second is to support data models that fit poorly into tables with set operations, such as documents or objects. Although I'm more interested in the later, this paper is all about the former. Even so, it's an impressive piece of software engineering.

Dynamo is a distributed key-value store otherwise known as a distributed hash table, which is like a normal hash table except the buckets are on different nodes. It can store small objects on lots of servers and look them up by primary key. It's API is get(key, context) and put(key, context, object). The context is like a cookie. It's whatever the server wants to you remind it of in your next request. Nodes are assigned blocks of key-space using consistent hashing, ensuring that keys and/or load is evenly distributed among the nodes. One big motivation for a system like this is fault tolerance. Nodes can come and go and the system keeps working.

Redundant copies of data are kept on several nodes via replication. In RDBMS's, replication is typically done in ways that favor consistency over availability. This is one reason why RDBMS's don't scale out easily. Dynamo trades consistency for availability, especially for write operations using optimistic replication, meaning that replicas are guaranteed to converge only when the system has been quiesced for a period of time. Changes are allowed to propagate to replicas in the background, and concurrent, disconnected work is tolerated. Conflicting changes must be detected and resolved, later. You'll hear the terms lazy replication and eventual consistency thrown around with roughly the same meaning.

Dynamo resolves conflicts during reads and hands conflicting updates to the client application to resolve in an application-dependent way. Consistency is application defined anyway, right?

As a real-world analogy, writing paper checks is a distributed system which cannot guarantee 'consistency'. Conflict resolution happens in the clearing process. Also, distributed source control systems must face similar issues. How does GIT handle this sort of thing?

Strategies

  • Vector clocks - detect conflicting updates.
  • Sloppy Quorum and hinted handoff - handle failure and recovery.
  • Merkle hash trees - A tree in which parents are hashes of child nodes. They are a quick way to compute a diff, detecting divergence between nodes.
  • Gossip based protocol - distribute node membership and up/down status information.

Links