In Search of an Understandable Consensus Algorithm (RAFT)
This paper presents Raft, a simpler and more understandable consensus algorithm. Previously dominant Paxos protocol which is commonly taught has significant drawbacks: too difficult to understand and not able to provide a good foundation to build pratical implementations. Raft applied specific techniques to improve understandability including decomposition and state space reduction, and it has several novel features: strong leader, leader election and membership changes.
Exceptionally difficult to understand, few people succeed to understand it.
No widely agreed-upon algorithm for multi-Paxos, most for single-decree, but incomplete descriptions for multi-Paxos. Multi-Paxos relies on a complex single-decree decomposition, and it use a simple symmetric peer-to-pear approach as its core, no leadearship form for its optimization.
Recommend: a commic about Raft.
Problem decomposition
Simpify state spaces and eliminate nondeterministim
NOTE: Check the original paper for all examples and figures based on page limit.
Single distinguished leader is elected to manage the replicated log (accepts log from client and replicates to followers). Data flows between leader to followers.
Election: When leader fail, elect a new one.
Leader append-only
Log-matching to guarantee correctness
Leader Completeness
State Machine Safety
Followers: respond only
Candidate: used to elect a new leader
Leader: Replicate logs to followers
RPC: RequestVotes(Candidate), AppendEntries(Leader)
All server starts as followers
No communication from leader: election timeout, begin election.
A candidate wins with majority votes.
For the case that no candidate wins: Raft use randomized election timeouts to ensure that split votes are rare (the probability to happen for the second time) and resolved quickly.
Client request command
Leader append to its own log (new entry)
Leader issues AppendEntries to all Followers
Only if it’s safe (the majority of Followers respond and safely replicated), Leader apply the entry to state machine, return result to client
A lot of details in maintaining Leader and Followers log consistency (consistency check for missing entries).
Election restriction
Committing entries from previous terms
…
Adiditional to its completeness as a protocol with guarantee on safety, consistency, it’s
Understandable
More applicable for implementation
Single Leader configuration may be a bottleneck for large scale system, unable for load balance
Fixed way of communication
For the case that all candidates split votes, it may be a problem that the randomness causes latency (though finally one candidate will win the election).