Nithin Bharadwaj
Building Raft Consensus Algorithm in Go: Complete Implementation Guide for Distributed Systems


Let me walk you through how we can make several computers agree on the same thing, even when some of them stop working or the network gets messy. This is the problem of distributed consensus. I want to explain one way to solve it, called Raft, and show you how to build it using Go. Think of it as creating a reliable team where everyone follows the same playbook, no matter what happens.

We'll start from the very beginning. You have a group of machines—let's call them nodes—that need to share a common state. Maybe it's a key-value store, or the configuration for a service. If one node says "set X to 5," all the other nodes need to agree that X is indeed 5, and in the same order as any other commands. If they don't, your data becomes a mess. Raft gives us a clear set of rules to manage this team.

The algorithm organizes time into terms. Think of a term like a leader's time in office. Each term has at most one leader (an election can fail, in which case a new term begins). The leader's main job is to manage the log, which is the sequence of commands everyone must agree on. All changes go through the leader. The other nodes are followers; they accept instructions from the leader. This structure makes everything orderly.

Here’s how a node begins. We set up its identity, where it listens for messages, and who its peers are.

config := &NodeConfig{
    NodeID:            "node-1",
    ListenAddr:        "localhost:7000",
    Peers:             []string{"node-2:7001", "node-3:7002"},
    ElectionTimeout:   150 * time.Millisecond,
    HeartbeatTimeout:  50 * time.Millisecond,
}
node, err := NewRaftNode(config)

Every node starts its life as a follower. It's in a waiting state, listening for a heartbeat from a leader. This heartbeat is a simple message that says, "I'm here, I'm in charge." If a follower doesn't hear from a leader for a while—a period called the election timeout—it assumes the leader is gone. It then promotes itself to a candidate and starts an election.
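
A minimal sketch of that waiting loop, assuming a heartbeatCh channel that the AppendEntries handler signals and a startElection helper (both names are mine, not part of the snippets in this post):

// electionLoop waits for heartbeats and starts an election when none arrive
// within the election timeout. The timeout is randomized so that several
// followers don't all become candidates at the same instant.
func (rn *RaftNode) electionLoop() {
    for {
        timeout := rn.config.ElectionTimeout +
            time.Duration(rand.Int63n(int64(rn.config.ElectionTimeout)))
        select {
        case <-rn.heartbeatCh:
            // Heard from the leader in time: stay a follower and wait again.
        case <-time.After(timeout):
            // No heartbeat: assume the leader is gone and stand for election.
            rn.startElection()
        }
    }
}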

The candidate votes for itself and asks all other nodes for their votes. It sends out a request.

func (rn *RaftNode) requestVote(peer string, wg *sync.WaitGroup, votesCh chan<- bool) {
    rpc := &RequestVoteRequest{
        Term:         rn.state.currentTerm,
        CandidateID:  rn.nodeID,
        LastLogIndex: rn.store.lastIndex,
        LastLogTerm:  rn.store.lastTerm,
    }
    // Send the RPC to the peer, report the granted/denied result on votesCh, and call wg.Done()...
}

For a node to become a leader, it needs votes from a majority of the cluster. This majority rule is crucial. It ensures that even if the network splits, only one side can possibly achieve a majority and elect a leader. This prevents two leaders from acting at the same time. Once a candidate gets that majority, it becomes the leader.
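
Putting those pieces together, the vote-counting step might look like this sketch (startElection and becomeLeader are names I'm assuming; votesCh matches the channel passed to requestVote above):

// startElection bumps the term, votes for itself, asks every peer in
// parallel, and becomes leader once a majority of the cluster agrees.
func (rn *RaftNode) startElection() {
    rn.state.currentTerm++
    rn.state.role = Candidate // persisting the new term and self-vote is omitted here

    votesCh := make(chan bool, len(rn.config.Peers))
    var wg sync.WaitGroup
    for _, peer := range rn.config.Peers {
        wg.Add(1)
        go rn.requestVote(peer, &wg, votesCh)
    }
    go func() { wg.Wait(); close(votesCh) }()

    votes := 1 // our own vote
    majority := (len(rn.config.Peers)+1)/2 + 1
    for granted := range votesCh {
        if granted {
            votes++
        }
        if votes >= majority {
            rn.becomeLeader() // start heartbeats, initialize per-follower state
            return
        }
    }
}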

Now, let's look at what a leader does. Its primary responsibility is log replication. When a client sends a command—like "set X to 5"—the leader does not execute it immediately. First, it adds the command to its own log. This log is just an ordered list.

func (rn *RaftNode) Propose(data []byte) (uint64, error) {
    entry := &LogEntry{
        Term:  rn.state.currentTerm,
        Index: rn.store.lastIndex + 1,
        Type:  LogCommand,
        Data:  data,
    }
    // Append to the local log first
    rn.store.logs[entry.Index] = entry
    rn.store.lastIndex = entry.Index
    rn.store.lastTerm = entry.Term
    // Then tell the followers and hand the new index back to the caller
    return entry.Index, nil
}

The leader then tells every follower about this new log entry. It sends an AppendEntries message. This message doesn't just contain the new command. It's clever. It says, "Here's a new entry. But before you accept it, check that your log matches mine up to the previous entry." This check guarantees consistency.

type AppendEntriesRequest struct {
    Term         uint64
    LeaderID     string
    PrevLogIndex uint64      // Index of the entry just before the new ones
    PrevLogTerm  uint64      // Term of that preceding entry
    Entries      []*LogEntry // The new commands (empty for a pure heartbeat)
    LeaderCommit uint64      // Highest index the leader has committed
}

A follower receives this and runs the check. If its log at PrevLogIndex has the same PrevLogTerm, it knows its history aligns with the leader's. It then appends the new entries. If the check fails, it tells the leader "no." The leader then tries again with an earlier log index, essentially stepping back in time to find where their logs diverge, and repairs the follower's log. This ensures all logs become identical.
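
On the follower side, that check is only a few lines. Here is a sketch, reusing the fields shown earlier and assuming an AppendEntriesResponse type with Term and Success fields:

// handleAppendEntries runs the consistency check before accepting new entries.
func (rn *RaftNode) handleAppendEntries(req *AppendEntriesRequest) *AppendEntriesResponse {
    // A stale leader from an old term is rejected outright.
    if req.Term < rn.state.currentTerm {
        return &AppendEntriesResponse{Term: rn.state.currentTerm, Success: false}
    }
    // Consistency check: our entry at PrevLogIndex must carry PrevLogTerm.
    if req.PrevLogIndex > 0 {
        prev, ok := rn.store.logs[req.PrevLogIndex]
        if !ok || prev.Term != req.PrevLogTerm {
            return &AppendEntriesResponse{Term: rn.state.currentTerm, Success: false}
        }
    }
    // Histories match: store the new entries (a full implementation would
    // also truncate anything after a conflicting entry).
    for _, e := range req.Entries {
        rn.store.logs[e.Index] = e
        if e.Index > rn.store.lastIndex {
            rn.store.lastIndex = e.Index
            rn.store.lastTerm = e.Term
        }
    }
    // Advance our commit index up to what the leader says is committed.
    if req.LeaderCommit > rn.store.committed {
        newCommit := req.LeaderCommit
        if newCommit > rn.store.lastIndex {
            newCommit = rn.store.lastIndex
        }
        rn.store.committed = newCommit
    }
    return &AppendEntriesResponse{Term: rn.state.currentTerm, Success: true}
}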

A leader doesn't just send messages once. It must persistently send heartbeats, which are essentially empty AppendEntries messages, to maintain its authority. If it stops, followers will start a new election.

func (rn *RaftNode) heartbeatLoop() {
    // Tick at half the heartbeat timeout so followers never time out in normal operation.
    ticker := time.NewTicker(rn.config.HeartbeatTimeout / 2)
    defer ticker.Stop()
    for range ticker.C {
        if rn.state.role == Leader {
            rn.broadcastHeartbeat() // Sends empty AppendEntries
        }
    }
}

When is a command finally safe to execute? Not when the leader writes it, but when it's replicated. The leader waits until a new log entry is stored on a majority of nodes. At that point, it considers the entry committed. It then applies the command to its own state machine (like updating that key-value store) and notifies followers in subsequent heartbeats that they can also apply it.

func (rn *RaftNode) updateCommitIndex() {
    // Collect the highest index known to be replicated on each node, including
    // the leader itself. (matchIndex is assumed to track this per follower.)
    matched := []uint64{rn.store.lastIndex}
    for _, idx := range rn.matchIndex {
        matched = append(matched, idx)
    }
    // The lower median of these values is an index stored on a majority of nodes.
    sort.Slice(matched, func(i, j int) bool { return matched[i] < matched[j] })
    newCommitIndex := matched[(len(matched)-1)/2]
    if newCommitIndex > rn.store.committed {
        rn.store.committed = newCommitIndex // This entry is now safe to apply
    }
}

This commit process is what gives us strong consistency. Every node applies the exact same commands in the exact same order. If you read from any node that has applied a committed command, you'll see its result.

Now, logs can't grow forever. They would fill the disk. Raft solves this with snapshots. Periodically, a node will take a snapshot of its entire state machine—a compact picture of what the system looks like at a specific log index. After saving that snapshot, it can discard all the old log entries that led up to that point.

func (rn *RaftNode) createSnapshot() {
    snapshot := &Snapshot{
        LastIndex: rn.store.lastIndex,
        LastTerm:  rn.store.lastTerm,
        Data:      rn.captureState(), // Serialize the entire application state
    }
    // Save it...
    rn.compactLog(snapshot.LastIndex) // Delete old log entries
}

This is a huge space saver. If a follower falls far behind—maybe it was offline—the leader can simply send it a snapshot to catch it up quickly, instead of replaying thousands of old log entries.
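
The RPC that carries the snapshot might look like this; a sketch in the style of the structs above, with field names of my own choosing:

// InstallSnapshotRequest ships a complete snapshot to a follower that is
// too far behind to be repaired entry by entry.
type InstallSnapshotRequest struct {
    Term      uint64 // leader's current term
    LeaderID  string
    LastIndex uint64 // last log index covered by the snapshot
    LastTerm  uint64 // term of that last included entry
    Data      []byte // the serialized state machine
}

On receipt, the follower restores its state machine from Data, discards its log up to LastIndex, and resumes normal replication from there.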

Sometimes, you need to change the team itself. You might want to add a new node for more capacity or remove a faulty one. Changing cluster membership is tricky. You can't just update the list on one node, or you risk causing a split where two different groups think they are the majority.

Raft uses a two-phase approach for safety. First, the leader replicates a special configuration entry that includes both the old set of servers and the new set (this is called joint consensus). The cluster must commit this transitional configuration. Only once that is done can the leader replicate the final configuration entry containing just the new server set.

entry := &LogEntry{
    Type: LogConfiguration,
    Data: encodedNewConfig, // This data would describe the server change
}

This careful process ensures there is never a moment where two different majorities could be formed, which would allow for two leaders.
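
To make the "two majorities" rule concrete, here is a small sketch of what a joint configuration and its quorum check could look like (the Configuration type and hasQuorum are my own names, not part of the implementation above):

// Configuration describes the servers that count toward a majority.
// During joint consensus both OldServers and NewServers are populated.
type Configuration struct {
    OldServers []string
    NewServers []string
}

// hasQuorum reports whether the voters form a majority under this
// configuration: during joint consensus, a decision needs a majority of
// BOTH the old and the new server sets.
func (c *Configuration) hasQuorum(votes map[string]bool) bool {
    majorityOf := func(servers []string) bool {
        n := 0
        for _, s := range servers {
            if votes[s] {
                n++
            }
        }
        return n > len(servers)/2
    }
    if len(c.OldServers) > 0 && !majorityOf(c.OldServers) {
        return false
    }
    return majorityOf(c.NewServers)
}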

Let's talk about what the client sees. A client sends all write requests to the leader. If it sends a request to a follower, the follower politely redirects it. The client might have to retry if an election is happening. For reads, you could go to the leader every time, but that's inefficient. An optimization is leader leases: the leader can serve read requests directly if it knows it hasn't lost contact with a majority recently, without writing to the log. This provides low-latency reads while still being linearizable.
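
On a follower, the redirect amounts to a guard in front of Propose. A sketch, where handleClientWrite is my own name and leaderID is an assumed field recording who sent the last valid heartbeat:

// handleClientWrite rejects writes on non-leaders and points the client
// at the node currently believed to be the leader.
func (rn *RaftNode) handleClientWrite(data []byte) (uint64, error) {
    if rn.state.role != Leader {
        return 0, fmt.Errorf("not the leader, retry against %s", rn.state.leaderID)
    }
    return rn.Propose(data)
}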

In my code, communication between nodes uses RPCs over TCP. I use connection pools to avoid setting up a new connection for every message. I also batch log entries. Instead of sending one entry per RPC, I can send dozens in a single message, which greatly improves throughput when the system is busy.

// In sendAppendEntries, the Entries slice can have many items
rpc := &AppendEntriesRequest{
    Entries: pendingEntriesBatch, // Send many at once
}
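
The connection pool mentioned above doesn't need to be fancy. A minimal sketch that keeps one TCP connection per peer and reuses it (reconnects and timeouts omitted):

// connPool caches one connection per peer so every RPC doesn't pay for a
// fresh TCP handshake.
type connPool struct {
    mu    sync.Mutex
    conns map[string]net.Conn
}

func (p *connPool) get(addr string) (net.Conn, error) {
    p.mu.Lock()
    defer p.mu.Unlock()
    if c, ok := p.conns[addr]; ok {
        return c, nil
    }
    c, err := net.Dial("tcp", addr)
    if err != nil {
        return nil, err
    }
    p.conns[addr] = c
    return c, nil
}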

What happens when things go wrong? Nodes crash. The system is designed to tolerate failures of a minority of nodes. If the leader crashes, an election happens and a new leader takes over. When the crashed node restarts, it loads its most recent snapshot and asks the new leader for all the log entries it missed.

Network partitions are more interesting. If a leader gets cut off from a majority of nodes, it can't commit any new entries. It's effectively stuck. Meanwhile, on the other side of the partition, the nodes that still form a majority will elect a new leader and continue operating. When the partition heals, the old leader will see a higher term in messages from the new leader and revert to being a follower, updating its log to match.
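
The step-down itself is a tiny piece of code that every RPC handler runs. A sketch, assuming votedFor is part of the node's persisted state:

// maybeStepDown reverts to follower whenever a message carries a higher term.
// This is how a stale leader on the losing side of a partition yields.
func (rn *RaftNode) maybeStepDown(remoteTerm uint64) {
    if remoteTerm > rn.state.currentTerm {
        rn.state.currentTerm = remoteTerm
        rn.state.role = Follower
        rn.state.votedFor = "" // a new term means we have not voted yet
    }
}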

When you run this in production, you need to watch it. Key metrics are leader heartbeats, election counts, and commit latency. A sudden spike in elections might mean your network is unstable or timeouts are too short. You want to log important state changes and RPC errors for debugging.
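
A handful of counters goes a long way; something like this (the struct and field names are mine) makes an election storm obvious on a dashboard:

// raftMetrics holds the counters worth graphing. A sudden rise in
// ElectionsStarted usually means the network is flaky or timeouts are too short.
type raftMetrics struct {
    HeartbeatsSent   atomic.Uint64
    ElectionsStarted atomic.Uint64
    CommitLatencyUs  atomic.Int64 // latest observed commit latency, in microseconds
}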

Building this in Go feels natural. Goroutines are perfect for managing the different concurrent tasks: one for the state machine, one for the leader's heartbeat loop, one for taking snapshots. Channels help manage communication between these tasks and the RPC handlers. The standard library's sync primitives like RWMutex protect the node's internal state.

func (rn *RaftNode) runStateMachine() {
    for {
        // Wait for committed entries, then apply them
        rn.applyCommitted()
        time.Sleep(10 * time.Millisecond)
    }
}

The end result is a system you can trust. It guarantees that once a command is reported as committed, it will survive any number of failures short of the complete destruction of a majority of machines. It provides a solid foundation for building databases, coordination services, or any system where consistency is non-negotiable.

Writing this code taught me that the beauty of Raft isn't in clever tricks, but in its methodical, decomposable design. Each part—election, replication, snapshotting—has a clear job. By implementing each rule faithfully, you get a remarkably robust piece of infrastructure. It turns the chaotic problem of distributed agreement into a series of manageable, logical steps.

