ShardCache

I built this distributed in-memory cache in Go to mess around with consistent hashing and synchronous replication. It handles automatic sharding, failover, and has pluggable eviction policies (LRU/LFU) along with a token-bucket rate limiter.

How it Works

Consistent Hashing: The cluster uses a hash ring with 150 virtual nodes per server. Clients calculate the hash and route requests directly to the right node, which minimizes data shuffling when a server crashes or joins.
Sync Replication (RF=2): Every key gets 1 primary and 1 replica. Writes block until the replica successfully acknowledges the data. It costs write latency, but guarantees strong durability.
Auto-Failover: Nodes ping each other every second. Miss 3 pings? You're kicked off the local hash ring. When a primary dies, the replica naturally takes over because it sits next to it on the ring.
Pluggable Eviction: Built a clean interface so you can easily swap between LRU (Least Recently Used) and LFU (Least Frequently Used).

Spinning it Up

Just use Docker Compose:

docker-compose up --build

This boots up a 5-node cluster exposed on localhost ports 8081-8085.

Breaking Things (Failover Test)

I wrote an automated chaos test to prove the failover actually works. It spins up 5 local nodes, sets a key, completely kills the primary node responsible for that key, and verifies that the replica promotes itself and serves the data.

cd test
go test -v .

Benchmarks

To see how it performs under pressure, I wrote a benchmark tool that mimics real-world "hot keys" using a skewed Zipfian distribution.

# Start your servers in the background with rate-limiting disabled, then:
./bench_bin -peers localhost:8081,localhost:8082,localhost:8083,localhost:8084,localhost:8085 -ops 50000 -keys 5000

Results (50,000 ops, 50 parallel workers):

Policy	Throughput	Hit Ratio
LRU	~85,000 ops/sec	~83%
LFU	~84,500 ops/sec	~84%

Note: LFU edges out LRU on hit ratio here because, under heavy Zipfian skew, it strictly protects the absolute hottest keys, whereas LRU can get flushed by a random burst of cold reads. Throughput is basically identical because I implemented LFU in O(1) time using a map of doubly-linked lists.

The Edge Case: What if the primary dies mid-write?

Because I went with Synchronous Replication, a write looks like this:

Primary gets write.
Primary forwards it to the Replica.
Replica ACKs.
Primary applies it locally.
Primary ACKs the client.

The Race Conditions:

If the primary dies before step 2: The replica never gets it. The client gets a connection error. The write fails cleanly.
If the primary dies after step 3, but before step 5: The replica has the data, but the client connection drops. The client gets an error and thinks the write failed, even though it survived. (A false negative).

TL;DR: If the client receives a "Success" response, that data is definitively on two nodes. We trade a bit of write speed and accept potential false negatives to guarantee we never lose an acknowledged write.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
bench		bench
cache		cache
client		client
cmd/server		cmd/server
failover		failover
node		node
pb		pb
ratelimiter		ratelimiter
ring		ring
test		test
Dockerfile		Dockerfile
README.md		README.md
docker-compose.yml		docker-compose.yml
go.mod		go.mod
go.sum		go.sum

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ShardCache

How it Works

Spinning it Up

Breaking Things (Failover Test)

Benchmarks

The Edge Case: What if the primary dies mid-write?

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ShardCache

How it Works

Spinning it Up

Breaking Things (Failover Test)

Benchmarks

The Edge Case: What if the primary dies mid-write?

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages