Skip to content
0xbenzo
Back to writing
9 min read

Caching: A Complete Guide to Cache Design, Policies, and Eviction Algorithms

A comprehensive guide to understanding caching, why it exists, how it works, cache eviction algorithms like LRU, LFU, FIFO, ARC, TinyLFU, and how modern systems use them.

Introduction

Caching is one of the most important optimization techniques in computer science. Whether you’re loading a website, querying a database, streaming a video, or using an operating system, caching is almost certainly improving your experience.

A well-designed cache can improve application performance by 10x to 1000x, reduce infrastructure costs, and lower latency significantly.

Without caching:

  • Every request may hit the database.
  • Expensive computations run repeatedly.
  • Network traffic increases.
  • Applications become slower as users grow.

Caching solves these problems by storing frequently accessed data in a faster storage layer.


What is Caching?

A cache is a temporary high-speed storage layer that stores copies of data so future requests can be served faster.

Instead of computing or retrieving data every time, the application first checks whether it already exists in the cache.

Client
   |
   v
Application
   |
   +------ Cache (Fast)
   |         |
   |         | Hit
   |         v
   |      Return Data
   |
   | Miss
   v
Database

If the data exists:

Cache Hit

If not:

Cache Miss

The application retrieves data from the original source, stores it in the cache, and returns it.


Why Do We Need Caching?

Suppose fetching a user profile takes:

  • Database lookup: 80 ms
  • Network latency: 20 ms

Total:

100 ms

With caching:

Cache lookup: 2 ms

Performance improvement:

100 ms → 2 ms

50× faster.

Now imagine:

1 million requests/day

Without caching:

1,000,000 DB queries

With a 90% cache hit rate:

100,000 DB queries

That’s a massive reduction in database load.


How a Cache Works

Cache Hit

User
 |
Request
 |
 v
Cache
 |
Found
 |
Return immediately

Very fast.


Cache Miss

User
 |
Request
 |
Cache
 |
Not Found
 |
Database
 |
Return data
 |
Store in Cache
 |
Return to User

Future requests become hits.


Types of Caches

CPU Cache

Fast memory inside processors.

Levels:

  • L1
  • L2
  • L3

Measured in nanoseconds.


Browser Cache

Stores:

  • Images
  • CSS
  • JavaScript
  • Fonts

Reduces page loading times.


CDN Cache

Stores static files near users.

Example:

India User



Cloudflare Edge



Origin Server

Application Cache

Examples:

  • Redis
  • Memcached
  • In-memory HashMap

Stores:

  • Sessions
  • User profiles
  • API responses

Database Cache

Databases also cache:

  • Query plans
  • Index pages
  • Frequently accessed rows

Examples:

  • PostgreSQL Buffer Cache
  • MySQL Buffer Pool

Cache Terminology

Cache Hit

Requested item exists.

Latency:

2 ms

Cache Miss

Data absent.

Need backend lookup.

Latency:

100 ms

Hit Rate

Formula:

Hit Rate =
Hits / Total Requests

Example:

900 hits

100 misses

Total = 1000

Hit Rate = 90%

Miss Rate

Miss Rate = 1 − Hit Rate

Cache Capacity

Maximum items stored.

Example:

1000 users

After capacity is full:

Old items must be removed.

This is where eviction algorithms become important.


Cache Eviction Policies

When the cache becomes full, one item must be removed.

Question:

Which one?

Different algorithms answer this differently.


LRU (Least Recently Used)

Idea

Remove the item that has not been used for the longest time.

Recent accesses indicate future probability.

This works because many workloads exhibit temporal locality.


Example

Capacity = 3

Access:

A
B
C

Cache:

A B C

Now access:

A

Order:

B C A

Now insert:

D

Remove:

B

Result:

C A D

Visualization

Least Recent      Most Recent

B → C → A

Insert D

Remove B

C → A → D

Time Complexity

Using:

  • HashMap
  • Doubly Linked List

Operations:

Get

O(1)

Put

O(1)

Why HashMap + Linked List?

HashMap:

Fast lookup

Linked List:

Maintain usage order

Together:

O(1)

Advantages

  • Simple
  • Predictable
  • Excellent for web workloads
  • Widely used

Disadvantages

Sequential scans can evict useful items.

Example:

Read

1
2
3
4
5
...
100000

Everything gets replaced.


LFU (Least Frequently Used)

Idea

Remove the least frequently accessed item.

Instead of recency:

Track frequency.


Example

A = accessed 100 times

B = accessed 2 times

C = accessed 1 time

Remove:

C

Advantages

Excellent when popularity remains stable.


Disadvantages

Old frequently-used items may never leave.

Needs aging mechanisms.

More complicated than LRU.


FIFO (First In First Out)

Remove the oldest inserted item.

Order:

Insert

A
B
C

Insert D

Remove:

A

No regard for usage.


Advantages

  • Extremely simple

Disadvantages

  • Poor hit rates

MRU (Most Recently Used)

Remove the newest item.

Opposite of LRU.

Useful when recently accessed data is unlikely to be reused.

Rare.


Random Replacement

Randomly evict one item.

Advantages:

  • Very cheap
  • No metadata

Disadvantages:

  • Unpredictable

Surprisingly competitive in some workloads.


Clock Algorithm

Approximation of LRU.

Each page has:

Reference bit

A rotating pointer scans pages.

If reference bit:

1



Reset



Skip

If:

0



Evict

Commonly used in operating systems.


ARC (Adaptive Replacement Cache)

ARC maintains:

  • Recent items
  • Frequent items

It dynamically adjusts between them.

Unlike LRU, ARC adapts automatically.

Benefits:

  • Handles scans
  • Handles frequently accessed objects
  • Better hit rates

Disadvantages:

  • More complex
  • Patent concerns historically limited adoption

TinyLFU

Modern high-performance caches often use TinyLFU.

Idea:

Instead of only deciding what to evict, TinyLFU also decides what should enter the cache.

It estimates popularity using probabilistic data structures.

Benefits:

  • Very high hit rate
  • Low memory usage
  • Excellent for large systems

Used by:

  • Caffeine (Java)
  • Modern caching libraries

W-TinyLFU

Window TinyLFU combines:

  • Small LRU window
  • TinyLFU admission
  • Segmented cache

This provides excellent performance across many workloads.

It is considered one of the best practical cache algorithms today.


Comparing Algorithms

Algorithm Tracks Complexity Strength Weakness
LRU Recency O(1) Simple and fast Poor for scans
LFU Frequency O(1)-O(log n) Stable popularity Needs aging
FIFO Arrival O(1) Very simple Ignores usage
MRU Recent usage O(1) Good for scans Rare workloads
Random Nothing O(1) Minimal overhead Lower hit rates
Clock Approximate recency O(1) OS-friendly Approximation
ARC Recent + Frequent O(1) Adaptive Complex
TinyLFU Frequency estimation O(1) Excellent hit rates More sophisticated

Cache Write Policies

Caching isn’t only about reads. Writes also matter.

Write Through

Write Cache



Write Database

Pros:

  • Always consistent

Cons:

  • Slower writes

Write Back

Write Cache



Return Immediately



Database updated later

Pros:

  • Fast

Cons:

  • Risk of data loss

Write Around

Writes bypass cache.

Application



Database



Cache unchanged

Useful for infrequently accessed data.


Cache Invalidation

One of the hardest problems in software engineering.

Suppose:

Cache:

User Age = 25

Database updated:

Age = 26

Cache still says:

25

Incorrect.

Solutions:

  • TTL
  • Explicit invalidation
  • Versioning
  • Event-driven updates

Time-To-Live (TTL)

Each cached object expires after a fixed duration.

Example:

TTL = 5 minutes

After expiration:

Cache Miss

Fresh data is loaded.


Distributed Caching

Large applications cannot rely on one cache.

Instead:

          Load Balancer

        /      |      \

Server1 Server2 Server3

      |

Redis Cluster

Benefits:

  • Shared cache
  • Consistent state
  • Horizontal scaling

Real World Examples

Redis

Uses configurable eviction policies:

  • allkeys-lru
  • volatile-lru
  • allkeys-lfu
  • volatile-lfu
  • random
  • ttl

Memcached

Historically used LRU.

Optimized for simplicity.


Caffeine

Java’s modern cache library.

Uses:

W-TinyLFU

One of the highest-performing cache implementations available.


CPU Cache

Uses hardware replacement policies inspired by:

  • LRU
  • Pseudo-LRU
  • Random

Exact implementation depends on CPU architecture.


Implementing an LRU Cache

The standard implementation combines:

HashMap

+

Doubly Linked List

HashMap:

Key → Node

Linked List:

Most Recent



Head

...

Tail



Least Recent

Operations:

Get

  1. Lookup in HashMap.
  2. Move node to front.
  3. Return value.

Put

If key exists:

  • Update value.
  • Move to front.

Otherwise:

  • Insert new node.
  • If capacity exceeded:
    • Remove tail.
    • Delete from HashMap.

Every operation remains O(1).


Choosing the Right Algorithm

Scenario Best Choice
General web applications LRU
Stable access patterns LFU
Operating systems Clock
High-performance Java services W-TinyLFU
Database buffer pools LRU / Clock
Embedded systems FIFO
Research systems ARC

Best Practices

  • Choose an appropriate cache size based on workload.
  • Measure cache hit rate and miss rate continuously.
  • Set sensible TTL values to balance freshness and performance.
  • Avoid caching highly volatile data unless invalidation is reliable.
  • Use serialization formats that are compact and fast.
  • Prevent cache stampedes using techniques like request coalescing or locking.
  • Warm the cache for frequently accessed data after deployments.
  • Monitor eviction counts, latency, and memory usage.
  • Consider distributed caches for horizontally scaled applications.
  • Select an eviction policy based on observed access patterns rather than assumptions.

Conclusion

Caching is one of the most impactful optimization techniques in modern software systems. By keeping frequently accessed data in a fast storage layer, applications reduce latency, lower database load, and improve scalability.

No single eviction algorithm is ideal for every workload:

  • LRU is simple, efficient, and works well for most applications due to temporal locality.
  • LFU excels when frequently accessed data remains popular over long periods.
  • FIFO and Random Replacement are lightweight but generally provide lower hit rates.
  • Clock offers an efficient approximation of LRU and is widely used in operating systems.
  • ARC adapts between recency and frequency, making it effective across varied workloads.
  • TinyLFU and W-TinyLFU represent the state of the art for many modern caching libraries, combining admission control with efficient eviction to maximize hit rates.

Ultimately, effective caching is about understanding your workload, selecting an appropriate eviction strategy, monitoring performance metrics such as hit rate and latency, and continuously tuning the system. A well-designed cache can dramatically improve user experience while reducing infrastructure costs and increasing application scalability.