Caching: A Complete Guide to Cache Design, Policies, and Eviction Algorithms
A comprehensive guide to understanding caching, why it exists, how it works, cache eviction algorithms like LRU, LFU, FIFO, ARC, TinyLFU, and how modern systems use them.
Introduction
Caching is one of the most important optimization techniques in computer science. Whether you’re loading a website, querying a database, streaming a video, or using an operating system, caching is almost certainly improving your experience.
A well-designed cache can improve application performance by 10x to 1000x, reduce infrastructure costs, and lower latency significantly.
Without caching:
- Every request may hit the database.
- Expensive computations run repeatedly.
- Network traffic increases.
- Applications become slower as users grow.
Caching solves these problems by storing frequently accessed data in a faster storage layer.
What is Caching?
A cache is a temporary high-speed storage layer that stores copies of data so future requests can be served faster.
Instead of computing or retrieving data every time, the application first checks whether it already exists in the cache.
Client
|
v
Application
|
+------ Cache (Fast)
| |
| | Hit
| v
| Return Data
|
| Miss
v
Database
If the data exists:
Cache Hit
If not:
Cache Miss
The application retrieves data from the original source, stores it in the cache, and returns it.
Why Do We Need Caching?
Suppose fetching a user profile takes:
- Database lookup: 80 ms
- Network latency: 20 ms
Total:
100 ms
With caching:
Cache lookup: 2 ms
Performance improvement:
100 ms → 2 ms
50× faster.
Now imagine:
1 million requests/day
Without caching:
1,000,000 DB queries
With a 90% cache hit rate:
100,000 DB queries
That’s a massive reduction in database load.
How a Cache Works
Cache Hit
User
|
Request
|
v
Cache
|
Found
|
Return immediately
Very fast.
Cache Miss
User
|
Request
|
Cache
|
Not Found
|
Database
|
Return data
|
Store in Cache
|
Return to User
Future requests become hits.
Types of Caches
CPU Cache
Fast memory inside processors.
Levels:
- L1
- L2
- L3
Measured in nanoseconds.
Browser Cache
Stores:
- Images
- CSS
- JavaScript
- Fonts
Reduces page loading times.
CDN Cache
Stores static files near users.
Example:
India User
↓
Cloudflare Edge
↓
Origin Server
Application Cache
Examples:
- Redis
- Memcached
- In-memory HashMap
Stores:
- Sessions
- User profiles
- API responses
Database Cache
Databases also cache:
- Query plans
- Index pages
- Frequently accessed rows
Examples:
- PostgreSQL Buffer Cache
- MySQL Buffer Pool
Cache Terminology
Cache Hit
Requested item exists.
Latency:
2 ms
Cache Miss
Data absent.
Need backend lookup.
Latency:
100 ms
Hit Rate
Formula:
Hit Rate =
Hits / Total Requests
Example:
900 hits
100 misses
Total = 1000
Hit Rate = 90%
Miss Rate
Miss Rate = 1 − Hit Rate
Cache Capacity
Maximum items stored.
Example:
1000 users
After capacity is full:
Old items must be removed.
This is where eviction algorithms become important.
Cache Eviction Policies
When the cache becomes full, one item must be removed.
Question:
Which one?
Different algorithms answer this differently.
LRU (Least Recently Used)
Idea
Remove the item that has not been used for the longest time.
Recent accesses indicate future probability.
This works because many workloads exhibit temporal locality.
Example
Capacity = 3
Access:
A
B
C
Cache:
A B C
Now access:
A
Order:
B C A
Now insert:
D
Remove:
B
Result:
C A D
Visualization
Least Recent Most Recent
B → C → A
Insert D
Remove B
C → A → D
Time Complexity
Using:
- HashMap
- Doubly Linked List
Operations:
Get
O(1)
Put
O(1)
Why HashMap + Linked List?
HashMap:
Fast lookup
Linked List:
Maintain usage order
Together:
O(1)
Advantages
- Simple
- Predictable
- Excellent for web workloads
- Widely used
Disadvantages
Sequential scans can evict useful items.
Example:
Read
1
2
3
4
5
...
100000
Everything gets replaced.
LFU (Least Frequently Used)
Idea
Remove the least frequently accessed item.
Instead of recency:
Track frequency.
Example
A = accessed 100 times
B = accessed 2 times
C = accessed 1 time
Remove:
C
Advantages
Excellent when popularity remains stable.
Disadvantages
Old frequently-used items may never leave.
Needs aging mechanisms.
More complicated than LRU.
FIFO (First In First Out)
Remove the oldest inserted item.
Order:
Insert
A
B
C
Insert D
Remove:
A
No regard for usage.
Advantages
- Extremely simple
Disadvantages
- Poor hit rates
MRU (Most Recently Used)
Remove the newest item.
Opposite of LRU.
Useful when recently accessed data is unlikely to be reused.
Rare.
Random Replacement
Randomly evict one item.
Advantages:
- Very cheap
- No metadata
Disadvantages:
- Unpredictable
Surprisingly competitive in some workloads.
Clock Algorithm
Approximation of LRU.
Each page has:
Reference bit
A rotating pointer scans pages.
If reference bit:
1
↓
Reset
↓
Skip
If:
0
↓
Evict
Commonly used in operating systems.
ARC (Adaptive Replacement Cache)
ARC maintains:
- Recent items
- Frequent items
It dynamically adjusts between them.
Unlike LRU, ARC adapts automatically.
Benefits:
- Handles scans
- Handles frequently accessed objects
- Better hit rates
Disadvantages:
- More complex
- Patent concerns historically limited adoption
TinyLFU
Modern high-performance caches often use TinyLFU.
Idea:
Instead of only deciding what to evict, TinyLFU also decides what should enter the cache.
It estimates popularity using probabilistic data structures.
Benefits:
- Very high hit rate
- Low memory usage
- Excellent for large systems
Used by:
- Caffeine (Java)
- Modern caching libraries
W-TinyLFU
Window TinyLFU combines:
- Small LRU window
- TinyLFU admission
- Segmented cache
This provides excellent performance across many workloads.
It is considered one of the best practical cache algorithms today.
Comparing Algorithms
| Algorithm | Tracks | Complexity | Strength | Weakness |
|---|---|---|---|---|
| LRU | Recency | O(1) | Simple and fast | Poor for scans |
| LFU | Frequency | O(1)-O(log n) | Stable popularity | Needs aging |
| FIFO | Arrival | O(1) | Very simple | Ignores usage |
| MRU | Recent usage | O(1) | Good for scans | Rare workloads |
| Random | Nothing | O(1) | Minimal overhead | Lower hit rates |
| Clock | Approximate recency | O(1) | OS-friendly | Approximation |
| ARC | Recent + Frequent | O(1) | Adaptive | Complex |
| TinyLFU | Frequency estimation | O(1) | Excellent hit rates | More sophisticated |
Cache Write Policies
Caching isn’t only about reads. Writes also matter.
Write Through
Write Cache
↓
Write Database
Pros:
- Always consistent
Cons:
- Slower writes
Write Back
Write Cache
↓
Return Immediately
↓
Database updated later
Pros:
- Fast
Cons:
- Risk of data loss
Write Around
Writes bypass cache.
Application
↓
Database
↓
Cache unchanged
Useful for infrequently accessed data.
Cache Invalidation
One of the hardest problems in software engineering.
Suppose:
Cache:
User Age = 25
Database updated:
Age = 26
Cache still says:
25
Incorrect.
Solutions:
- TTL
- Explicit invalidation
- Versioning
- Event-driven updates
Time-To-Live (TTL)
Each cached object expires after a fixed duration.
Example:
TTL = 5 minutes
After expiration:
Cache Miss
Fresh data is loaded.
Distributed Caching
Large applications cannot rely on one cache.
Instead:
Load Balancer
/ | \
Server1 Server2 Server3
|
Redis Cluster
Benefits:
- Shared cache
- Consistent state
- Horizontal scaling
Real World Examples
Redis
Uses configurable eviction policies:
- allkeys-lru
- volatile-lru
- allkeys-lfu
- volatile-lfu
- random
- ttl
Memcached
Historically used LRU.
Optimized for simplicity.
Caffeine
Java’s modern cache library.
Uses:
W-TinyLFU
One of the highest-performing cache implementations available.
CPU Cache
Uses hardware replacement policies inspired by:
- LRU
- Pseudo-LRU
- Random
Exact implementation depends on CPU architecture.
Implementing an LRU Cache
The standard implementation combines:
HashMap
+
Doubly Linked List
HashMap:
Key → Node
Linked List:
Most Recent
↓
Head
...
Tail
↑
Least Recent
Operations:
Get
- Lookup in HashMap.
- Move node to front.
- Return value.
Put
If key exists:
- Update value.
- Move to front.
Otherwise:
- Insert new node.
- If capacity exceeded:
- Remove tail.
- Delete from HashMap.
Every operation remains O(1).
Choosing the Right Algorithm
| Scenario | Best Choice |
|---|---|
| General web applications | LRU |
| Stable access patterns | LFU |
| Operating systems | Clock |
| High-performance Java services | W-TinyLFU |
| Database buffer pools | LRU / Clock |
| Embedded systems | FIFO |
| Research systems | ARC |
Best Practices
- Choose an appropriate cache size based on workload.
- Measure cache hit rate and miss rate continuously.
- Set sensible TTL values to balance freshness and performance.
- Avoid caching highly volatile data unless invalidation is reliable.
- Use serialization formats that are compact and fast.
- Prevent cache stampedes using techniques like request coalescing or locking.
- Warm the cache for frequently accessed data after deployments.
- Monitor eviction counts, latency, and memory usage.
- Consider distributed caches for horizontally scaled applications.
- Select an eviction policy based on observed access patterns rather than assumptions.
Conclusion
Caching is one of the most impactful optimization techniques in modern software systems. By keeping frequently accessed data in a fast storage layer, applications reduce latency, lower database load, and improve scalability.
No single eviction algorithm is ideal for every workload:
- LRU is simple, efficient, and works well for most applications due to temporal locality.
- LFU excels when frequently accessed data remains popular over long periods.
- FIFO and Random Replacement are lightweight but generally provide lower hit rates.
- Clock offers an efficient approximation of LRU and is widely used in operating systems.
- ARC adapts between recency and frequency, making it effective across varied workloads.
- TinyLFU and W-TinyLFU represent the state of the art for many modern caching libraries, combining admission control with efficient eviction to maximize hit rates.
Ultimately, effective caching is about understanding your workload, selecting an appropriate eviction strategy, monitoring performance metrics such as hit rate and latency, and continuously tuning the system. A well-designed cache can dramatically improve user experience while reducing infrastructure costs and increasing application scalability.