Introduction

Caching is one of the most important optimization techniques in computer science. Whether you’re loading a website, querying a database, streaming a video, or using an operating system, caching is almost certainly improving your experience.

A well-designed cache can improve application performance by 10x to 1000x, reduce infrastructure costs, and lower latency significantly.

Without caching:

Every request may hit the database.
Expensive computations run repeatedly.
Network traffic increases.
Applications become slower as users grow.

Caching solves these problems by storing frequently accessed data in a faster storage layer.

What is Caching?

A cache is a temporary high-speed storage layer that stores copies of data so future requests can be served faster.

Instead of computing or retrieving data every time, the application first checks whether it already exists in the cache.

Client
   |
   v
Application
   |
   +------ Cache (Fast)
   |         |
   |         | Hit
   |         v
   |      Return Data
   |
   | Miss
   v
Database

If the data exists:

Cache Hit

If not:

Cache Miss

The application retrieves data from the original source, stores it in the cache, and returns it.

Why Do We Need Caching?

Suppose fetching a user profile takes:

Database lookup: 80 ms
Network latency: 20 ms

Total:

100 ms

With caching:

Cache lookup: 2 ms

Performance improvement:

100 ms → 2 ms

50× faster.

Now imagine:

1 million requests/day

Without caching:

1,000,000 DB queries

With a 90% cache hit rate:

100,000 DB queries

That’s a massive reduction in database load.

How a Cache Works

Cache Hit

User
 |
Request
 |
 v
Cache
 |
Found
 |
Return immediately

Very fast.

Cache Miss

User
 |
Request
 |
Cache
 |
Not Found
 |
Database
 |
Return data
 |
Store in Cache
 |
Return to User

Future requests become hits.

Types of Caches

CPU Cache

Fast memory inside processors.

Levels:

Measured in nanoseconds.

Browser Cache

Stores:

Images
CSS
JavaScript
Fonts

Reduces page loading times.

CDN Cache

Stores static files near users.

Example:

India User

↓

Cloudflare Edge

↓

Origin Server

Application Cache

Examples:

Redis
Memcached
In-memory HashMap

Stores:

Sessions
User profiles
API responses

Database Cache

Databases also cache:

Query plans
Index pages
Frequently accessed rows

Examples:

PostgreSQL Buffer Cache
MySQL Buffer Pool

Cache Terminology

Cache Hit

Requested item exists.

Latency:

2 ms

Cache Miss

Data absent.

Need backend lookup.

Latency:

100 ms

Hit Rate

Formula:

Hit Rate =
Hits / Total Requests

Example:

900 hits

100 misses

Total = 1000

Hit Rate = 90%

Miss Rate

Miss Rate = 1 − Hit Rate

Cache Capacity

Maximum items stored.

Example:

1000 users

After capacity is full:

Old items must be removed.

This is where eviction algorithms become important.

Cache Eviction Policies

When the cache becomes full, one item must be removed.

Question:

Which one?

Different algorithms answer this differently.

LRU (Least Recently Used)

Idea

Remove the item that has not been used for the longest time.

Recent accesses indicate future probability.

This works because many workloads exhibit temporal locality.

Example

Capacity = 3

Access:

A
B
C

Cache:

A B C

Now access:

Order:

B C A

Now insert:

Remove:

Result:

C A D

Visualization

Least Recent      Most Recent

B → C → A

Insert D

Remove B

C → A → D

Time Complexity

Using:

HashMap
Doubly Linked List

Operations:

Get

O(1)

Put

O(1)

Why HashMap + Linked List?

HashMap:

Fast lookup

Linked List:

Maintain usage order

Together:

O(1)

Advantages

Simple
Predictable
Excellent for web workloads
Widely used

Disadvantages

Sequential scans can evict useful items.

Example:

Read

1
2
3
4
5
...
100000

Everything gets replaced.

LFU (Least Frequently Used)

Idea

Remove the least frequently accessed item.

Instead of recency:

Track frequency.

Example

A = accessed 100 times

B = accessed 2 times

C = accessed 1 time

Remove:

Advantages

Excellent when popularity remains stable.

Disadvantages

Old frequently-used items may never leave.

Needs aging mechanisms.

More complicated than LRU.

FIFO (First In First Out)

Remove the oldest inserted item.

Order:

Insert

A
B
C

Insert D

Remove:

No regard for usage.

Advantages

Extremely simple

Disadvantages

Poor hit rates

MRU (Most Recently Used)

Remove the newest item.

Opposite of LRU.

Useful when recently accessed data is unlikely to be reused.

Rare.

Random Replacement

Randomly evict one item.

Advantages:

Very cheap
No metadata

Disadvantages:

Unpredictable

Surprisingly competitive in some workloads.

Clock Algorithm

Approximation of LRU.

Each page has:

Reference bit

A rotating pointer scans pages.

If reference bit:

1

↓

Reset

↓

Skip

If:

0

↓

Evict

Commonly used in operating systems.

ARC (Adaptive Replacement Cache)

ARC maintains:

Recent items
Frequent items

It dynamically adjusts between them.

Unlike LRU, ARC adapts automatically.

Benefits:

Handles scans
Handles frequently accessed objects
Better hit rates

Disadvantages:

More complex
Patent concerns historically limited adoption

TinyLFU

Modern high-performance caches often use TinyLFU.

Idea:

Instead of only deciding what to evict, TinyLFU also decides what should enter the cache.

It estimates popularity using probabilistic data structures.

Benefits:

Very high hit rate
Low memory usage
Excellent for large systems

Used by:

Caffeine (Java)
Modern caching libraries

W-TinyLFU

Window TinyLFU combines:

Small LRU window
TinyLFU admission
Segmented cache

This provides excellent performance across many workloads.

It is considered one of the best practical cache algorithms today.

Comparing Algorithms

Algorithm	Tracks	Complexity	Strength	Weakness
LRU	Recency	O(1)	Simple and fast	Poor for scans
LFU	Frequency	O(1)-O(log n)	Stable popularity	Needs aging
FIFO	Arrival	O(1)	Very simple	Ignores usage
MRU	Recent usage	O(1)	Good for scans	Rare workloads
Random	Nothing	O(1)	Minimal overhead	Lower hit rates
Clock	Approximate recency	O(1)	OS-friendly	Approximation
ARC	Recent + Frequent	O(1)	Adaptive	Complex
TinyLFU	Frequency estimation	O(1)	Excellent hit rates	More sophisticated

Cache Write Policies

Caching isn’t only about reads. Writes also matter.

Write Through

Write Cache

↓

Write Database

Pros:

Always consistent

Cons:

Slower writes

Write Back

Write Cache

↓

Return Immediately

↓

Database updated later

Pros:

Fast

Cons:

Risk of data loss

Write Around

Writes bypass cache.

Application

↓

Database

↓

Cache unchanged

Useful for infrequently accessed data.

Cache Invalidation

One of the hardest problems in software engineering.

Suppose:

Cache:

User Age = 25

Database updated:

Age = 26

Cache still says:

Incorrect.

Solutions:

TTL
Explicit invalidation
Versioning
Event-driven updates

Time-To-Live (TTL)

Each cached object expires after a fixed duration.

Example:

TTL = 5 minutes

After expiration:

Cache Miss

Fresh data is loaded.

Distributed Caching

Large applications cannot rely on one cache.

Instead:

          Load Balancer

        /      |      \

Server1 Server2 Server3

      |

Redis Cluster

Benefits:

Shared cache
Consistent state
Horizontal scaling

Real World Examples

Redis

Uses configurable eviction policies:

allkeys-lru
volatile-lru
allkeys-lfu
volatile-lfu
random
ttl

Memcached

Historically used LRU.

Optimized for simplicity.

Caffeine

Java’s modern cache library.

Uses:

W-TinyLFU

One of the highest-performing cache implementations available.

CPU Cache

Uses hardware replacement policies inspired by:

LRU
Pseudo-LRU
Random

Exact implementation depends on CPU architecture.

Implementing an LRU Cache

The standard implementation combines:

HashMap

+

Doubly Linked List

HashMap:

Key → Node

Linked List:

Most Recent

↓

Head

...

Tail

↑

Least Recent

Operations:

Get

Lookup in HashMap.
Move node to front.
Return value.

Put

If key exists:

Update value.
Move to front.

Otherwise:

Insert new node.
If capacity exceeded:
- Remove tail.
- Delete from HashMap.

Every operation remains O(1).

Choosing the Right Algorithm

Scenario	Best Choice
General web applications	LRU
Stable access patterns	LFU
Operating systems	Clock
High-performance Java services	W-TinyLFU
Database buffer pools	LRU / Clock
Embedded systems	FIFO
Research systems	ARC

Best Practices

Choose an appropriate cache size based on workload.
Measure cache hit rate and miss rate continuously.
Set sensible TTL values to balance freshness and performance.
Avoid caching highly volatile data unless invalidation is reliable.
Use serialization formats that are compact and fast.
Prevent cache stampedes using techniques like request coalescing or locking.
Warm the cache for frequently accessed data after deployments.
Monitor eviction counts, latency, and memory usage.
Consider distributed caches for horizontally scaled applications.
Select an eviction policy based on observed access patterns rather than assumptions.

Conclusion

Caching is one of the most impactful optimization techniques in modern software systems. By keeping frequently accessed data in a fast storage layer, applications reduce latency, lower database load, and improve scalability.

No single eviction algorithm is ideal for every workload:

LRU is simple, efficient, and works well for most applications due to temporal locality.
LFU excels when frequently accessed data remains popular over long periods.
FIFO and Random Replacement are lightweight but generally provide lower hit rates.
Clock offers an efficient approximation of LRU and is widely used in operating systems.
ARC adapts between recency and frequency, making it effective across varied workloads.
TinyLFU and W-TinyLFU represent the state of the art for many modern caching libraries, combining admission control with efficient eviction to maximize hit rates.

Ultimately, effective caching is about understanding your workload, selecting an appropriate eviction strategy, monitoring performance metrics such as hit rate and latency, and continuously tuning the system. A well-designed cache can dramatically improve user experience while reducing infrastructure costs and increasing application scalability.