
Conversation

@bryancall (Contributor) commented:

This commit implements reader/writer locks for cache directory operations to significantly reduce lock contention under high concurrency.

Changes:

  • Added ts::shared_mutex dir_mutex to StripeSM for directory operations
  • Created CacheDirSharedLock and CacheDirExclusiveLock RAII wrappers
  • Converted critical Cache.cc read paths to use shared locks for directory.probe()
  • Multiple readers can now access directory concurrently
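
As a sketch of what the RAII wrappers named above might look like, assuming `std::shared_mutex` semantics for `ts::shared_mutex` (the actual classes in P_CacheInternal.h may differ, e.g. by exposing try-lock variants):

```cpp
#include <shared_mutex>

// Hypothetical sketch of the RAII wrappers, using std::shared_mutex where
// the patch uses ts::shared_mutex. Acquires in the constructor, releases
// in the destructor, so the lock scope follows block scope.
class CacheDirSharedLock
{
public:
  explicit CacheDirSharedLock(std::shared_mutex &m) : _m(m) { _m.lock_shared(); }
  ~CacheDirSharedLock() { _m.unlock_shared(); }
  CacheDirSharedLock(const CacheDirSharedLock &) = delete;

private:
  std::shared_mutex &_m;
};

class CacheDirExclusiveLock
{
public:
  explicit CacheDirExclusiveLock(std::shared_mutex &m) : _m(m) { _m.lock(); }
  ~CacheDirExclusiveLock() { _m.unlock(); }
  CacheDirExclusiveLock(const CacheDirExclusiveLock &) = delete;

private:
  std::shared_mutex &_m;
};
```

Any number of `CacheDirSharedLock` holders can coexist, which is what lets multiple readers run `directory.probe()` concurrently; a `CacheDirExclusiveLock` excludes everyone.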

Performance Impact:

  • Throughput: 17,520 req/s -> 44,218 req/s (+152%, 2.5x improvement)
  • Mean latency: 55.94ms -> 22.23ms (-60%, 2.5x faster)
  • Cache lock overhead: 42.81ms -> 9.10ms (-79%)

Test configuration: 1M requests, 1K concurrent clients, non-cacheable origin

This is a partial implementation covering the Cache.cc read paths. Further optimization is possible by converting CacheRead.cc and CacheWrite.cc.

Files modified:

  • src/iocore/cache/StripeSM.h: Added dir_mutex member
  • src/iocore/cache/P_CacheInternal.h: Added lock wrapper classes
  • src/iocore/cache/Cache.cc: Converted 3 critical paths to shared locks

Documentation:

  • CACHE_RWLOCK_ANALYSIS.md: Design analysis and implementation strategy
  • CACHE_RWLOCK_BENCHMARK_RESULTS.md: Detailed benchmark results and analysis

@bryancall self-assigned this Oct 22, 2025
@bryancall force-pushed the cache-rwlock-optimization branch from 8b652df to aa99c83 on October 22, 2025 at 23:28
@ezelkow1 added the Cache label Oct 22, 2025

#### Read-Only Operations (can use shared locks):
```
Cache.cc:345 - stripe->directory.probe() [cache lookup]
```

@masaori335 (Contributor) commented on Oct 22, 2025:

I tried this approach before and ran into a problem. If I understand correctly, Directory::probe needs the write lock when it finds an invalid Dir and calls dir_delete_entry. This is why I'm waiting for RCU/Hazard Pointers.

```cpp
} else { // delete the invalid entry
  ts::Metrics::Gauge::decrement(cache_rsb.direntries_used);
  ts::Metrics::Gauge::decrement(stripe->cache_vol->vol_rsb.direntries_used);
  e = dir_delete_entry(e, p, s, this);
  continue;
}
```
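
The issue can be illustrated with a simplified sketch (hypothetical `Dir`/`Directory` types, not the actual ATS structures): under a shared lock a probe can only skip an invalid entry, because removing it, as `dir_delete_entry` does, mutates the directory and therefore needs writer access.

```cpp
#include <shared_mutex>
#include <vector>

// Simplified illustration of why Directory::probe is not purely read-only:
// it may hit an invalid entry that the real code removes, a write operation.
struct Dir {
  bool valid;
  int  token;
};

struct Directory {
  std::vector<Dir>  entries;
  std::shared_mutex mtx;

  // Under a shared lock we may only *skip* an invalid entry; deleting it
  // (as the real probe does) would require the exclusive lock.
  bool probe_shared(int token) {
    std::shared_lock lock(mtx);
    for (const Dir &e : entries) {
      if (!e.valid) {
        continue; // cannot delete here: other readers may hold mtx too
      }
      if (e.token == token) {
        return true;
      }
    }
    return false;
  }
};
```

The invalid entry survives the probe, so either the cleanup must be deferred to a writer path or the lock must be escalated.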

```diff
-if (!lock.is_locked() || (od = stripe->open_read(key)) || stripe->directory.probe(key, stripe, &result, &last_collision)) {
+CACHE_DIR_TRY_LOCK_SHARED(dir_lock, stripe->dir_mutex);
+if (!lock.is_locked() || !dir_lock.is_locked() || (od = stripe->open_read(key)) ||
+    stripe->directory.probe(key, stripe, &result, &last_collision)) {
```
@masaori335 (Contributor) commented on Oct 22, 2025:

Having both mutex and dir_mutex might be a good approach. When we call Directory::probe with the read lock on dir_mutex but it turns out to need a write operation (a rare case, I assume), we can wait for the mutex lock. Obviously, we have to be careful about deadlock.

- High-performance BRAVO algorithm implementation
- Optimized fast-path for readers (lock-free in common case)
- Prevents writer starvation with adaptive policy
- More complex but potentially much faster
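
For reference, a heavily simplified BRAVO-style lock (after Dice & Kogan's BRAVO algorithm; this is an illustrative sketch, not the implementation discussed here): readers publish themselves in a visible-readers table while a bias flag is set, making the read fast path lock-free; writers clear the bias, wait for published readers to drain, then take the underlying lock.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <functional>
#include <shared_mutex>
#include <thread>

class BravoLock {
  static constexpr std::size_t SLOTS = 8; // real BRAVO sizes this per-CPU
  std::atomic<bool> rbias{true};
  std::array<std::atomic<bool>, SLOTS> table{};
  std::shared_mutex underlying;

  static std::size_t slot() {
    return std::hash<std::thread::id>{}(std::this_thread::get_id()) % SLOTS;
  }

public:
  // Returns the slot index used, or SLOTS if we fell back to the slow path.
  std::size_t lock_shared() {
    if (rbias.load()) {
      std::size_t s = slot();
      bool expected = false;
      if (table[s].compare_exchange_strong(expected, true)) {
        if (rbias.load()) {
          return s; // fast path: no shared_mutex traffic at all
        }
        table[s].store(false); // a writer intervened; undo and fall back
      }
    }
    underlying.lock_shared(); // slow path
    return SLOTS;
  }

  void unlock_shared(std::size_t s) {
    if (s < SLOTS) {
      table[s].store(false);
    } else {
      underlying.unlock_shared();
    }
  }

  void lock() {
    rbias.store(false); // stop new fast-path readers
    for (auto &e : table) {
      while (e.load()) {
        std::this_thread::yield(); // drain already-published readers
      }
    }
    underlying.lock();
  }

  void unlock() {
    underlying.unlock();
    rbias.store(true); // the real algorithm re-enables bias adaptively
  }
};
```

The trade-off matches the bullet list above: readers avoid shared cache-line writes on the mutex entirely, at the cost of writers scanning the table and a more delicate correctness argument.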
A contributor commented:

This lock contention problem on the Stripe mutex was the main motivation for introducing BRAVO.
