
Conversation

@bryancall (Contributor) commented:

This commit implements reader/writer locks for cache directory operations to significantly reduce lock contention under high concurrency.

Changes:

  • Added ts::shared_mutex dir_mutex to StripeSM for directory operations
  • Created CacheDirSharedLock and CacheDirExclusiveLock RAII wrappers
  • Converted critical Cache.cc read paths to use shared locks for directory.probe()
  • Multiple readers can now access directory concurrently
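
As a sketch of what the RAII wrappers named above might look like, assuming `std::shared_mutex` semantics for `ts::shared_mutex` (the actual classes in P_CacheInternal.h may differ, e.g. by exposing try-lock variants):

```cpp
#include <shared_mutex>

// Hypothetical sketch of the RAII wrappers, using std::shared_mutex where
// the patch uses ts::shared_mutex. Acquires in the constructor, releases
// in the destructor, so the lock scope follows block scope.
class CacheDirSharedLock
{
public:
  explicit CacheDirSharedLock(std::shared_mutex &m) : _m(m) { _m.lock_shared(); }
  ~CacheDirSharedLock() { _m.unlock_shared(); }
  CacheDirSharedLock(const CacheDirSharedLock &) = delete;

private:
  std::shared_mutex &_m;
};

class CacheDirExclusiveLock
{
public:
  explicit CacheDirExclusiveLock(std::shared_mutex &m) : _m(m) { _m.lock(); }
  ~CacheDirExclusiveLock() { _m.unlock(); }
  CacheDirExclusiveLock(const CacheDirExclusiveLock &) = delete;

private:
  std::shared_mutex &_m;
};
```

Any number of `CacheDirSharedLock` holders can coexist, which is what lets multiple readers run `directory.probe()` concurrently; a `CacheDirExclusiveLock` excludes everyone.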

Performance Impact:

  • Throughput: 17,520 req/s -> 44,218 req/s (+152%, 2.5x improvement)
  • Mean latency: 55.94ms -> 22.23ms (-60%, 2.5x faster)
  • Cache lock overhead: 42.81ms -> 9.10ms (-79%)

Test configuration: 1M requests, 1K concurrent clients, non-cacheable origin

This is a partial implementation covering the Cache.cc read paths. Further optimization is possible by converting CacheRead.cc and CacheWrite.cc.

Files modified:

  • src/iocore/cache/StripeSM.h: Added dir_mutex member
  • src/iocore/cache/P_CacheInternal.h: Added lock wrapper classes
  • src/iocore/cache/Cache.cc: Converted 3 critical paths to shared locks

Documentation:

  • CACHE_RWLOCK_ANALYSIS.md: Design analysis and implementation strategy
  • CACHE_RWLOCK_BENCHMARK_RESULTS.md: Detailed benchmark results and analysis

@bryancall self-assigned this Oct 22, 2025
@bryancall force-pushed the cache-rwlock-optimization branch from 8b652df to aa99c83 on October 22, 2025 at 23:28
@ezelkow1 added the Cache label Oct 22, 2025

#### Read-Only Operations (can use shared locks):
```
Cache.cc:345 - stripe->directory.probe() [cache lookup]
```

@masaori335 (Contributor) commented on Oct 22, 2025:

I tried this approach before and ran into a problem. If I understand correctly, Directory::probe needs the write lock when it finds an invalid Dir and calls dir_delete_entry. This is why I'm waiting for RCU/Hazard Pointers.

```cpp
} else { // delete the invalid entry
  ts::Metrics::Gauge::decrement(cache_rsb.direntries_used);
  ts::Metrics::Gauge::decrement(stripe->cache_vol->vol_rsb.direntries_used);
  e = dir_delete_entry(e, p, s, this);
  continue;
}
```
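
The issue can be illustrated with a simplified sketch (hypothetical `Dir`/`Directory` types, not the actual ATS structures): under a shared lock a probe can only skip an invalid entry, because removing it, as `dir_delete_entry` does, mutates the directory and therefore needs writer access.

```cpp
#include <shared_mutex>
#include <vector>

// Simplified illustration of why Directory::probe is not purely read-only:
// it may hit an invalid entry that the real code removes, a write operation.
struct Dir {
  bool valid;
  int  token;
};

struct Directory {
  std::vector<Dir>  entries;
  std::shared_mutex mtx;

  // Under a shared lock we may only *skip* an invalid entry; deleting it
  // (as the real probe does) would require the exclusive lock.
  bool probe_shared(int token) {
    std::shared_lock lock(mtx);
    for (const Dir &e : entries) {
      if (!e.valid) {
        continue; // cannot delete here: other readers may hold mtx too
      }
      if (e.token == token) {
        return true;
      }
    }
    return false;
  }
};
```

The invalid entry survives the probe, so either the cleanup must be deferred to a writer path or the lock must be escalated.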

```diff
-if (!lock.is_locked() || (od = stripe->open_read(key)) || stripe->directory.probe(key, stripe, &result, &last_collision)) {
+CACHE_DIR_TRY_LOCK_SHARED(dir_lock, stripe->dir_mutex);
+if (!lock.is_locked() || !dir_lock.is_locked() || (od = stripe->open_read(key)) ||
+    stripe->directory.probe(key, stripe, &result, &last_collision)) {
```
@masaori335 (Contributor) commented on Oct 22, 2025:

Having both mutex and dir_mutex might be a good approach. When we call Directory::probe with the read lock on dir_mutex but it turns out to need a write operation (a rare case, I assume), we can wait for the mutex lock. Obviously, we have to be careful about deadlock.

- High-performance BRAVO algorithm implementation
- Optimized fast-path for readers (lock-free in common case)
- Prevents writer starvation with adaptive policy
- More complex but potentially much faster
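
For reference, a heavily simplified BRAVO-style lock (after Dice & Kogan's BRAVO algorithm; this is an illustrative sketch, not the implementation discussed here): readers publish themselves in a visible-readers table while a bias flag is set, making the read fast path lock-free; writers clear the bias, wait for published readers to drain, then take the underlying lock.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <functional>
#include <shared_mutex>
#include <thread>

class BravoLock {
  static constexpr std::size_t SLOTS = 8; // real BRAVO sizes this per-CPU
  std::atomic<bool> rbias{true};
  std::array<std::atomic<bool>, SLOTS> table{};
  std::shared_mutex underlying;

  static std::size_t slot() {
    return std::hash<std::thread::id>{}(std::this_thread::get_id()) % SLOTS;
  }

public:
  // Returns the slot index used, or SLOTS if we fell back to the slow path.
  std::size_t lock_shared() {
    if (rbias.load()) {
      std::size_t s = slot();
      bool expected = false;
      if (table[s].compare_exchange_strong(expected, true)) {
        if (rbias.load()) {
          return s; // fast path: no shared_mutex traffic at all
        }
        table[s].store(false); // a writer intervened; undo and fall back
      }
    }
    underlying.lock_shared(); // slow path
    return SLOTS;
  }

  void unlock_shared(std::size_t s) {
    if (s < SLOTS) {
      table[s].store(false);
    } else {
      underlying.unlock_shared();
    }
  }

  void lock() {
    rbias.store(false); // stop new fast-path readers
    for (auto &e : table) {
      while (e.load()) {
        std::this_thread::yield(); // drain already-published readers
      }
    }
    underlying.lock();
  }

  void unlock() {
    underlying.unlock();
    rbias.store(true); // the real algorithm re-enables bias adaptively
  }
};
```

The trade-off matches the bullet list above: readers avoid shared cache-line writes on the mutex entirely, at the cost of writers scanning the table and a more delicate correctness argument.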
A contributor commented:

This lock contention problem on the Stripe mutex was the main motivation for introducing BRAVO.
