-
Notifications
You must be signed in to change notification settings - Fork 688
fix: Concurrency control for file locking has multiple anti-concurrent design patterns in the way #8523
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Summary
This PR refactors the local file locking mechanism to improve concurrent access patterns by replacing a complex nested locking structure with the scc (scalable concurrent containers) library. The changes span across four files:
Core Infrastructure Changes:
-
src/infra/src/local_lock.rs: Replaces the originalLazy<RwLock<HashMap<String, LockHolder>>>structure withLazy<scc::HashMap<String, ()>>. The oldLockHoldercontainedArc<String>andArc<Mutex<bool>>with explicit lock acquisition methods, while the new implementation returns anOccupiedEntrydirectly fromscc::HashMapthat acts as the lock guard through its lifetime. -
src/infra/Cargo.tomlandCargo.toml: Add thesccdependency to enable the new concurrent data structure.
Service Layer Changes:
src/service/schema.rs: Updates the schema update locking to use the simplified API, changing from a two-step process (let local_lock = infra::local_lock::lock().await?; let _guard = local_lock.lock().await) to a single call (let _guard = infra::local_lock::lock().await).
Design Rationale:
The original implementation had what the developer describes as "anti-concurrent design patterns" with multiple layers of synchronization: an outer RwLock protecting a HashMap containing LockHolder structs, each with their own Arc<Mutex<bool>>. This created unnecessary contention and complexity. The new approach leverages scc::HashMap's lock-free concurrent design, where the returned OccupiedEntry provides the locking semantics through RAII (Resource Acquisition Is Initialization) - the lock is held for the lifetime of the entry and automatically released when dropped.
This change fits into the broader codebase by simplifying the fundamental locking primitive used throughout OpenObserve for controlling file access across various services. The infra::local_lock module is used extensively for ensuring thread-safe operations on shared resources, particularly in data ingestion and schema management workflows.
Confidence score: 2/5
- This PR has significant implementation issues that could cause production problems
- Score reflects fundamental flaws in the locking semantics and lifetime management of the new approach
- Critical attention needed on local_lock.rs implementation and all files using the changed API
4 files reviewed, 1 comment
| }; | ||
| drop(w); | ||
| Ok(locker) | ||
| pub async fn lock(key: &str) -> OccupiedEntry<'_, String, ()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Return type changed from Result<LockHolder> to OccupiedEntry - this is a breaking API change that will require all callers to be updated
PR Code Suggestions ✨Explore these optional code suggestions:
|
76c65ca to
ec90dc0
Compare
f20e3df to
3980b4c
Compare
…can be easily missed
da41592 to
712bc71
Compare
|
| Status | Total | Passed | Failed | Skipped | Flaky | Pass Rate | Duration |
|---|---|---|---|---|---|---|---|
| All tests passed | 376 | 351 | 0 | 21 | 4 | 93% | 4m 30s |
User description
PR Type
Enhancement / Fix
Description
This PR is the beginning of the many changes that will improve the concurrent access to the services and locks within the system. This PR addresses improvements to the infra::local_lock which offers a global lock to control access to a single file. The old design had Lazy<RwLock<std::collections::HashMap<String, (Arc, Mutex)>> > which can be replaced with a Lazy<scc::HashMap<String, ()>> which does an equivalent job.
PR Type
Enhancement, Bug fix
Description
Replace global lock map with scc::HashMap
Simplify LockHolder and locking API
Update tests to new locking semantics
Add scc dependency in workspace
Diagram Walkthrough
File Walkthrough
local_lock.rs
Switch to scc map and simplify lockingsrc/infra/src/local_lock.rs
schema.rs
Adapt schema handling to new local locksrc/service/schema.rs
Cargo.toml
Add scc dependency to root manifestCargo.toml
Cargo.toml
Include scc in infra crate dependenciessrc/infra/Cargo.toml