Code Review Bench PR #14587 - Refactor some of ASM and slot-stats functions#4
Code Review Bench PR #14587 - Refactor some of ASM and slot-stats functions#4ketkarameya wants to merge 3 commits intobase_pr_14587_20260125_4187from
Conversation
Code Review 🚫 Blocked 0 resolved / 2 findingsCritical data loss bug: the two-pass split of slot verification deletes keys in unassigned slots before claiming them, silently destroying data that the old single-pass code preserved. Also, 🚨 Bug: Two-pass refactor causes data loss for keys in unassigned slotsThe original Suggested fix
|
| Auto-apply | Compact |
|
|
Was this helpful? React with 👍 / 👎 | Gitar
| /* Delete keys in unowned slots */ | ||
| clusterDeleteKeysInUnownedSlots(); | ||
| /* Take over slots that we have keys for, but are assigned to no one. */ | ||
| clusterClaimUnassignedSlots(); |
There was a problem hiding this comment.
🚨 Bug: Two-pass refactor causes data loss for keys in unassigned slots
The original verifyClusterConfigWithData() used a single pass through all slots. For each slot with keys that wasn't owned by this node and wasn't being imported, it checked:\n- If unassigned (NULL owner) → claim the slot (no data loss)\n- If assigned to another node → delete the keys\n\nThe refactored code splits this into two sequential passes:\n1. clusterDeleteKeysInUnownedSlots() — deletes keys in slots we don't own\n2. clusterClaimUnassignedSlots() — claims unassigned slots that have keys\n\nThe problem: clusterIsMySlot(i) returns false for unassigned slots (since getNodeBySlot() returns NULL ≠ myself). So in step 1, keys in unassigned slots are deleted. Then in step 2, countKeysInSlot(i) returns 0 for those same slots (keys were just deleted), so they are skipped and never claimed.\n\nResult: keys in unassigned slots are silently destroyed instead of being preserved by claiming the slot. This is a data loss regression.\n\nFix: Reverse the call order — call clusterClaimUnassignedSlots() before clusterDeleteKeysInUnownedSlots(), so unassigned slots are claimed first (preserving keys), and only then are keys in slots owned by other nodes deleted.
Was this helpful? React with 👍 / 👎
| /* Delete keys in unowned slots */ | |
| clusterDeleteKeysInUnownedSlots(); | |
| /* Take over slots that we have keys for, but are assigned to no one. */ | |
| clusterClaimUnassignedSlots(); | |
| /* Take over slots that we have keys for, but are assigned to no one. */ | |
| clusterClaimUnassignedSlots(); | |
| /* Delete keys in unowned slots */ | |
| clusterDeleteKeysInUnownedSlots(); |
- Apply suggested fix
| /* This function is called at server startup in order to initialize cluster data | ||
| * structures that are shared between the different cluster implementations. */ | ||
| void clusterCommonInit(void) { | ||
| server.cluster_slot_stats = malloc(CLUSTER_SLOTS*sizeof(clusterSlotStat)); |
There was a problem hiding this comment.
⚠️ Bug: Use of raw malloc instead of zmalloc breaks memory tracking
The original code in server.c:main() used zmalloc() to allocate cluster_slot_stats. The new clusterCommonInit() uses raw malloc() instead. This means:\n\n1. The allocation is invisible to Redis's INFO memory reporting and used_memory tracking\n2. Memory fragmentation metrics will be inaccurate\n3. If Redis is configured with maxmemory, this allocation won't count toward the limit\n4. It's inconsistent with every other allocation in cluster.c (all use zmalloc)\n\nThe PR context notes mention "Use malloc instead of zmalloc for allocation (consistent with pattern)" but this appears incorrect — no other allocation in cluster.c uses raw malloc.\n\nFix: Change malloc to zmalloc to match the original code and Redis conventions.
Was this helpful? React with 👍 / 👎
| server.cluster_slot_stats = malloc(CLUSTER_SLOTS*sizeof(clusterSlotStat)); | |
| server.cluster_slot_stats = zmalloc(CLUSTER_SLOTS*sizeof(clusterSlotStat)); |
- Apply suggested fix
Benchmark PR for Gitar evaluation
Original PR: agentic-review-benchmarks#4
Refactor some of ASM and slot-stats functions
Summary by Gitar
clusterCommonInit()function consolidates slot statistics and ASM initialization, called beforeclusterInit()to prepare shared data structures for multiple cluster implementationsverifyClusterConfigWithData()intoclusterDeleteKeysInUnownedSlots()andclusterClaimUnassignedSlots(), using abstraction APIs instead of direct struct access for portabilityasmCron()andasmBeforeSleep()fromcluster_legacy.ctoserver.c, enabling Atomic Slot Migration support across all cluster implementationsclusterCommandSpecial()to genericclusterCommand()dispatcher