
Code Review Bench PR #14587 - Refactor some of ASM and slot-stats functions #4

Open
ketkarameya wants to merge 3 commits into base_pr_14587_20260125_4187 from corrupted_pr_14587_20260125_4187

Conversation


ketkarameya commented Feb 10, 2026

Benchmark PR for Gitar evaluation

Original PR: agentic-review-benchmarks#4

Refactor some of ASM and slot-stats functions


Summary by Gitar

  • Extracts common cluster initialization: New clusterCommonInit() function consolidates slot statistics and ASM initialization, called before clusterInit() to prepare shared data structures for multiple cluster implementations (see the sketch after this list)
  • Refactors slot verification into reusable utilities: Splits verifyClusterConfigWithData() into clusterDeleteKeysInUnownedSlots() and clusterClaimUnassignedSlots(), using abstraction APIs instead of direct struct access for portability
  • Moves ASM lifecycle to generic event loop: Relocates asmCron() and asmBeforeSleep() from cluster_legacy.c to server.c, enabling Atomic Slot Migration support across all cluster implementations
  • Reorganizes command routing: Migrates "migration" and "syncslots" handlers from legacy-specific clusterCommandSpecial() to generic clusterCommand() dispatcher
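A minimal sketch of the call sites described in the first and third bullets. The wrapper function names and the server.cluster_enabled guard are illustrative assumptions, not the actual server.c diff:

    /* Startup (first bullet): shared cluster state is prepared before the
     * implementation-specific initialization. */
    void initServerSketch(void) {
        if (server.cluster_enabled) {
            clusterCommonInit();    /* slot statistics + ASM state */
            clusterInit();          /* implementation-specific setup */
        }
    }

    /* Generic event loop (third bullet): ASM lifecycle hooks are driven from
     * server.c rather than cluster_legacy.c; the hook names here are placeholders. */
    void serverCronSketch(void) {
        if (server.cluster_enabled) asmCron();          /* periodic ASM work */
    }

    void beforeSleepSketch(void) {
        if (server.cluster_enabled) asmBeforeSleep();   /* pre-sleep ASM work */
    }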


gitar-bot-staging bot commented Feb 10, 2026

Code Review 🚫 Blocked 0 resolved / 2 findings

Critical data loss bug: the two-pass split of slot verification deletes keys in unassigned slots before claiming them, silently destroying data that the old single-pass code preserved. Also, malloc should be zmalloc to maintain Redis memory tracking.

🚨 Bug: Two-pass refactor causes data loss for keys in unassigned slots

📄 src/cluster.c:2209-2212

The original verifyClusterConfigWithData() used a single pass through all slots. For each slot with keys that wasn't owned by this node and wasn't being imported, it checked:

  • If unassigned (NULL owner) → claim the slot (no data loss)
  • If assigned to another node → delete the keys

The refactored code splits this into two sequential passes:

  1. clusterDeleteKeysInUnownedSlots() deletes keys in slots we don't own
  2. clusterClaimUnassignedSlots() claims unassigned slots that have keys

The problem: clusterIsMySlot(i) returns false for unassigned slots (since getNodeBySlot() returns NULL ≠ myself). So in step 1, keys in unassigned slots are deleted. Then in step 2, countKeysInSlot(i) returns 0 for those same slots (the keys were just deleted), so they are skipped and never claimed.

Result: keys in unassigned slots are silently destroyed instead of being preserved by claiming the slot. This is a data loss regression.

Fix: Reverse the call order so that clusterClaimUnassignedSlots() runs before clusterDeleteKeysInUnownedSlots(); unassigned slots are then claimed first (preserving their keys), and only afterwards are keys in slots owned by other nodes deleted.

Suggested fix
    /* Take over slots that we have keys for, but are assigned to no one. */
    clusterClaimUnassignedSlots();
    /* Delete keys in unowned slots */
    clusterDeleteKeysInUnownedSlots();
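For reference, a minimal sketch of the original single-pass behavior this finding describes. clusterIsMySlot() and countKeysInSlot() are the helpers named above; slotIsImporting(), slotOwner(), claimSlot(), and deleteKeysInSlot() are hypothetical stand-ins for the corresponding checks and actions:

    /* One pass: for every slot that holds keys but is not ours and is not
     * being imported, either claim it (unassigned) or drop its keys
     * (assigned to another node). Keeping both cases in a single loop is
     * what prevents the data loss described above. */
    for (int i = 0; i < CLUSTER_SLOTS; i++) {
        if (countKeysInSlot(i) == 0) continue;     /* nothing stored here */
        if (clusterIsMySlot(i)) continue;          /* already owned by us */
        if (slotIsImporting(i)) continue;          /* import in progress */
        if (slotOwner(i) == NULL) {
            claimSlot(i);                          /* unassigned: claim it, keys survive */
        } else {
            deleteKeysInSlot(i);                   /* owned elsewhere: drop the keys */
        }
    }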
⚠️ Bug: Use of raw malloc instead of zmalloc breaks memory tracking

📄 src/cluster.c:2138

The original code in server.c:main() used zmalloc() to allocate cluster_slot_stats. The new clusterCommonInit() uses raw malloc() instead. This means:

  1. The allocation is invisible to Redis's INFO memory reporting and used_memory tracking
  2. Memory fragmentation metrics will be inaccurate
  3. If Redis is configured with maxmemory, this allocation won't count toward the limit
  4. It's inconsistent with every other allocation in cluster.c (all use zmalloc)

The PR context notes mention "Use malloc instead of zmalloc for allocation (consistent with pattern)", but this appears incorrect: no other allocation in cluster.c uses raw malloc.

Fix: Change malloc to zmalloc to match the original code and Redis conventions.

Suggested fix
    server.cluster_slot_stats = zmalloc(CLUSTER_SLOTS*sizeof(clusterSlotStat));
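As a side note, zmalloc()-tracked memory is released with zfree() so used_memory stays consistent; a hedged sketch with a hypothetical cleanup hook (not part of this PR):

    /* Hypothetical cleanup counterpart, for illustration only: zfree() keeps
     * Redis's used_memory accounting in sync with the zmalloc() allocation above. */
    void clusterCommonCleanup(void) {
        zfree(server.cluster_slot_stats);
        server.cluster_slot_stats = NULL;
    }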
Rules 🎸 1 action taken

Gitar Rules

🎸 Summary Enhancement: Appended 'Summary by Gitar' section with 4 technical bullet points describing cluster initialization refactoring

5 rules not applicable. Show all rules by commenting gitar display:verbose.

Options

Auto-apply is off → Gitar will not commit updates to this branch.
Display: compact → Showing less information.

Comment with these commands to change:

  • Auto-apply: gitar auto-apply:on
  • Display: gitar display:verbose

Was this helpful? React with 👍 / 👎 | Gitar

Comment on lines +2209 to +2212
/* Delete keys in unowned slots */
clusterDeleteKeysInUnownedSlots();
/* Take over slots that we have keys for, but are assigned to no one. */
clusterClaimUnassignedSlots();


🚨 Bug: Two-pass refactor causes data loss for keys in unassigned slots


Was this helpful? React with 👍 / 👎

Suggested change
/* Delete keys in unowned slots */
clusterDeleteKeysInUnownedSlots();
/* Take over slots that we have keys for, but are assigned to no one. */
clusterClaimUnassignedSlots();
/* Take over slots that we have keys for, but are assigned to no one. */
clusterClaimUnassignedSlots();
/* Delete keys in unowned slots */
clusterDeleteKeysInUnownedSlots();

/* This function is called at server startup in order to initialize cluster data
* structures that are shared between the different cluster implementations. */
void clusterCommonInit(void) {
server.cluster_slot_stats = malloc(CLUSTER_SLOTS*sizeof(clusterSlotStat));


⚠️ Bug: Use of raw malloc instead of zmalloc breaks memory tracking


Was this helpful? React with 👍 / 👎

Suggested change
server.cluster_slot_stats = malloc(CLUSTER_SLOTS*sizeof(clusterSlotStat));
server.cluster_slot_stats = zmalloc(CLUSTER_SLOTS*sizeof(clusterSlotStat));

