Skip to content

Conversation

@BilalG1
Copy link
Contributor

@BilalG1 BilalG1 commented Nov 6, 2025

Summary by CodeRabbit

  • Improvements
    • Optimized country-level metrics calculation with intelligent sampling and automatic scaling to project accurate user counts from sample data.
    • Enhanced data aggregation logic with improved handling of edge cases and invalid entries.

@vercel
Copy link

vercel bot commented Nov 6, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Comments Updated (UTC)
stack-backend Ready Ready Preview Comment Nov 6, 2025 7:11pm
stack-dashboard Ready Ready Preview Comment Nov 6, 2025 7:11pm
stack-demo Ready Ready Preview Comment Nov 6, 2025 7:11pm
stack-docs Ready Ready Preview Comment Nov 6, 2025 7:11pm

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 6, 2025

Walkthrough

A sampling cap is introduced for country metric calculations, along with a scaling factor to estimate accurate country user counts when the total exceeds the sample size. Country aggregation logic is reworked to handle null entries and filter invalid results accordingly.

Changes

Cohort / File(s) Change Summary
Country Metrics Sampling
apps/backend/src/app/api/latest/internal/metrics/route.tsx
Adds MAX_USERS_FOR_COUNTRY_SAMPLE cap for user sampling; pre-counts total users and applies scaling factor to project accurate country aggregates; reworks result construction to handle null countryCode entries, compute estimatedCount by scaling raw counts, and filter out invalid rows.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Sampling and scaling logic correctness requires careful verification of the calculation, particularly the scalingFactor derivation and its application
  • Edge case handling for null countryCode entries and filtering logic should be validated
  • Statistical accuracy of the estimation approach warrants review against intended behavior

Possibly related PRs

  • fix metrics queries #880: Modifies the same loadUsersByCountry function and the metrics route file; introduces changes to isAnonymous handling in the same query context.

Poem

🐰 A count capped and scaled with care,
Sampling fairly, statistics fair,
Null entries filtered away,
Metrics brighten up the day!
Rabbits hop through data's maze,
Estimates shine through the haze.

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description check ⚠️ Warning The pull request description only contains the template comment referencing CONTRIBUTING.md guidelines with no actual content, context, or explanation of the changes. Add a detailed description explaining what bug was fixed, why the sampling and scaling approach was needed, and any relevant implementation details or testing performed.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix usersByCountry query' is directly related to the main changes, which refactor the usersByCountry metric calculation with sampling and estimation logic.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix-metrics-route-error

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@BilalG1 BilalG1 requested a review from N2D4 November 6, 2025 19:03
@BilalG1 BilalG1 assigned N2D4 and unassigned BilalG1 Nov 6, 2025
@github-actions github-actions bot assigned BilalG1 and unassigned N2D4 Nov 6, 2025
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

Optimized loadUsersByCountry to handle large user bases by sampling max 10,000 users and scaling results proportionally.

Key changes:

  • Added MAX_USERS_FOR_COUNTRY_SAMPLE constant (10,000 users)
  • Query now counts total users first, then samples ordered subset
  • Introduced scaling factor calculation to estimate full population metrics from sample
  • Improved null filtering with TypeScript type guards for safer array operations

Impact:
This prevents performance issues when querying country data for projects with large user bases by limiting database queries while maintaining reasonable accuracy through statistical sampling.

Confidence Score: 4/5

  • This PR is safe to merge with minimal risk - straightforward performance optimization with sound sampling approach
  • The changes implement a well-structured performance optimization using statistical sampling. The logic is sound: limiting queries to 10K users with deterministic ordering and scaling results proportionally. However, confidence is 4/5 rather than 5/5 because sampling introduces statistical estimation which may be less accurate for certain data distributions, particularly when country distribution is non-uniform across user IDs
  • No files require special attention

Important Files Changed

File Analysis

Filename Score Overview
apps/backend/src/app/api/latest/internal/metrics/route.tsx 4/5 Added performance optimization to sample max 10K users for country metrics with scaling estimation, improved null filtering logic

Sequence Diagram

sequenceDiagram
    participant Client
    participant RouteHandler
    participant loadUsersByCountry
    participant Prisma
    participant GlobalPrisma

    Client->>RouteHandler: GET /api/latest/internal/metrics
    RouteHandler->>loadUsersByCountry: Request country data (cached)
    loadUsersByCountry->>Prisma: Count total users
    Prisma-->>loadUsersByCountry: totalUsers count
    loadUsersByCountry->>Prisma: Find sample users (max 10K, ordered)
    Prisma-->>loadUsersByCountry: Sample user IDs
    loadUsersByCountry->>loadUsersByCountry: Calculate scaling factor
    loadUsersByCountry->>GlobalPrisma: Query country data for sample
    GlobalPrisma-->>loadUsersByCountry: Raw country counts
    loadUsersByCountry->>loadUsersByCountry: Apply scaling factor to estimates
    loadUsersByCountry-->>RouteHandler: Scaled country statistics
    RouteHandler-->>Client: Complete metrics response
Loading

1 file reviewed, no comments

Edit Code Review Agent Settings | Greptile

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
apps/backend/src/app/api/latest/internal/metrics/route.tsx (2)

36-41: Extra query added for scaling calculation.

The totalUsers count query is necessary to calculate the scaling factor. While this adds an extra database round-trip, the performance impact should be minimal, and the approach is clear and maintainable.

If optimization becomes necessary, consider combining the count with the fetch using a window function or CTE, though the current approach prioritizes clarity.


48-49: Document the deterministic sampling trade-off.

The orderBy: { projectUserId: "asc" } approach ensures consistent sampling (beneficial for caching), but introduces bias by always selecting the earliest users. If geographic distribution has shifted over time, this could underestimate or miss countries where users joined more recently.

Consider one of these approaches:

  1. Add documentation explaining the trade-off between cache consistency and sampling accuracy.

  2. Use hash-based sampling for deterministic yet more representative results:

// Example: Sample ~10% deterministically across all user IDs
const samplingPercentage = Math.min(100, Math.ceil((MAX_USERS_FOR_COUNTRY_SAMPLE / totalUsers) * 100));
const users = await prisma.$queryRaw`
  SELECT "projectUserId"
  FROM "ProjectUser"
  WHERE "tenancyId" = ${tenancy.id}::UUID
    AND ${includeAnonymous} OR "isAnonymous" = false
    AND MOD(ABS(HASHTEXT("projectUserId"::text)), 100) < ${samplingPercentage}
  LIMIT ${MAX_USERS_FOR_COUNTRY_SAMPLE}
`;

This maintains determinism while avoiding chronological bias.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5c816a0 and f0e886a.

📒 Files selected for processing (1)
  • apps/backend/src/app/api/latest/internal/metrics/route.tsx (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: check_prisma_migrations (22.x)
  • GitHub Check: lint_and_build (latest)
  • GitHub Check: all-good
  • GitHub Check: build (22.x)
  • GitHub Check: build (22.x)
  • GitHub Check: build (22.x)
  • GitHub Check: docker
  • GitHub Check: restart-dev-and-test
  • GitHub Check: setup-tests
  • GitHub Check: restart-dev-and-test-with-custom-base-port
🔇 Additional comments (3)
apps/backend/src/app/api/latest/internal/metrics/route.tsx (3)

17-17: LGTM: Sampling cap defined.

The constant is well-named and the 10,000 sample size is reasonable for preventing performance issues while maintaining adequate statistical representation.


58-58: LGTM: Scaling factor calculation is correct.

The logic correctly scales up country counts when sampling is used, and defaults to 1 (no scaling) when the full dataset is processed.


83-91: LGTM: Result construction correctly handles null values and scaling.

The code properly:

  • Guards against null countryCode (defensive programming despite SQL WHERE clause)
  • Applies the scaling factor with appropriate rounding
  • Uses a type guard to filter null entries

Note on sampling limitations:
Countries with very few users may not appear in the sample at all and will be completely absent from results. This is an inherent limitation of the sampling approach. Consider adding a comment or documentation noting that small countries may be underrepresented or missing when sampling is active.

@BilalG1 BilalG1 merged commit 793de9e into dev Nov 6, 2025
26 checks passed
@BilalG1 BilalG1 deleted the fix-metrics-route-error branch November 6, 2025 19:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants